Re: [ANNOUNCE] New PMC Member: Alexey Romanenko

2020-06-16 Thread rahul patwari
Congrats Alexey! On Tue, Jun 16, 2020 at 9:27 PM Ismaël Mejía wrote: > Please join me and the rest of Beam PMC in welcoming Alexey Romanenko as > our > newest PMC member. > > Alexey has significantly contributed to the project in different ways: new > features and improvements in the Spark

Re: [VOTE] Release 2.21.0, release candidate #1

2020-05-19 Thread rahul patwari
y 19, 2020 at 8:13 PM rahul patwari > wrote: > >> Hi, >> >> Can the PR: https://github.com/apache/beam/pull/11609 be cherry-picked >> for 2.21.0 release? >> If not, the fix version has to be changed for BEAM-9887 >> <https://issues.apache.org/jira/bro

Re: [VOTE] Release 2.21.0, release candidate #1

2020-05-19 Thread rahul patwari
Hi, Can the PR: https://github.com/apache/beam/pull/11609 be cherry-picked for 2.21.0 release? If not, the fix version has to be changed for BEAM-9887 . Regards, Rahul On Wed, May 20, 2020 at 6:05 AM Ahmet Altay wrote: > +1, I validated python

Re: Parallelism in Combine.GroupedValues

2020-05-12 Thread rahul patwari
ning happens in parallel is when the withFewKeys > hint is used on the combiner or when there is partial combining[1] > happening on the mapper side before the grouping operation. > > 1: https://s.apache.org/beam-runner-api-combine-model > > On Tue, May 12, 2020 at 7:05 AM r

Parallelism in Combine.GroupedValues

2020-05-12 Thread rahul patwari
Hi, In the Javadoc for Combine.GroupedValues[1], it has been described that *combining the values associated with a single key can happen in parallel*. The logic to combine values associated with a key can be provided by CombineFnWithContext (or) CombineFn. Both CombineFnWithContext.apply()[2]

Re: Jenkins jobs not running for my PR 10438

2020-05-11 Thread rahul patwari
Hi, Can you please trigger pre-commit checks for https://github.com/apache/beam/pull/11581 Thanks, Rahul On Tue, May 12, 2020 at 7:12 AM Ahmet Altay wrote: > Done for both Yoshiki and Tomo's PRs. > > On Mon, May 11, 2020 at 6:33 PM Tomo Suzuki wrote: > >> Hi Beam committers, >> >> Would you

Re: NPE in Calcite dialect when input PCollection has logical type in schema, from JdbcIO Transform

2020-05-01 Thread rahul patwari
logical types can be moved to org.apache.beam.sdk.schemas.logicaltypes or use the logical types which are already defined in this package, so that these IOs can be used with BeamSql. On Sat, May 2, 2020 at 3:00 AM Brian Hulette wrote: > > > On Thu, Apr 30, 2020 at 11:26 PM rahul patwari

NPE in Calcite dialect when input PCollection has logical type in schema, from JdbcIO Transform

2020-05-01 Thread rahul patwari
Hi, A JIRA ticket is raised to track this bug: BEAM-8307 I have raised a PR: https://github.com/apache/beam/pull/11581 to fix the issue. This PR takes care of using BeamSql with JdbcIO. I would be interested to contribute if any other IOs

Re: Jenkins jobs not running for my PR 10438

2020-04-30 Thread rahul patwari
Hi Committers, Can you please trigger tests for https://github.com/apache/beam/pull/11569 and https://github.com/apache/beam/pull/11581 Thanks, Rahul On Tue, 28 Apr 2020, 10:58 pm Alexey Romanenko, wrote: > Thanks Udi! I'll track for updates on this. > > On 28 Apr 2020, at 19:16, Udi Meiri

Latency in advancing Spark Watermarks

2020-03-19 Thread rahul patwari
Hi, *Usage Info*: We are using Beam: 2.16.0, Spark: 2.4.2 We are running Spark on Kubernetes. We are using Spark Streaming(legacy) Runner with Beam Java SDK The Pipeline has been run with default configurations i.e. default configurations for SparkPipelineOptions. *Issue*: When a Beam Pipeline

Re: Issue with KafkaIO for list of topics

2020-02-28 Thread rahul patwari
this. > > Thanks and regards, > Maulik > > On Fri, Feb 28, 2020 at 3:03 PM rahul patwari > wrote: > >> Hi Maulik, >> >> This seems like an issue with Watermark. >> According to >> https://github.com/apache/beam/blob/f0930f958d47042948c06041e074

Re: Issue with KafkaIO for list of topics

2020-02-28 Thread rahul patwari
Hi Maulik, This seems like an issue with Watermark. According to https://github.com/apache/beam/blob/f0930f958d47042948c06041e074ef9f1b0872d9/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L240 , If there are multiple partitions (or) multiple topics,

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-18 Thread rahul patwari
t;> Sure, and great ! Thanks for proposing ! >> If you want details, here is the presentation I did 30 mins ago at the >> apachecon. You will find the video on youtube shortly but in the meantime, >> here is my presentation slides. >> >> And here is the structu

Beam Summit Videos in youtube

2019-09-18 Thread rahul patwari
Hi, The videos of Beam Summit that has happened recently have disappeared from YouTube Apache Beam channel. Is uploading the videos a WIP? Thanks, Rahul

Re: Watermark is lagging in Spark Runner with kafkaIO

2019-09-10 Thread rahul patwari
Forgot to mention: A FixedWindow of duration 1 minute is applied before applying SqlTransform. On Tue, Sep 10, 2019 at 6:03 PM rahul patwari wrote: > Hi, > I am facing this issue too. > +dev > > Here is the Pipeline that we are using(providing a very simple pipeline to > h

Re: Watermark is lagging in Spark Runner with kafkaIO

2019-09-10 Thread rahul patwari
Hi, I am facing this issue too. +dev Here is the Pipeline that we are using(providing a very simple pipeline to highlight the issue): KafkaSource -> SqlTransform -> KafkaSink We are reading from a single topic in KafkaSource with a single partition. Here is the data that we are producing to

Re: Inconsistent Results with GroupIntoBatches PTransform

2019-08-09 Thread rahul patwari
you using? There was a bug [1], which was fixed > in 2.14.0. This bug could cause what you observe. > > Jan > > [1] https://issues.apache.org/jira/browse/BEAM-7269 > On 8/9/19 10:35 AM, rahul patwari wrote: > > Hi Robert, > > When PCollection is created using "Create.of

Re: Inconsistent Results with GroupIntoBatches PTransform

2019-08-09 Thread rahul patwari
quot;, the results are always correct. Regards, Rahul On Fri, Aug 9, 2019 at 1:05 PM Robert Bradshaw wrote: > Could you clarify what you mean by "inconsistent" and "incorrect"? Are > elements missing/duplicated, or just batched differently? > > On Fri, Aug 9, 201

Re: Inconsistent Results with GroupIntoBatches PTransform

2019-08-08 Thread rahul patwari
; > Are you setting --streaming when executing? > > On Thu, Aug 8, 2019 at 10:23 AM rahul patwari > wrote: > >> Hi, >> >> I am getting inconsistent results when using GroupIntoBatches PTransform. >> I am using Create.of() PTransform to create a PCollection from in-me

Inconsistent Results with GroupIntoBatches PTransform

2019-08-08 Thread rahul patwari
Hi, I am getting inconsistent results when using GroupIntoBatches PTransform. I am using Create.of() PTransform to create a PCollection from in-memory. When a coder is given with Create.of() PTransform, I am facing the issue. If the coder is not provided, the results are consistent and

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-26 Thread rahul patwari
t;>> >>> Actually I was still referring to make "LookupStream" as >>> PCollectionView to perform sideinput join, which then doesn't have mismatch >>> WindowFn problem. Otherwise, we shouldn't check special case of WindowFn to >>> decide if perform a sideinput

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-26 Thread rahul patwari
heck special case of WindowFn to decide > if perform a sideinput join for two unbounded PCollection when their > WindowFn does not match. > > And "data completeness" really means is sideinput is triggered so it could > change, and then the question is when sideinput is

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-25 Thread rahul patwari
uts> >> in >> BeamSQL, such sidinput join can be done that way. At least it worth >> exploring it until we identify blockers. I also think this pattern is >> already useful to users. >> >> In terms of Join schematic, I think it's hard to reason

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-25 Thread rahul patwari
Collection has window assigned, the > window columns are added before the SQL is applied. It is a bit strange but > might enable your use. > > Kenn > > On Mon, Jul 22, 2019 at 10:39 AM rahul patwari > wrote: > >> Hi, >> >> Beam currently doesn't support Join

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
[per Key, Per Window] Processing. On Thu, Jul 25, 2019 at 10:08 PM Reuven Lax wrote: > Have you looked at the GroupIntoBatches transform? > > On Thu, Jul 25, 2019 at 9:34 AM rahul patwari > wrote: > >> So, If an RPC call has to be performed for a batch of >> Rows(PC

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
hen apply a > Stateful DoFn, though in that case all elements would get processed on > the same worker.) > > On Thu, Jul 25, 2019 at 6:06 PM rahul patwari > wrote: > > > > Hi, > > > > https://beam.apache.org/blog/2017/02/13/stateful-processing.html gives >

Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
Hi, https://beam.apache.org/blog/2017/02/13/stateful-processing.html gives an example of assigning an arbitrary-but-consistent index to each element on a per key-and-window basis. If the Stateful ParDo is applied on a Non-Keyed PCollection, say, PCollection with Fixed Windows, the state is

Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-22 Thread rahul patwari
Hi, Beam currently doesn't support Join of Unbounded PCollections of different WindowFns ( https://beam.apache.org/documentation/programming-guide/#groupbykey-and-unbounded-pcollections ). BeamSql performs [Unbounded PCollection] JOIN [Bounded PCollection], by performing 'SideInputJoin' with

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-18 Thread rahul patwari
ollectionView? go sideinput join. > > > [1]: > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L185 > > -Rui > > On Thu, Jul 18, 2019 at 7:52 AM rahul patwari > wrote: > >>

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-17 Thread rahul patwari
t;> https://beam.apache.org/documentation/patterns/side-input-patterns/. >> >> Please note in the DoFn that feeds the View.asSingleton() you will need >> to manually call BigQuery using the BigQuery client. >> >> Regards >> >> Reza >> >> O

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
code example ) >>> https://beam.apache.org/documentation/patterns/side-input-patterns/. >>> >>> Please note in the DoFn that feeds the View.asSingleton() you will need >>> to manually call BigQuery using the BigQuery client. >>> >>> Regards >

Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi, we are following [*Pattern: Slowly-changing lookup cache*] from https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1 We have a use case to read slowly changing bounded data as a PCollection along with the main PCollection from Kafka(windowed) and

Re: PR#6675 Updates

2019-07-06 Thread rahul patwari
On Fri 5 Jul, 2019, 9:25 PM Ismaël Mejía, wrote: > This is a holiday week in the US and a good chunk of the people in > the project have been busy between Beam summit and other events in the > last days, this is why reviews are taking longer than expected. Sorry, > next week most things will be

Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-22 Thread rahul patwari
/sdk/io/kafka/CustomTimestampPolicyWithLimitedDelay.java#L49 Thanks, Rahul On Thu, May 23, 2019 at 12:23 AM Lukasz Cwik wrote: > > > On Wed, May 22, 2019 at 11:17 AM rahul patwari > wrote: > >> will watermark also get checkpointed by default along with the offset of >&

Re: Custom Watermark Instance being created multiple times for KafkaIO

2019-05-22 Thread rahul patwari
rahul patwari wrote: > Hi, > > We are using withTimestampPolicyFactory > <https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#withTimestampPolicyFactory-org.apache.beam.sdk.io.kafka.TimestampPolicyFactory-> > (TimestampP

Custom Watermark Instance being created multiple times for KafkaIO

2019-05-21 Thread rahul patwari
Hi, We are using withTimestampPolicyFactory (TimestampPolicyFactory

Re: NullPointerException - Session windows with Lateness in FlinkRunner

2019-03-27 Thread rahul patwari
+dev On Wed 27 Mar, 2019, 9:47 PM rahul patwari, wrote: > Hi, > I am using Beam 2.11.0, Runner - beam-runners-flink-1.7, Flink Cluster - > 1.7.2. > > I have this flow in my pipeline: > KafkaSource(withCreateTime()) --> ApplyWindow(SessionWindow with > gapDuration=1 Mi