[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128687#comment-17128687
 ] 

josson paul kalapparambath commented on BEAM-9977:
--

[~boyuanz] . Looks like I don't need BEAM-10123.   But I need few more 
clarifiations.

Let us say the pipeline failed and restarted.

1) WatchGrothFn will fetch *ALL* the Kafka partitions again and sends it 
further down. Now,  does ReadFromKafkaViaSDF get each partition and start from 
where it check pointed last?.

 

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread Boyuan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128533#comment-17128533
 ] 

Boyuan Zhang commented on BEAM-9977:


Hi Josson,

BEAM-10123  is not required for checkpoint. In my implementation, checkpoint 
happens when the kafka consumer polls empty result, or the runner issues a 
checkpoint request(which is supported by dataflow runner_v2). Would you like to 
share more details about why you need BEAM-10123  in your application?

Let's also discuss about your questions here : )
>From BEAM-727, it seems like you are going to create a growthFn, which takes 
>topics as input and emits new TopicPartition. So the pipeline looks like:
``` java
PCollection> output = pipeline
   
.apply(Create.of(topic1, topic2))
   
.apply(your grow transform)
   
.apply(ReadFromKafkaViaSDF);

```

1) How do I make sure that the 'growthFn' emits only the new Kafka partitions. 
It happens in KafkaNewPartitonPolFn class

The WatchGrowthFn will help you do this: 
https://github.com/apache/beam/blob/4743e131edadad42555e605be803e26cb37b7ce6/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L864
 

2) In the KafkaPartitionConsumer class, do I have to take care of Checkpointing 
the Kafka offset (so that, in case of pipeline faliure, the Kafka data will be 
read from where it stoped)

BEAM-10123 should take care of this case.

3) If Kafka partitions matches with parallelism of the Pipeline, how do I make 
sure that KafkaPartitionConsumer instance is distributed across parallel 
instances (Does it taken care automatically by the Runner. We are using Flink 
Runner.)

The distribution will be done automatically by the runner.

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128479#comment-17128479
 ] 

josson paul kalapparambath commented on BEAM-9977:
--

[~boyuanz] 

To try this out, do I require https://issues.apache.org/jira/browse/BEAM-10123 
to be completed?. I need check point functionality. 

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-05-26 Thread Boyuan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116864#comment-17116864
 ] 

Boyuan Zhang commented on BEAM-9977:


Hi Alexey,

Yes, the SDF Kafka read will use the GrowableOffsetRangeTracker as 
RestrictionTracker.

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-05-26 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116859#comment-17116859
 ] 

Alexey Romanenko commented on BEAM-9977:


[~boyuanz] I see that there is a PR #11715, that was merged recently, but it 
doesn't affect KafkaIO directly (as I understand). Do you expect other PRs for 
this Jira?

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)