Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Jean-Baptiste Onofré
Hi Dan, thanks again for the detailed explanation. I will prepare some questions for you ;) Regards JB On 09/20/2016 01:37 AM, Dan Halperin wrote: Hey folks, Sorry for the confusion around sinks. Let me see if I can clear things up. In Beam, a Source+Reader is a very integral part of the mo

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Dan Halperin
+dev On Mon, Sep 19, 2016 at 4:37 PM, Dan Halperin wrote: > Hey folks, > > Sorry for the confusion around sinks. Let me see if I can clear things up. > > In Beam, a Source+Reader is a very integral part of the model. A source is > the root of a pipeline and it is where runners can do lots of imp

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Dan Halperin
Hey folks, Sorry for the confusion around sinks. Let me see if I can clear things up. In Beam, a Source+Reader is a very integral part of the model. A source is the root of a pipeline and it is where runners can do lots of important things like creating bundles, producing watermarks, and taking c

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Jean-Baptiste Onofré
Hi Thomas, thanks for the update. Stupid question: right now, most of the IO sinks use a simple DoFn (to write the PCollection elements). Does it mean that, when possible, we should use the Sink abstract class, with a Writer that can write bundles (using the open(), write(), and close() methods)? In t

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Thomas Groh
The model doesn't support dynamically creating PCollections, so the proposed transform producing a Bounded PCollection from a window of an Unbounded PCollection means that you end up with a Bounded Pipeline - in which case it is preferable to use a Bounded Read instead. As you're reading from an un

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Emanuele Cesena
Oh yes, even better I’d say. Best, > On Sep 19, 2016, at 9:48 AM, Jean-Baptiste Onofré wrote: > > Hi Emanuele, > > +1 to support Unbounded sink, but also, a very convenient function would be a > Window to create a bounded collection as a subset of an unbounded collection. > > Regards > JB >

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Jean-Baptiste Onofré
Hi Emanuele, +1 to support Unbounded sink, but also, a very convenient function would be a Window to create a bounded collection as a subset of an unbounded collection. Regards JB On 09/19/2016 05:59 PM, Emanuele Cesena wrote: Hi, This is a great insight. Is there any plan to support unboun

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Raghu Angadi
On Sun, Sep 18, 2016 at 10:38 PM, Aljoscha Krettek wrote: > use UnboundedFlinkSink to wrap a Flink Sink for use in a Beam pipeline. > The latter would allow you to use, for example, BucketingSink or > RollingSink from Flink. I'm only mentioning UnboundedFlinkSink for > completeness, I would not r

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Emanuele Cesena
Hi, This is a great insight. Is there any plan to support unbounded sink in Beam? On the temp kafka->kafka solution, this is exactly what we’re doing (and I wish to change). We have production stream pipelines that are kafka->kafka. Then we have 2 main use cases: kafka connect to dump into hive

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Emanuele Cesena
Hi Amit, Sorry, I “unmounted” my playground; I need to (find the time to) set it up again, and I’ll get back to you. Best, > On Sep 18, 2016, at 2:12 PM, Amit Sela wrote: > > It appears that you did.. does this happen with the DirectRunner ? > Generally speaking, TextIO is a BoundedSource. I'm no

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Aljoscha Krettek
Hi, right now, writing to a Beam "Sink" is only supported for bounded streams, as you discovered. An unbounded stream cannot be transformed into a bounded stream using a window; the window will just "chunk" the stream differently, but it will still be unbounded. The options you have right now for writing a
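Editor's note: a minimal sketch of the point about windows in the message above, written in plain Python rather than against the Beam API. It assumes a timestamp-ordered stream; the names `events` and `fixed_windows` and the 10-second window size are illustrative and do not come from the thread. Windowing only regroups the elements, and the resulting stream of windows is itself endless.

```python
from itertools import count, groupby, islice


def events():
    """Endless stream of (timestamp_seconds, value) pairs; stands in for an unbounded source."""
    for i in count():
        yield (i * 3, f"record-{i}")


def fixed_windows(stream, size):
    """Lazily group a timestamp-ordered stream into fixed windows of `size` seconds."""
    for window_index, group in groupby(stream, key=lambda e: e[0] // size):
        yield (window_index * size, [value for _, value in group])


# Windowing "chunks" the stream, but the result is still an endless generator,
# so the caller has to decide how many windows to take.
first_three = list(islice(fixed_windows(events(), 10), 3))
```

Calling `list(fixed_windows(events(), 10))` without the `islice` would never return, which is the sense in which the windowed stream stays unbounded.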

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Emanuele Cesena
Thanks, I’ll look into it, even if it’s not really the feature I need (exactly because it will stop execution). > On Sep 18, 2016, at 2:11 PM, Chawla,Sumit wrote: > > Hi Emanuele > > KafkaIO supports withMaxNumRecords(X), which will create a bounded > source from Kafka. However, the

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Amit Sela
It appears that you did. Does this happen with the DirectRunner? Generally speaking, TextIO is a BoundedSource. I'm not sure if it's supported. What's the stack trace? Does the exception originate in Flink translation code or Beam code? That should provide a good hint. On Mon, Sep 19, 2016, 00:00 E

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Chawla,Sumit
Hi Emanuele, KafkaIO supports withMaxNumRecords(X), which will create a bounded source from Kafka. However, the pipeline will finish once X records have been read. Regards Sumit Chawla On Sun, Sep 18, 2016 at 2:00 PM, Emanuele Cesena wrote: > Hi, > > Thanks for the hint - I’ll deb
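Editor's note: a rough plain-Python model of the behaviour described in the message above. It does not use KafkaIO or any Beam API; `unbounded_records`, `read_with_max_records`, and `run_pipeline` are hypothetical names chosen for illustration. Capping the number of records turns an endless source into a finite one, and the downstream work then terminates once the cap is reached.

```python
from itertools import count, islice


def unbounded_records():
    """Endless stream of records; stands in for a Kafka topic."""
    for offset in count():
        yield f"message-{offset}"


def read_with_max_records(stream, max_records):
    """Return a finite view over the first `max_records` elements of `stream`."""
    return islice(stream, max_records)


def run_pipeline(records):
    """Toy pipeline: uppercase every record and collect the results.

    This only returns if `records` is finite.
    """
    return [record.upper() for record in records]


# With the cap in place the pipeline finishes after five records.
output = run_pipeline(read_with_max_records(unbounded_records(), 5))
```

The same property is the drawback raised in the reply: once the cap is hit, processing stops, so this does not help a job that is meant to run continuously.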

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Emanuele Cesena
Hi, Thanks for the hint - I’ll debug better but I thought I did that: https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java#L140 Best, > On Sep 18, 2016, at 1:54 PM, Jean-Baptiste Onofré wrote: > > Hi Emanuele > > You have to use

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-18 Thread Jean-Baptiste Onofré
Hi Emanuele, You have to use a window to create a bounded collection from an unbounded source. Regards JB On Sep 18, 2016, at 21:04, Emanuele Cesena wrote: >Hi, > >I wrote a while ago about a simple example I was building to test >KafkaIO: >https://github.com/ecesena/beam-starter/blob/