The model doesn't support dynamically creating PCollections, so the
proposed transform that produces a Bounded PCollection from a window of an
Unbounded PCollection means that you end up with a Bounded Pipeline - in
which case it is preferable to use a Bounded Read instead. Since you're
reading from an unbounded input, it may be necessary to write the windowed
values to a sink that supports continuous updates, and then execute a
Bounded Pipeline over the appropriate set of input instead of over some
arbitrary chunk of records (e.g. the window).
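
As a rough sketch of that pattern (assume "input" is an unbounded
PCollection<String>; upsert() and the continuously-updatable store behind
it are hypothetical placeholders, and the usual Beam SDK imports are
elided):

    PCollection<KV<String, Long>> counts = input
        .apply(Window.<String>into(
            FixedWindows.of(Duration.standardMinutes(5))))
        .apply(Count.<String>perElement());

    counts.apply(ParDo.of(new DoFn<KV<String, Long>, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c, BoundedWindow window) {
        // Key each write by (window, element) so a later Bounded Pipeline
        // can read exactly one window's worth of results.
        upsert(window.maxTimestamp() + "/" + c.element().getKey(),
               c.element().getValue());
      }
    }));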

The current Write implementation effectively works only for sinks that can
assume they will execute exactly once and then finish. This is not an
assumption that can be made within arbitrary pipelines. Sinks that
currently work only for Bounded inputs are generally written with
assumptions that break in the presence of multiple trigger firings, late
data, or multiple windows, and handling these facets of the model requires
different behavior or configuration from each sink.
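
For instance, one way a sink can handle multiple trigger firings is to
write only the final pane of each window, or to make every write an
idempotent upsert so that a later firing simply overwrites an earlier one.
Inside a DoFn-based writer like the sketch above, that might look like
(again, upsert() is a hypothetical placeholder):

    @ProcessElement
    public void processElement(ProcessContext c, BoundedWindow window) {
      if (!c.pane().isLast()) {
        return;  // ignore speculative firings; only the final pane lands
      }
      upsert(window.maxTimestamp() + "/" + c.element().getKey(),
             c.element().getValue());
    }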

On Mon, Sep 19, 2016 at 9:53 AM, Emanuele Cesena <emanu...@shopkick.com>
wrote:

> Oh yes, even better I’d say.
>
> Best,
>
>
> > On Sep 19, 2016, at 9:48 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >
> > Hi Emanuele,
> >
> > +1 to supporting Unbounded sinks; also, a very convenient function would
> be a Window to create a bounded collection as a subset of an unbounded
> collection.
> >
> > Regards
> > JB
> >
> > On 09/19/2016 05:59 PM, Emanuele Cesena wrote:
> >> Hi,
> >>
> >> This is a great insight. Is there any plan to support unbounded sinks in
> Beam?
> >>
> >> On the temp kafka->kafka solution, this is exactly what we’re doing
> (and what I wish to change). We have production stream pipelines that are
> kafka->kafka. Then we have two main use cases: Kafka Connect to dump into
> Hive and go batch from there, and Druid for real-time reporting.
> >>
> >> However, this makes prototyping really slow, and I wanted to introduce
> Beam as a shortcut from Kafka to anywhere.
> >>
> >> Best,
> >>
> >>
> >>> On Sep 18, 2016, at 10:38 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> >>>
> >>> Hi,
> >>> right now, writing to a Beam "Sink" is only supported for bounded
> streams, as you discovered. An unbounded stream cannot be transformed into a
> bounded stream using a window; this will just "chunk" the stream
> differently, but it will still be unbounded.
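> >>>
> >>> For illustration (a sketch; assume "lines" is an unbounded
> >>> PCollection<String> read via KafkaIO):
> >>>
> >>>   PCollection<String> windowed = lines.apply(
> >>>       Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));
> >>>   // Still unbounded: windowed.isBounded() == IsBounded.UNBOUNDED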
> >>>
> >>> The options you have right now for writing are writing to your
> external datastore using a DoFn, using KafkaIO to write to a Kafka topic,
> or using UnboundedFlinkSink to wrap a Flink Sink for use in a Beam pipeline.
> The latter would allow you to use, for example, BucketingSink or
> RollingSink from Flink. I'm only mentioning UnboundedFlinkSink for
> completeness, I would not recommend using it since your program will only
> work on the Flink runner. The way to go, IMHO, would be to write to Kafka
> and then take the data from there and ship it to some final location such
> as HDFS.
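> >>>
> >>> A minimal sketch of the DoFn route (assume "words" is a
> >>> PCollection<String>; MyStoreClient and its methods are hypothetical
> >>> placeholders for whatever client your datastore provides):
> >>>
> >>>   words.apply(ParDo.of(new DoFn<String, Void>() {
> >>>     private transient MyStoreClient client;
> >>>
> >>>     @Setup
> >>>     public void setup() {
> >>>       client = MyStoreClient.connect("store-host:1234");
> >>>     }
> >>>
> >>>     @ProcessElement
> >>>     public void processElement(ProcessContext c) {
> >>>       client.write(c.element());  // ideally idempotent, since retries happen
> >>>     }
> >>>
> >>>     @Teardown
> >>>     public void teardown() {
> >>>       client.close();
> >>>     }
> >>>   }));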
> >>>
> >>> Cheers,
> >>> Aljoscha
> >>>
> >>> On Sun, 18 Sep 2016 at 23:17 Emanuele Cesena <emanu...@shopkick.com>
> wrote:
> >>> Thanks, I’ll look into it, even if it’s not really the feature I need
> (exactly because it will stop execution).
> >>>
> >>>
> >>>> On Sep 18, 2016, at 2:11 PM, Chawla,Sumit <sumitkcha...@gmail.com>
> wrote:
> >>>>
> >>>> Hi Emanuele
> >>>>
> >>>> KafkaIO supports withMaxNumRecords(X), which will create a bounded
> source from Kafka. However, the pipeline will finish once X records have
> been read.
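> >>>>
> >>>> Roughly (broker address, topic, and count are placeholders):
> >>>>
> >>>>   p.apply(KafkaIO.read()
> >>>>       .withBootstrapServers("broker:9092")
> >>>>       .withTopics(ImmutableList.of("my-topic"))
> >>>>       .withMaxNumRecords(10000));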
> >>>>
> >>>> Regards
> >>>> Sumit Chawla
> >>>>
> >>>>
> >>>> On Sun, Sep 18, 2016 at 2:00 PM, Emanuele Cesena <
> emanu...@shopkick.com> wrote:
> >>>> Hi,
> >>>>
> >>>> Thanks for the hint - I’ll debug further, but I thought I did that:
> >>>> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java#L140
> >>>>
> >>>> Best,
> >>>>
> >>>>
> >>>>> On Sep 18, 2016, at 1:54 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >>>>>
> >>>>> Hi Emanuele
> >>>>>
> >>>>> You have to use a window to create a bounded collection from an
> unbounded source.
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On Sep 18, 2016, at 21:04, Emanuele Cesena <emanu...@shopkick.com>
> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I wrote a while ago about a simple example I was building to test
> KafkaIO:
> >>>>> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
> >>>>>
> >>>>> Issues with Flink should be fixed now, and I’m trying to run the
> example on master and Flink 1.1.2.
> >>>>> I’m currently getting:
> >>>>> Caused by: java.lang.IllegalArgumentException: Write can only be
> applied to a Bounded PCollection
> >>>>>
> >>>>> What is the recommended way to go here?
> >>>>> - is there a way to create a bounded collection from an unbounded
> one?
> >>>>> - is there a plan to let TextIO write unbounded collections?
> >>>>> - is there another recommended “simple sink” to use?
> >>>>>
> >>>>> Thank you much!
> >>>>>
> >>>>> Best,
> >>>>
> >>>> --
> >>>> Emanuele Cesena, Data Eng.
> >>>> http://www.shopkick.com
> >>>>
> >>>> The body has no ideals
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Emanuele Cesena, Data Eng.
> >>> http://www.shopkick.com
> >>>
> >>> The body has no ideals
> >>>
> >>>
> >>>
> >>>
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> The body has no ideals
>
>
>
>
>
