Hi Thomas,

Thanks for the update.

Stupid question: right now, most of the IO sinks use a simple DoFn (to write the PCollection elements).

Does that mean that, when possible, we should use the Sink abstract class, with a Writer that can write bundles (using the open(), write(), and close() methods)?
In that case, will the Window create the bundle?
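To make sure we mean the same thing, here is a toy, self-contained sketch of the bundle lifecycle I have in mind. This is illustrative only, not the actual Beam Sink/Writer API; all names are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy writer showing the open()/write()/close()
// bundle lifecycle, not the real Beam Sink/Writer API.
class BundleWriter {
    private List<String> buffer;
    private final List<List<String>> committed = new ArrayList<>();

    // Called once per bundle, before any elements are written.
    void open() {
        buffer = new ArrayList<>();
    }

    // Called once per element in the bundle.
    void write(String element) {
        buffer.add(element);
    }

    // Called once per bundle; commits the buffered elements as a unit.
    void close() {
        committed.add(buffer);
        buffer = null;
    }

    List<List<String>> getCommitted() {
        return committed;
    }

    public static void main(String[] args) {
        BundleWriter w = new BundleWriter();
        // Two bundles, e.g. produced by two separate window firings.
        w.open(); w.write("a"); w.write("b"); w.close();
        w.open(); w.write("c"); w.close();
        System.out.println(w.getCommitted()); // [[a, b], [c]]
    }
}
```

The question is whether bundle boundaries would line up with window boundaries, or remain a runner decision.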

Regards
JB

On 09/19/2016 10:46 PM, Thomas Groh wrote:
The model doesn't support dynamically creating PCollections, so the
proposed transform producing a Bounded PCollection from a window of an
Unbounded PCollection means that you end up with a Bounded Pipeline - in
which case it is preferable to use a Bounded Read instead. As you're
reading from an unbounded input, it may be necessary to write the
windowed values to a sink that supports continuous updates, from which
you can execute a Bounded Pipeline over the appropriate set of input
instead of on some arbitrary chunk of records (e.g. the window).

The current Write implementation effectively works only for sinks that
can assume they do something exactly once and then finish. This is not an
assumption that can be made within arbitrary pipelines. Sinks that
currently work only for Bounded inputs are generally written with
assumptions that mean they do not work in the presence of multiple
trigger firings, late data, or across multiple windows, and handling
these facets of the model requires different behavior or configuration
from different sinks.
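To make the hazard concrete, here is a toy, self-contained sketch (illustrative names only, not Beam code) contrasting an append-style write, which duplicates output when a window's trigger fires again, with an upsert keyed deterministically by window, which stays consistent:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: why append-style sinks break under re-fired
// triggers, while writes keyed deterministically by window can simply
// be overwritten by later panes.
class SinkIdempotenceDemo {
    static final List<String> appendSink = new ArrayList<>();
    static final Map<String, String> upsertSink = new HashMap<>();

    static void paneFired(String window, String result) {
        appendSink.add(result);          // duplicates on every firing
        upsertSink.put(window, result);  // later firings replace earlier ones
    }

    public static void main(String[] args) {
        paneFired("w1", "count=10");  // early/speculative firing
        paneFired("w1", "count=12");  // same window fires again on late data
        System.out.println(appendSink);  // [count=10, count=12] <- duplicated
        System.out.println(upsertSink);  // {w1=count=12}        <- consistent
    }
}
```

An append-only sink would need extra configuration (e.g. only writing on the final pane) to be safe here, which is exactly the per-sink behavior difference described above.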

On Mon, Sep 19, 2016 at 9:53 AM, Emanuele Cesena <emanu...@shopkick.com
<mailto:emanu...@shopkick.com>> wrote:

    Oh yes, even better I’d say.

    Best,


    > On Sep 19, 2016, at 9:48 AM, Jean-Baptiste Onofré <j...@nanthrax.net
    <mailto:j...@nanthrax.net>> wrote:
    >
    > Hi Emanuele,
    >
    > +1 to support Unbounded sink, but also, a very convenient function
    would be a Window to create a bounded collection as a subset of an
    unbounded collection.
    >
    > Regards
    > JB
    >
    > On 09/19/2016 05:59 PM, Emanuele Cesena wrote:
    >> Hi,
    >>
    >> This is a great insight. Is there any plan to support unbounded
    sinks in Beam?
    >>
    >> On the temp kafka->kafka solution, this is exactly what we’re
    doing (and I wish to change). We have production stream pipelines
    that are kafka->kafka. Then we have 2 main use cases: kafka connect
    to dump into hive and go batch from there, and druid for real time
    reporting.
    >>
    >> However this makes prototyping really slow, and I wanted to
    introduce Beam to shortcut from kafka to anywhere.
    >>
    >> Best,
    >>
    >>
    >>> On Sep 18, 2016, at 10:38 PM, Aljoscha Krettek
    <aljos...@apache.org <mailto:aljos...@apache.org>> wrote:
    >>>
    >>> Hi,
    >>> right now, writing to a Beam "Sink" is only supported for
    bounded streams, as you discovered. An unbounded stream cannot be
    transformed to a bounded stream using a window, this will just
    "chunk" the stream differently but it will still be unbounded.
    >>>
    >>> The options you have right now for writing are to write to your
    external datastore using a DoFn, using KafkaIO to write to a Kafka
    topic or to use UnboundedFlinkSink to wrap a Flink Sink for use in a
    Beam pipeline. The latter would allow you to use, for example,
    BucketingSink or RollingSink from Flink. I'm only mentioning
    UnboundedFlinkSink for completeness, I would not recommend using it
    since your program will only work on the Flink runner. The way to
    go, IMHO, would be to write to Kafka and then take the data from
    there and ship it to some final location such as HDFS.
    >>>
    >>> Cheers,
    >>> Aljoscha
    >>>
    >>> On Sun, 18 Sep 2016 at 23:17 Emanuele Cesena
    <emanu...@shopkick.com <mailto:emanu...@shopkick.com>> wrote:
    >>> Thanks, I’ll look into it, even if it’s not really the feature I
    need (exactly because it will stop execution).
    >>>
    >>>
    >>>> On Sep 18, 2016, at 2:11 PM, Chawla,Sumit
    <sumitkcha...@gmail.com <mailto:sumitkcha...@gmail.com>> wrote:
    >>>>
    >>>> Hi Emanuele
    >>>>
    >>>> KafkaIO supports withMaxNumRecords(X), which will create a
    bounded source from Kafka. However, the pipeline will finish once X
    records have been read.
    >>>>
    >>>> Regards
    >>>> Sumit Chawla
    >>>>
    >>>>
    >>>> On Sun, Sep 18, 2016 at 2:00 PM, Emanuele Cesena
    <emanu...@shopkick.com <mailto:emanu...@shopkick.com>> wrote:
    >>>> Hi,
    >>>>
    >>>> Thanks for the hint - I’ll debug better but I thought I did that:
    >>>>
    
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java#L140
    >>>>
    >>>> Best,
    >>>>
    >>>>
    >>>>> On Sep 18, 2016, at 1:54 PM, Jean-Baptiste Onofré
    <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote:
    >>>>>
    >>>>> Hi Emanuele
    >>>>>
    >>>>> You have to use a window to create a bounded collection from
    an unbounded source.
    >>>>>
    >>>>> Regards
    >>>>> JB
    >>>>>
    >>>>> On Sep 18, 2016, at 21:04, Emanuele Cesena
    <emanu...@shopkick.com <mailto:emanu...@shopkick.com>> wrote:
    >>>>> Hi,
    >>>>>
    >>>>> I wrote a while ago about a simple example I was building to
    test KafkaIO:
    >>>>>
    
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
    >>>>>
    >>>>> Issues with Flink should be fixed now, and I’m trying to run
    the example on master and Flink 1.1.2.
    >>>>> I’m currently getting:
    >>>>> Caused by: java.lang.IllegalArgumentException: Write can only
    be applied to a Bounded PCollection
    >>>>>
    >>>>> What is the recommended way to go here?
    >>>>> - is there a way to create a bounded collection from an
    unbounded one?
    >>>>> - is there a plan to let TextIO write unbounded collections?
    >>>>> - is there another recommended “simple sink” to use?
    >>>>>
    >>>>> Thank you much!
    >>>>>
    >>>>> Best,
    >>>>
    >>>> --
    >>>> Emanuele Cesena, Data Eng.
    >>>> http://www.shopkick.com
    >>>>
    >>>> Il corpo non ha ideali
    >>>>
    >>>>
    >>>>
    >>>>
    >>>>
    >>>
    >>> --
    >>> Emanuele Cesena, Data Eng.
    >>> http://www.shopkick.com
    >>>
    >>> Il corpo non ha ideali
    >>>
    >>>
    >>>
    >>>
    >>
    >
    > --
    > Jean-Baptiste Onofré
    > jbono...@apache.org <mailto:jbono...@apache.org>
    > http://blog.nanthrax.net
    > Talend - http://www.talend.com

    --
    Emanuele Cesena, Data Eng.
    http://www.shopkick.com

    Il corpo non ha ideali






--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
