Hi Emanuele,
+1 to support an unbounded sink. Also, a very convenient function would
be a Window to create a bounded collection as a subset of an unbounded
collection.
Regards
JB
On 09/19/2016 05:59 PM, Emanuele Cesena wrote:
Hi,
This is a great insight. Is there any plan to support unbounded sink in Beam?
On the temp Kafka->Kafka solution, this is exactly what we’re doing (and what I
wish to change). We have production stream pipelines that are Kafka->Kafka. Then we
have 2 main use cases: Kafka Connect to dump into Hive and go batch from there, and
Druid for real-time reporting.
However, this makes prototyping really slow, and I wanted to introduce Beam as a
shortcut from Kafka to anywhere.
Best,
On Sep 18, 2016, at 10:38 PM, Aljoscha Krettek <[email protected]> wrote:
Hi,
right now, writing to a Beam "Sink" is only supported for bounded streams, as you
discovered. An unbounded stream cannot be transformed into a bounded stream using a
window; windowing just "chunks" the stream differently, but it is still unbounded.
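To illustrate this point, here is a minimal sketch (the input PCollection and its element type are placeholders): applying a window changes how elements are grouped, but the resulting PCollection is still unbounded.

```java
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Windowing groups elements into one-minute chunks, but the resulting
// collection is still unbounded -- isBounded() stays UNBOUNDED, so a
// bounded-only Sink will still reject it.
PCollection<String> windowed = input.apply(
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));
```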
The options you have right now for writing are: writing to your external
datastore using a DoFn, using KafkaIO to write to a Kafka topic, or using
UnboundedFlinkSink to wrap a Flink Sink for use in a Beam pipeline. The latter
would allow you to use, for example, BucketingSink or RollingSink from Flink.
I'm only mentioning UnboundedFlinkSink for completeness; I would not recommend
using it, since your program would then only work on the Flink runner. The way to
go, IMHO, would be to write to Kafka and then take the data from there and ship it
to some final location such as HDFS.
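For the DoFn option mentioned above, a minimal sketch might look like the following. Note that MyDatastoreClient and its connect()/write()/close() methods are hypothetical placeholders for whatever client your external datastore provides; only the DoFn lifecycle annotations are from the Beam SDK.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

// Writes each element of a (possibly unbounded) PCollection<String> to an
// external datastore. MyDatastoreClient is a hypothetical placeholder.
public class WriteToDatastoreFn extends DoFn<String, Void> {
  private transient MyDatastoreClient client;

  @Setup
  public void setup() {
    // Open one connection per DoFn instance, not per element.
    client = MyDatastoreClient.connect("datastore-host:1234"); // placeholder address
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    client.write(c.element()); // write each record as it arrives
  }

  @Teardown
  public void teardown() {
    client.close();
  }
}

// Applied in the pipeline as:
//   collection.apply(ParDo.of(new WriteToDatastoreFn()));
```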
Cheers,
Aljoscha
On Sun, 18 Sep 2016 at 23:17 Emanuele Cesena <[email protected]> wrote:
Thanks, I’ll look into it, even if it’s not really the feature I need (exactly
because it will stop execution).
On Sep 18, 2016, at 2:11 PM, Chawla,Sumit <[email protected]> wrote:
Hi Emanuele
KafkaIO supports withMaxNumRecords(X), which will create a bounded source from
Kafka. However, the pipeline will finish once X records are read.
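A minimal sketch of this bounded read (broker address and topic name are placeholders, and KafkaIO method names may differ slightly between Beam versions):

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;

// Read at most 1000 records from Kafka, yielding a bounded PCollection.
// The pipeline will terminate once the limit is reached.
Pipeline p = Pipeline.create(options);
p.apply(KafkaIO.read()
    .withBootstrapServers("localhost:9092")   // placeholder broker
    .withTopics(Arrays.asList("my-topic"))    // placeholder topic
    .withMaxNumRecords(1000));
```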
Regards
Sumit Chawla
On Sun, Sep 18, 2016 at 2:00 PM, Emanuele Cesena <[email protected]> wrote:
Hi,
Thanks for the hint - I’ll debug better, but I thought I had done that:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java#L140
Best,
On Sep 18, 2016, at 1:54 PM, Jean-Baptiste Onofré <[email protected]> wrote:
Hi Emanuele
You have to use a window to create a bounded collection from an unbounded
source.
Regards
JB
On Sep 18, 2016, at 21:04, Emanuele Cesena <[email protected]> wrote:
Hi,
I wrote a while ago about a simple example I was building to test KafkaIO:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
Issues with Flink should be fixed now, and I’m trying to run the example on master
and Flink 1.1.2.
I’m currently getting:
Caused by: java.lang.IllegalArgumentException: Write can only be applied to a
Bounded PCollection
What is the recommended way to go here?
- is there a way to create a bounded collection from an unbounded one?
- is there a plan to let TextIO write unbounded collections?
- is there another recommended “simple sink” to use?
Thank you very much!
Best,
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com