Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Raghu Angadi
On Wed, Oct 19, 2016 at 11:00 AM, Kenneth Knowles wrote: > I wanted to attempt to explicitly answer Raghu's question by saying that I > think Dan's starting points imply that the recommended behavior for start() > and advance() is to be "non-blocking" in the sense that

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Jean-Baptiste Onofré
Hi, FYI: when working on IO I already set up a Docker image that I'm using for integration tests. The IO unit tests embed and bootstrap the IO resources when possible. For instance, the JmsIO unit tests start an embedded ActiveMQ broker. However, I also have an ActiveMQ Docker image that I use for

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Thomas Weise
Hadoop FS has the local file system implementation that can be used for testing ("file" URL, no service needed). Thanks On Wed, Oct 19, 2016 at 10:43 AM, Amit Sela wrote: > Oh cool, that didn't exist in 0.8 I think, but anything that is Kafka > native is best. > I'm

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Raghu Angadi
It will be very useful for the existing KafkaIOTest as well. The MockConsumer we use is too primitive; ~50% of KafkaIOTest deals with MockConsumer. On Wed, Oct 19, 2016 at 10:43 AM, Amit Sela wrote: > Oh cool, that didn't exist in 0.8 I think, but anything that is Kafka > native

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Amit Sela
Oh cool, that didn't exist in 0.8 I think, but anything that is Kafka native is best. I'm pretty sure there's an embedded HDFS for testing as well. While embedded Kafka/HDFS won't reflect a "real-life" distributed environment, it could be a good place to start and provide some basic functional

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Amit Sela
The SparkRunner actually has an embedded Kafka for its unit tests. On Wed, Oct 19, 2016, 20:16 Thomas Weise wrote: > Kafka can be embedded for the integration testing, which should > significantly simplify the setup. > > Here is an example I found: > >

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Dan Halperin
My thoughts: * It's worth reading the Beam testing document that Jason Kuster wrote! * Beam already has support for "End-to-end" integration tests, of examples (e.g., WordCountIT

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Robert Bradshaw
Eventually we'll be able to communicate intent with the runner much more directly via the ProcessContinuation object: https://github.com/apache/incubator-beam/blob/a0f649eaca8d8bd47d22db0ba7150fea1bf07975/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L658 On Tue, Oct 18,

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Jean-Baptiste Onofré
Thanks for the update and summary. Regards JB On Oct 18, 2016, at 20:47, Amit Sela wrote: > I wanted to summarize here a couple of important points raised in some PRs I was involved with, and while those PRs were about KafkaIO and related to the Spark/Direct

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Raghu Angadi
One way I would rephrase the concern from an unbounded source developer's point of view: - What is the recommended blocking behavior for start() and advance()? E.g., at one extreme, should they wait until there is a record? Mostly this will be bad. I am glad at pull/1125
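
To make the start()/advance() question concrete, here is a minimal sketch of the "non-blocking" end of the spectrum. It is an illustration only: NonBlockingReader and NonBlockingReaderDemo are hypothetical names, the class does not extend or use any Beam type, and the in-memory queue is an assumed stand-in for whatever client buffer a real source would poll.

```java
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for an unbounded reader; this is NOT Beam's API. */
class NonBlockingReader<T> {
  private final Queue<T> buffer; // assumed to be filled by some background fetcher
  private T current;             // null means "no current record"

  NonBlockingReader(Queue<T> buffer) {
    this.buffer = buffer;
  }

  /** Tries to read the first record; returns immediately either way. */
  boolean start() {
    return advance();
  }

  /** Returns true if a record was available right now, false otherwise. Never waits. */
  boolean advance() {
    current = buffer.poll(); // poll() returns null at once when the queue is empty
    return current != null;
  }

  /** Returns the record fetched by the last successful start() or advance(). */
  T getCurrent() {
    if (current == null) {
      throw new NoSuchElementException("no current record");
    }
    return current;
  }
}

class NonBlockingReaderDemo {
  public static void main(String[] args) {
    Queue<String> buffer = new ConcurrentLinkedQueue<>();
    NonBlockingReader<String> reader = new NonBlockingReader<>(buffer);

    System.out.println(reader.start()); // nothing buffered yet, so no record
    buffer.add("record-1");
    if (reader.advance()) {
      System.out.println(reader.getCurrent());
    }
  }
}
```

With this shape, returning false means only "nothing available at the moment", so the caller decides whether to retry, back off, or do other work. The opposite extreme, waiting inside advance() until a record arrives, is the behavior the question above describes as mostly bad.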