Re: Dataflow Streaming ValidatesRunner

2020-04-09 Thread Kenneth Knowles
We (Beam) used to run all ValidatesRunner tests with the --streaming flag forced, because we didn't have autodetection of unbounded data in the pipeline. That functionality was lost in the migration to gradle IIRC. It is still valuable, but also expensive. The quota issue Luke mentioned could be re

Re: Dataflow Streaming ValidatesRunner

2020-04-09 Thread Luke Cwik
There are a limited number of streaming pipelines that run as post commits on Jenkins but it isn't comprehensive compared to the validates runner set. Googlers regularly import the github code into Google and find issues with implementations that have been merged. Sometimes its bugs in implementat

Re: Dataflow Streaming ValidatesRunner

2020-04-09 Thread Reuven Lax
You can also call Create and call setIsBoundedInternal(IsBounded.UNBOUNDED) on the resulting PCollection, which will force the streaming runner to be used. On Thu, Apr 9, 2020 at 2:25 PM Steve Niemitz wrote: > Ah yeah I forgot that you can force a pipeline into streaming mode with > that flag. >

Re: Dataflow Streaming ValidatesRunner

2020-04-09 Thread Steve Niemitz
Ah yeah I forgot that you can force a pipeline into streaming mode with that flag. It sounds like the story here is there are tests for the streaming worker, but they run "on the side" in Google's environment? My concern is it seems like (publically at least) there's no test coverage on the strea

Re: Dataflow Streaming ValidatesRunner

2020-04-09 Thread Luke Cwik
You can use Create in streaming pipelines as well but you want to ensure that --streaming is passed as a flag. You could update the existing test target and force --streaming to be inserted for example here: https://github.com/lukecwik/incubator-beam/blob/8097972b4d0ed759aa45f6710ac02b982c6e8deb/ru