Re: Expected behavior of PipelineResult#waitUntilFinish()

2017-03-29 Thread Shen Li
Hi, thanks for the prompt reply. I read the discussion. So, waitUntilFinish() should *block* until all watermarks reach infinity regardless of how long (1 second, 1 year, 100 years) it might take, right?
Shen
On Wed, Mar 29, 2017 at 8:00 PM, Eugene Kirpichov <kirpic...@google.com.invalid>

Re: [PROPOSAL] @OnWindowExpiration

2017-03-29 Thread Aljoscha Krettek
+1
I had also already commented on the issue a while back ;-)
On Wed, Mar 29, 2017, at 21:23, Kenneth Knowles wrote:
> I had totally forgotten that this was filed as
> https://issues.apache.org/jira/browse/BEAM-1589 already, which I have now
> assigned to myself.
>
> And, of course, there have

Re: Spec cleanup for Finalize Checkpoint

2017-03-29 Thread Thomas Groh
(Short URL: https://s.apache.org/FIWQ)
On Wed, Mar 29, 2017 at 1:15 PM, Thomas Groh wrote:
> Hey everyone,
>
> We've had a few bugs recently in the DirectRunner based around finalizing
> checkpoints, as well as a bit of confusion on what should be permitted from
> within a

Spec cleanup for Finalize Checkpoint

2017-03-29 Thread Thomas Groh
Hey everyone, We've had a few bugs recently in the DirectRunner based around finalizing checkpoints, as well as a bit of confusion on what should be permitted from within a checkpoint. Those caused some revisiting of the checkpoint spec, both to make sure we have written down what a runner is

Re: [PROPOSAL] @OnWindowExpiration

2017-03-29 Thread Kenneth Knowles
I had totally forgotten that this was filed as https://issues.apache.org/jira/browse/BEAM-1589 already, which I have now assigned to myself. And, of course, there have been many discussions that mentioned the feature, so my initial phrasing as though it was a new idea probably seemed a bit odd.

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-29 Thread Kenneth Knowles
To have something to focus on, I have opened https://github.com/apache/beam/pull/2359 to remove all of either tags from anything outside of the core libraries. I believe there is rough consensus around this and I just wanted to make it extremely concrete. Comments appreciated, especially on those

Re: [PROPOSAL] @OnWindowExpiration

2017-03-29 Thread Thomas Groh
+1 The fact that we have this ability already (including all of the required information), just in a roundabout way by manually dredging in the allowed lateness, means that this isn't a huge burden to implement on an SDK or runner side; meanwhile, this much more strongly communicates what a user

Re: [PROPOSAL] @OnWindowExpiration

2017-03-29 Thread Kenneth Knowles
On Wed, Mar 29, 2017 at 12:16 AM, JingsongLee wrote:
> If user have a WordCount StatefulDoFn, the result of
> counts is always changing before the expiration of window.
> Maybe the user want a signal to know the count is the final value
> and then archive the value to

Re: IO IT Patterns: Simplifying data loading

2017-03-29 Thread Stephen Sisk
Hey Cham, Debugging is harder, I definitely agree. As I said (and I think you still generally agree), I think the tradeoff is worth it. Looking at the data store in question can quickly narrow it down to one vs the other for a particular failure. Eventually consistent data stores

Re: Beam spark 2.x runner status

2017-03-29 Thread Jean-Baptiste Onofré
Cool for the PR merge, I will rebase my branch on it.
Thanks!
Regards
JB
On 03/29/2017 01:58 PM, Amit Sela wrote:
@Ted definitely makes sense.
@JB I'm merging https://github.com/apache/beam/pull/2354 soon so any deprecated Spark API issues should be resolved.
On Wed, Mar 29, 2017 at 2:46 PM

Re: Beam spark 2.x runner status

2017-03-29 Thread Amit Sela
@Ted definitely makes sense.
@JB I'm merging https://github.com/apache/beam/pull/2354 soon so any deprecated Spark API issues should be resolved.
On Wed, Mar 29, 2017 at 2:46 PM Ted Yu wrote:
> This is what I did over HBASE-16179:
>
> -f.call((asJavaIterator(it),

Re: Beam spark 2.x runner status

2017-03-29 Thread Ted Yu
This is what I did over HBASE-16179:
-f.call((asJavaIterator(it), conn)).iterator()
+// the return type is different in spark 1.x & 2.x, we handle both cases
+f.call(asJavaIterator(it), conn) match {
+  // spark 1.x
+  case iterable: Iterable[R] =>
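For readers following the thread, the idea in the HBASE-16179 snippet (accept whichever return type the user function produces and normalize it) can be sketched in plain Java. This is only an illustration under stated assumptions: the class IteratorCompat and its methods toIterator and drain are hypothetical names, not part of Spark, HBase, or Beam, and the sketch uses only java.util types rather than the real Spark function interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark, HBase, or Beam.
final class IteratorCompat {
    private IteratorCompat() {}

    /**
     * Normalizes the result of a user function to an Iterator.
     * An Iterable stands in for the Spark 1.x return type and an
     * Iterator for the Spark 2.x return type.
     */
    @SuppressWarnings("unchecked")
    static <R> Iterator<R> toIterator(Object result) {
        if (result instanceof Iterator) {
            // Spark 2.x style: already an Iterator, pass it through.
            return (Iterator<R>) result;
        }
        if (result instanceof Iterable) {
            // Spark 1.x style: ask the Iterable for its Iterator.
            return ((Iterable<R>) result).iterator();
        }
        throw new IllegalArgumentException(
            "Unexpected return type: "
                + (result == null ? "null" : result.getClass().getName()));
    }

    /** Collects all remaining elements of an Iterator into a List. */
    static <R> List<R> drain(Iterator<R> it) {
        List<R> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        // Both call styles yield the same elements after normalization.
        System.out.println(drain(IteratorCompat.<String>toIterator(words)));
        System.out.println(drain(IteratorCompat.<String>toIterator(words.iterator())));
    }
}
```

The runtime type check keeps one code path usable with either return type, at the cost of an unchecked cast; whether that tradeoff suits the Beam Spark runner is a question for the thread itself.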

Re: Beam spark 2.x runner status

2017-03-29 Thread Amit Sela
Just tried to replace dependencies and see what happens: most required changes are about the runner using deprecated Spark APIs, and after fixing them the only real issue is with the Java API for Pair/FlatMapFunction that changed return value to Iterator (in 1.6 it's Iterable). So I'm not sure

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-29 Thread Stas Levin
Stephen, the enforcement only applies to (test) pipelines created using "@Rule TestPipeline pipeline = ...". Naming could, and should, definitely be improved :) I think this aligns nicely with https://issues.apache.org/jira/browse/BEAM-1557 where we want to separate the TestPipeline rule into a