Re: Proposal: Watch: a transform for watching growth of sets

2017-07-30 Thread Eugene Kirpichov
This PR has been submitted and Watch is available.

Next PR https://github.com/apache/beam/pull/3607 is in review for adding
TextIO.read().watchForNewFiles()! (will extend it to Avro and provide
utility transforms for authors of similar IOs)

On Wed, Jul 19, 2017 at 8:37 PM Eugene Kirpichov 
wrote:

> Hi all,
>
> A PR for this is in review https://github.com/apache/beam/pull/3565
>
> I'm very excited about this, because:
> - I think it turned out to be a very neat API and I'm looking forward to
> people coming up with more use cases for it
> - I was pleasantly surprised by being able to come up with a good way to
> test this in great detail
> - This PR validated that SDF is the right API for this sort of thing:
> developing this transform did not require changes to SDF API or semantics,
> even for a highly nontrivial aspect like watermarks
> - I believe it's in good shape for being used in
> TextIO.read()/readAll().watchForNewFiles(), which will be the first use
> case of SDF for a previously impossible but much wanted IO connector! I'm
> going to implement these tomorrow.
>
> On Thu, Jun 29, 2017 at 4:43 PM Eugene Kirpichov 
> wrote:
>
>> Hi all,
>>
>> Please take a look at this short proposal that came out of implementing
>> http://s.apache.org/textio-sdf. I think it's a nice generalization. I
>> would welcome comments on the proposed API, or corner cases of semantics
>> that I haven't thought about, or more generalizations, etc.
>>
>> http://s.apache.org/beam-watch-transform
>>
>> We propose a PTransform Watch.growthOf(poll function) that repeatedly
>> polls sets associated with each of its inputs and continuously produces new
>> elements in each set until a per-set termination condition is reached. It
>> is a generalization of "watch filepattern for matching files".
>>
>> Code snippet:
>>
>> PCollection inputs = …;
>> PCollection> outputs = inputs.apply(
>> Watch.growthOf((InputT input, PollReceiver out) → {
>>… out.put(timestamp, value) …
>>return Watch.outputCanGrow().withWatermark( … );
>>// or:
>>return Watch.outputIsFinal();
>>  })
>>  .withPollInterval(10 sec)
>>  .withTerminationPerInput(
>>  Watch.afterEitherOf(
>>Watch.afterTotalOf(5 min),
>>Watch.afterOutputStableFor(1 min)))
>>  .withOutputCoder( … ))
>>
>> Thanks!
>>
>


Re: Proposal: Watch: a transform for watching growth of sets

2017-07-19 Thread Eugene Kirpichov
Hi all,

A PR for this is in review https://github.com/apache/beam/pull/3565

I'm very excited about this, because:
- I think it turned out to be a very neat API and I'm looking forward to
people coming up with more use cases for it
- I was pleasantly surprised by being able to come up with a good way to
test this in great detail
- This PR validated that SDF is the right API for this sort of thing:
developing this transform did not require changes to SDF API or semantics,
even for a highly nontrivial aspect like watermarks
- I believe it's in good shape for being used in
TextIO.read()/readAll().watchForNewFiles(), which will be the first use
case of SDF for a previously impossible but much wanted IO connector! I'm
going to implement these tomorrow.

On Thu, Jun 29, 2017 at 4:43 PM Eugene Kirpichov 
wrote:

> Hi all,
>
> Please take a look at this short proposal that came out of implementing
> http://s.apache.org/textio-sdf. I think it's a nice generalization. I
> would welcome comments on the proposed API, or corner cases of semantics
> that I haven't thought about, or more generalizations, etc.
>
> http://s.apache.org/beam-watch-transform
>
> We propose a PTransform Watch.growthOf(poll function) that repeatedly
> polls sets associated with each of its inputs and continuously produces new
> elements in each set until a per-set termination condition is reached. It
> is a generalization of "watch filepattern for matching files".
>
> Code snippet:
>
> PCollection inputs = …;
> PCollection> outputs = inputs.apply(
> Watch.growthOf((InputT input, PollReceiver out) → {
>… out.put(timestamp, value) …
>return Watch.outputCanGrow().withWatermark( … );
>// or:
>return Watch.outputIsFinal();
>  })
>  .withPollInterval(10 sec)
>  .withTerminationPerInput(
>  Watch.afterEitherOf(
>Watch.afterTotalOf(5 min),
>Watch.afterOutputStableFor(1 min)))
>  .withOutputCoder( … ))
>
> Thanks!
>


Proposal: Watch: a transform for watching growth of sets

2017-06-29 Thread Eugene Kirpichov
Hi all,

Please take a look at this short proposal that came out of implementing
http://s.apache.org/textio-sdf. I think it's a nice generalization. I would
welcome comments on the proposed API, or corner cases of semantics that I
haven't thought about, or more generalizations, etc.

http://s.apache.org/beam-watch-transform

We propose a PTransform Watch.growthOf(poll function) that repeatedly polls
sets associated with each of its inputs and continuously produces new
elements in each set until a per-set termination condition is reached. It
is a generalization of "watch filepattern for matching files".

Code snippet:

PCollection inputs = …;
PCollection> outputs = inputs.apply(
Watch.growthOf((InputT input, PollReceiver out) → {
   … out.put(timestamp, value) …
   return Watch.outputCanGrow().withWatermark( … );
   // or:
   return Watch.outputIsFinal();
 })
 .withPollInterval(10 sec)
 .withTerminationPerInput(
 Watch.afterEitherOf(
   Watch.afterTotalOf(5 min),
   Watch.afterOutputStableFor(1 min)))
 .withOutputCoder( … ))

Thanks!