Re: Event-time based in-window trigger

2016-12-08 Thread Manu Zhang
created BEAM-1119  and
assigned to you.

On Fri, Dec 9, 2016 at 11:51 AM Kenneth Knowles  wrote:

> I don't think there is a JIRA for this, but I agree that the use case is
> good and compatible with the model. Please do open a JIRA ticket in the
> beam-model component. Feel free to assign to me or leave unassigned.
>
> On Thu, Dec 8, 2016 at 5:30 PM, Manu Zhang 
> wrote:
>
> @Kenn and @Tyler,
>
> Given the use case is defined (hope I've explained it clearly), do we have
> plans/jiras to add the new functionality ?
>
> Thanks,
> Manu
>
> On Fri, Dec 2, 2016 at 12:21 PM Manu Zhang 
> wrote:
>
> @Kenn,
>
> 1. when the watermark jumps from 0 to 7,  http://foo -> http://foo/bar ->
> http://foo will be emitted
> We can emit events with timestamps before watermark in the pane
> 2. http://foo -> http://foo/bizzle -> http://foo/bar -> http://foo will
> be emitted if it's within the allowed lateness
> which Beam already allows us to do.
>
> To elaborate on the use case, when users are visiting Amazon we want to
> offer them best recommendations.
> Thus, we would like to know what leads to their final decision and track
> the pages they visit until clicking the "Add to cart" button.
> It will be too late if we only send the results when they finish shopping.
>
> @Tyler,
> I don't think it's likely to happen for my use case. Think about a user
> jumping between pages like crazy. Meanwhile, we can control how fast
> watermark progresses as long as it meets the latency requirement.
>
>
>
>
> On Fri, Dec 2, 2016 at 11:45 AM Tyler Akidau  wrote:
>
> And one more question while we're at it: what if you have events happening
> every second within the window? Do you really want to emit a new pane every
> second as the watermark progresses (assuming it progresses relatively
> smoothly)? What if we're talking differences of event times of
> milliseconds? Is one pane per millisecond what you want?
>
> -Tyler
>
> On Fri, Dec 2, 2016 at 10:41 AM Kenneth Knowles  wrote:
>
> Thanks for laying out some details.
>
> On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang 
> wrote:
>
> Yes, the difficulty is to define that trigger. The existing triggers fire
> at the end of window. (I could be mistaken, which will be good news)
>
>
> You are not mistaken that the only existing event time trigger is the one
> that fires at the end of the window. The trigger you describe would be a
> new primitive trigger. It fits with the design, if we ensure monotonicity,
> etc. Actually implementing it in the backend is easy, of course. We
> actually had something like it, but didn't quite nail it down so we removed
> it until we had a solid use case and design for it.
>
> B and C which are not mutually exclusive
> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
> 4 and back to http://foo at 5 all in a Session
> we would want to emit
>
> http://foo  when the watermark passes 1
> http://foo -> http://foo/bar when the watermark passes 4
> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>
>
> What would you want to emit when the watermark jumps from 0 to 7 and all
> three of the above are buffered?
>
> What would you want to emit when the watermark was at 9 and
> http://foo/bizzle came in with timestamp 3?
>
> Kenn
>
>
>
>
> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:
>
> As a clarifying question:
>
> If you have three elements in the pane with timestamps [1, 4, 5], would
> you:
> A. want to emit that entire pane when the watermark passes 1
> B. want to emit that entire pane when the watermark passes 5
> C. emit a fragment of that pane containing only the first element when the
> watermark passes 1
>
> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>
> So what you want is essentially a trigger that fires when the watermark
> has passed the event time of the oldest un-emitted element in the current
> pane? You could them presumably wrap this in a repeat to get the overall
> desired semantics, right?
>
> -Tyler
>
>
> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:
>
> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, 

Re: Event-time based in-window trigger

2016-12-08 Thread Kenneth Knowles
I don't think there is a JIRA for this, but I agree that the use case is
good and compatible with the model. Please do open a JIRA ticket in the
beam-model component. Feel free to assign to me or leave unassigned.

On Thu, Dec 8, 2016 at 5:30 PM, Manu Zhang  wrote:

> @Kenn and @Tyler,
>
> Given the use case is defined (hope I've explained it clearly), do we have
> plans/jiras to add the new functionality ?
>
> Thanks,
> Manu
>
> On Fri, Dec 2, 2016 at 12:21 PM Manu Zhang 
> wrote:
>
>> @Kenn,
>>
>> 1. when the watermark jumps from 0 to 7,  http://foo -> http://foo/bar ->
>>  http://foo will be emitted
>> We can emit events with timestamps before watermark in the pane
>> 2. http://foo -> http://foo/bizzle -> http://foo/bar -> http://foo will
>> be emitted if it's within the allowed lateness
>> which Beam already allows us to do.
>>
>> To elaborate on the use case, when users are visiting Amazon we want to
>> offer them best recommendations.
>> Thus, we would like to know what leads to their final decision and track
>> the pages they visit until clicking the "Add to cart" button.
>> It will be too late if we only send the results when they finish shopping.
>>
>> @Tyler,
>> I don't think it's likely to happen for my use case. Think about a user
>> jumping between pages like crazy. Meanwhile, we can control how fast
>> watermark progresses as long as it meets the latency requirement.
>>
>>
>>
>>
>> On Fri, Dec 2, 2016 at 11:45 AM Tyler Akidau  wrote:
>>
>> And one more question while we're at it: what if you have events
>> happening every second within the window? Do you really want to emit a new
>> pane every second as the watermark progresses (assuming it progresses
>> relatively smoothly)? What if we're talking differences of event times of
>> milliseconds? Is one pane per millisecond what you want?
>>
>> -Tyler
>>
>> On Fri, Dec 2, 2016 at 10:41 AM Kenneth Knowles  wrote:
>>
>> Thanks for laying out some details.
>>
>> On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang 
>> wrote:
>>
>> Yes, the difficulty is to define that trigger. The existing triggers fire
>> at the end of window. (I could be mistaken, which will be good news)
>>
>>
>> You are not mistaken that the only existing event time trigger is the one
>> that fires at the end of the window. The trigger you describe would be a
>> new primitive trigger. It fits with the design, if we ensure monotonicity,
>> etc. Actually implementing it in the backend is easy, of course. We
>> actually had something like it, but didn't quite nail it down so we removed
>> it until we had a solid use case and design for it.
>>
>> B and C which are not mutually exclusive
>> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
>> 4 and back to http://foo at 5 all in a Session
>> we would want to emit
>>
>> http://foo  when the watermark passes 1
>> http://foo -> http://foo/bar when the watermark passes 4
>> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>>
>>
>> What would you want to emit when the watermark jumps from 0 to 7 and all
>> three of the above are buffered?
>>
>> What would you want to emit when the watermark was at 9 and
>> http://foo/bizzle came in with timestamp 3?
>>
>> Kenn
>>
>>
>>
>>
>> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers 
>> wrote:
>>
>> As a clarifying question:
>>
>> If you have three elements in the pane with timestamps [1, 4, 5], would
>> you:
>> A. want to emit that entire pane when the watermark passes 1
>> B. want to emit that entire pane when the watermark passes 5
>> C. emit a fragment of that pane containing only the first element when
>> the watermark passes 1
>>
>> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>>
>> So what you want is essentially a trigger that fires when the watermark
>> has passed the event time of the oldest un-emitted element in the current
>> pane? You could them presumably wrap this in a repeat to get the overall
>> desired semantics, right?
>>
>> -Tyler
>>
>>
>> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang 
>> wrote:
>>
>> My use case is to track user trajectory based on page view event when
>> they visit a website.  The input would be like a list of
>> PageView(userId, url, eventTimestamp) with watermarks (= eventTimestamp -
>> duration). I'm trying Sessions with event time trigger. Note we can't wait
>> for the end of session window due to latency requirement. Instead, we want
>> to emit the user trajectories whenever a buffered PageView's event time is
>> passed by watermark.
>>
>> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>>
>> Can you provide more details about the problem your trying to solve with
>> some examples showing input and the expected output?
>>
>>
>>
>>
>> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
>> wrote:
>>
>> Hi,

Re: Event-time based in-window trigger

2016-12-08 Thread Manu Zhang
@Kenn and @Tyler,

Given the use case is defined (hope I've explained it clearly), do we have
plans/jiras to add the new functionality ?

Thanks,
Manu

On Fri, Dec 2, 2016 at 12:21 PM Manu Zhang  wrote:

> @Kenn,
>
> 1. when the watermark jumps from 0 to 7,  http://foo -> http://foo/bar ->
> http://foo will be emitted
> We can emit events with timestamps before watermark in the pane
> 2. http://foo -> http://foo/bizzle -> http://foo/bar -> http://foo will
> be emitted if it's within the allowed lateness
> which Beam already allows us to do.
>
> To elaborate on the use case, when users are visiting Amazon we want to
> offer them best recommendations.
> Thus, we would like to know what leads to their final decision and track
> the pages they visit until clicking the "Add to cart" button.
> It will be too late if we only send the results when they finish shopping.
>
> @Tyler,
> I don't think it's likely to happen for my use case. Think about a user
> jumping between pages like crazy. Meanwhile, we can control how fast
> watermark progresses as long as it meets the latency requirement.
>
>
>
>
> On Fri, Dec 2, 2016 at 11:45 AM Tyler Akidau  wrote:
>
> And one more question while we're at it: what if you have events happening
> every second within the window? Do you really want to emit a new pane every
> second as the watermark progresses (assuming it progresses relatively
> smoothly)? What if we're talking differences of event times of
> milliseconds? Is one pane per millisecond what you want?
>
> -Tyler
>
> On Fri, Dec 2, 2016 at 10:41 AM Kenneth Knowles  wrote:
>
> Thanks for laying out some details.
>
> On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang 
> wrote:
>
> Yes, the difficulty is to define that trigger. The existing triggers fire
> at the end of window. (I could be mistaken, which will be good news)
>
>
> You are not mistaken that the only existing event time trigger is the one
> that fires at the end of the window. The trigger you describe would be a
> new primitive trigger. It fits with the design, if we ensure monotonicity,
> etc. Actually implementing it in the backend is easy, of course. We
> actually had something like it, but didn't quite nail it down so we removed
> it until we had a solid use case and design for it.
>
> B and C which are not mutually exclusive
> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
> 4 and back to http://foo at 5 all in a Session
> we would want to emit
>
> http://foo  when the watermark passes 1
> http://foo -> http://foo/bar when the watermark passes 4
> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>
>
> What would you want to emit when the watermark jumps from 0 to 7 and all
> three of the above are buffered?
>
> What would you want to emit when the watermark was at 9 and
> http://foo/bizzle came in with timestamp 3?
>
> Kenn
>
>
>
>
> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:
>
> As a clarifying question:
>
> If you have three elements in the pane with timestamps [1, 4, 5], would
> you:
> A. want to emit that entire pane when the watermark passes 1
> B. want to emit that entire pane when the watermark passes 5
> C. emit a fragment of that pane containing only the first element when the
> watermark passes 1
>
> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>
> So what you want is essentially a trigger that fires when the watermark
> has passed the event time of the oldest un-emitted element in the current
> pane? You could them presumably wrap this in a repeat to get the overall
> desired semantics, right?
>
> -Tyler
>
>
> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:
>
> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
Hi users@,

This has been done, and will show up in the next incubating release.

Kenn

On Wed, Dec 7, 2016 at 5:13 PM, Ben Chambers 
wrote:

> +1 to pushing all remaining "major" (read likely to affect everyone) breaks
> through in a single release.
>
> On Wed, Dec 7, 2016 at 3:56 PM Dan Halperin 
> wrote:
>
> > +user@, because this is a user-impacting change and they might not all
> be
> > paying attention to the dev@ list.
> >
> > +1
> >
> > I'm mildly reluctant because this will break all users that have written
> > composite transforms -- and I'm the jerk that filed the issue (a few
> times
> > now, on different iterations of the SDKs). But, like Ben said, I can't
> > think of a different way to do this that doesn't break users.
> >
> > Hopefully, with breaking changes pushed to DoFn and to PTransform, the
> > worst user churn would be over. I can't think of anything quite so
> > invasive.
> >
> > IMO: if there is, we should try to push all the remaining "major" breaks
> > through in the same release.
> >
> > Dan
> >
> > On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek 
> > wrote:
> >
> > > +1
> > >
> > > I've seen this mistake myself in some PRs.
> > >
> > > On Thu, 8 Dec 2016 at 06:10 Ben Chambers  >
> > > wrote:
> > >
> > > > +1 -- This seems like the best option. It's a mechanical change, and
> > the
> > > > compiler will let users know it needs to be made. It will make the
> > > mistake
> > > > much less common, and when it occurs it will be much clearer what is
> > > wrong.
> > > >
> > > > It would be great if we could make the mis-use a compiler problem or
> a
> > > > pipeline construction time error without changing the names, but both
> > of
> > > > these options are not practical. We can't hide the expansion method,
> > > since
> > > > it is what PTransform implementations need to override. We can't make
> > > this
> > > > a construction time exception since it would require adding code to
> > every
> > > > PTransform implementation.
> > > >
> > > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh  >
> > > > wrote:
> > > >
> > > > > +1; This is probably the best way to make sure users don't reverse
> > the
> > > > > polarity of the PCollection flow.
> > > > >
> > > > > This also brings PInput.expand(), POutput.expand(), and
> > > > > PTransform.expand(PInput) into line - namely, for some composite
> > thing,
> > > > > "represent yourself as some collection of primitives" (potentially
> > > > > recursively).
> > > > >
> > > > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I want to bring up another major backwards-incompatible change
> > before
> > > > it
> > > > > is
> > > > > > too late, to resolve [BEAM-438].
> > > > > >
> > > > > > Summary: Leave PInput.apply the same but rename PTransform.apply
> to
> > > > > > PTransform.expand. I have opened [PR #1538] just for reference
> (it
> > > took
> > > > > 30
> > > > > > seconds using IDE automated refactor)
> > > > > >
> > > > > > This change affects *PTransform authors* but does *not* affect
> > > pipeline
> > > > > > authors.
> > > > > >
> > > > > > This issue was filed a long time ago. It has been a problem many
> > > times
> > > > > with
> > > > > > actual users since before Beam started incubating. This is what
> > goes
> > > > > wrong
> > > > > > (often):
> > > > > >
> > > > > >PCollection input = ...
> > > > > >PTransform transform = ...
> > > > > >
> > > > > >transform.apply(input)
> > > > > >
> > > > > > This type checks and even looks perfectly normal. Do you see the
> > > error?
> > > > > >
> > > > > > ... what we need the user to write is:
> > > > > >
> > > > > > input.apply(transform)
> > > > > >
> > > > > > What a confusing difference! After all, the first one type-checks
> > and
> > > > the
> > > > > > first one is how you apply a Function or Predicate or
> > > > > SerializableFunction,
> > > > > > etc. But it is broken. With transform.apply(input) the transform
> is
> > > not
> > > > > > registered with the pipeline at all.
> > > > > >
> > > > > > We obviously can't (and don't want to) change the most core way
> > that
> > > > > > pipeline authors use Beam, so PInput.apply (aka
> PCollection.apply)
> > > must
> > > > > > remain the same. But we do need a way to make it impossible to
> mix
> > > > these
> > > > > > up.
> > > > > >
> > > > > > The simplest way I can think of is to choose a new name for the
> > other
> > > > > > method involved. Users probably won't write
> transform.expand(input)
> > > > since
> > > > > > they will never have seen it in any examples, etc. This will just
> > > make
> > > > > > PTransform authors need to do a global rename, and the type
> system
> > > will
> > > > > > direct them to all cases so there is no silent failure