[RESULT] [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-03 Thread Kenneth Knowles
I am happy to announce that this vote has passed, with 20 approving +1
votes, 5 of which are binding PMC votes.

Beam's Mascot is the Firefly!

Kenn

On Fri, Jan 3, 2020 at 9:31 PM Kenneth Knowles  wrote:

> +1 (binding)
>
> On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
> leonardo.mig...@arquivei.com.br> wrote:
>
>> +1
>>
>> On Fri, Dec 13, 2019 at 01:58, Kenneth Knowles wrote:
>>
>>> Please vote on the proposal for Beam's mascot to be the Firefly. This
>>> encompasses the Lampyridae family of insects, without specifying a genus or
>>> species.
>>>
>>> [ ] +1, Approve Firefly being the mascot
>>> [ ] -1, Disapprove Firefly being the mascot
>>>
>>> The vote will be open for at least 72 hours excluding weekends. It is
>>> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
>>> votes*. Non-PMC votes are still encouraged.
>>>
>>> PMC voters, please help by indicating your vote as "(binding)"
>>>
>>> Kenn
>>>
>>> *I have chosen this format for this vote, even though Beam uses simple
>>> majority as a rule, because I want any PMC member to be able to veto based
>>> on concerns about overlap or trademark.
>>>
>>
>>
>> --
>> []s
>>
>> Leonardo Alves Miguel
>> Data Engineer
>> (16) 3509-5515 | www.arquivei.com.br
>>
>


Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-03 Thread Kenneth Knowles
+1 (binding)

On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
leonardo.mig...@arquivei.com.br> wrote:

> +1
>
> On Fri, Dec 13, 2019 at 01:58, Kenneth Knowles wrote:
>
>> Please vote on the proposal for Beam's mascot to be the Firefly. This
>> encompasses the Lampyridae family of insects, without specifying a genus or
>> species.
>>
>> [ ] +1, Approve Firefly being the mascot
>> [ ] -1, Disapprove Firefly being the mascot
>>
>> The vote will be open for at least 72 hours excluding weekends. It is
>> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
>> votes*. Non-PMC votes are still encouraged.
>>
>> PMC voters, please help by indicating your vote as "(binding)"
>>
>> Kenn
>>
>> *I have chosen this format for this vote, even though Beam uses simple
>> majority as a rule, because I want any PMC member to be able to veto based
>> on concerns about overlap or trademark.
>>
>
>
> --
> []s
>
> Leonardo Alves Miguel
> Data Engineer
> (16) 3509-5515 | www.arquivei.com.br
>


Re: Jenkins jobs not running for my PR 10438

2020-01-03 Thread Heejong Lee
+1: https://github.com/apache/beam/pull/10051

force-pushing again. retest this please. nothing works :(

On Fri, Jan 3, 2020 at 12:55 AM Michał Walenia 
wrote:

> Hi,
> I'm also affected by this - I touched my PRs opened before the holiday
> break and no jobs were triggered. Do we know what breaks Jenkins/fixes it
> when stuff like this happens?
> Happy new year,
> Michal
>
> On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang  wrote:
>
>> Thanks Alan for checking this out! I closed PR 9903 and reopened it as
>> pull/10493. It seems the new PR still did not trigger Jenkins jobs.
>>
>> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold  wrote:
>>
>>> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>>>
>>> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold  wrote:
>>>
>>>> For PR 10427, I see
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
>>>> For PR 9903, I see
>>>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>>>>
>>>> Maybe the PR status is not being updated when the jobs run?
>>>>
>>>>
>>>> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang wrote:
>>>>
>>>>> same for https://github.com/apache/beam/pull/9903 as well
>>>>>
>>>>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath <
>>>>> chamik...@google.com> wrote:
>>>>>
>>>>>> Seems like Jenkins tests are not being triggered for this PR as well:
>>>>>> https://github.com/apache/beam/pull/10427
>>>>>>
>>>>>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki
>>>>>> wrote:
>>>>>>
>>>>>>> Jenkins started working. Thank you for whoever fixed it.
>>>>>>>
>>>>>>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Same here. Even the phrase trigger doesn't work.
>>>>>>> >
>>>>>>> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik wrote:
>>>>>>> >>
>>>>>>> >> I'm also affected by this.
>>>>>>> >>
>>>>>>> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki wrote:
>>>>>>> >>>
>>>>>>> >>> Hi Beam developers,
>>>>>>> >>>
>>>>>>> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
>>>>>>> >>> https://github.com/apache/beam/pull/10438
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Regards,
>>>>>>> >>> Tomo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Tomo
>>>>>>>
>>>>>>
>>>>>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


Re: Dropping late data in DirectRunner

2020-01-03 Thread Steve Niemitz
I do agree that the direct runner doesn't drop late data arriving at a
stateful DoFn (I just tested as well).

However, I believe this is consistent with other runners.  I'm fairly
certain (at least last time I checked) that at least Dataflow will also
only drop late data at GBK operations, and NOT stateful DoFns.  Whether or
not this is intentional is debatable; however, without being able to inspect
the watermark inside the stateful DoFn, it'd be very difficult to do
anything useful with late data.


On Fri, Jan 3, 2020 at 5:47 PM Jan Lukavský  wrote:

> I did write a test that tested if data is dropped in a plain stateful
> DoFn. I did this as part of validating that PR [1] didn't drop more data
> when using @RequiresTimeSortedInput than it would without this annotation.
> This test failed, and I haven't committed it yet.
>
> The test was basically as follows:
>
>  - use TestStream to generate three elements with timestamps 2, 1 and 0
>
>  - between elements with timestamp 1 and 0 move watermark to 1
>
>  - use allowed lateness of zero
>
>  - use stateful dofn that just emits arbitrary data for each input element
>
>  - use Count.globally to count outputs
>
> The outcome was that the stateful DoFn using @RequiresTimeSortedInput output
> 2 elements; without the annotation it output 3 elements. I think the correct
> result would be 2 elements in this case. The difference is caused by the
> annotation (currently) having its own logic for dropping data, which could
> be removed if we agree that the data should be dropped in all cases.
> On 1/3/20 11:23 PM, Kenneth Knowles wrote:
>
> Did you write such a @Category(ValidatesRunner.class) test? I believe the
> Java direct runner does drop late data, for both GBK and stateful ParDo.
>
> Stateful ParDo is implemented on top of GBK:
> https://github.com/apache/beam/blob/64262a61402fad67d9ad8a66eaf6322593d3b5dc/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java#L172
>
> And GroupByKey, via DirectGroupByKey, via DirectGroupAlsoByWindow, does
> drop late data:
> https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/direct-java/src/main/java/org/apache/beam/runners/direct/GroupAlsoByWindowEvaluatorFactory.java#L220
>
> I'm not sure why it has its own code, since ReduceFnRunner also drops late
> data, and it does use ReduceFnRunner (the same code path all Java-based
> runners use).
>
> Kenn
>
>
> On Fri, Jan 3, 2020 at 1:02 PM Jan Lukavský  wrote:
>
>> Yes, the non-reliability of late data dropping in distributed runner is
>> understood. But this is even where DirectRunner can play its role, because
>> only there it is actually possible to emulate and test specific watermark
>> conditions. Question regarding this for the java DirectRunner - should we
>> completely drop LateDataDroppingDoFnRunner and delegate the late data
>> dropping to StatefulDoFnRunner? Seems logical to me, as if we agree that
>> late data should always be dropped, then there would be no "valid" use of
>> StatefulDoFnRunner without the late data dropping functionality.
>> On 1/3/20 9:32 PM, Robert Bradshaw wrote:
>>
>> I agree, in fact we just recently enabled late data dropping to the
>> direct runner in Python to be able to develop better tests for Dataflow.
>>
>> It should be noted, however, that in a distributed runner (absent the
>> quiescence of TestStream) that one can't *count* on late data being dropped
>> at a certain point, and in fact (due to delays in fully propagating the
>> watermark) late data can even become on-time, so the promises about what
>> happens behind the watermark are necessarily a bit loose.
>>
>> On Fri, Jan 3, 2020 at 9:15 AM Luke Cwik  wrote:
>>
>>> I agree that the DirectRunner should drop late data. Late data dropping
>>> is optional but the DirectRunner is used by many for testing and we should
>>> have the same behaviour they would get on other runners or users may be
>>> surprised.
>>>
>>> On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský  wrote:
>>>
>>>> Hi,
>>>>
>>>> I just found out that DirectRunner is apparently not using
>>>> LateDataDroppingDoFnRunner, which means that it doesn't drop late data
>>>> in cases where there is no GBK operation involved (dropping in GBK seems
>>>> to be correct). There is apparently no @Category(ValidatesRunner) test
>>>> for that behavior (because DirectRunner would fail it), so the question
>>>> is - should late data dropping be considered part of model (of which
>>>> DirectRunner should be a canonical implementation) and therefore that
>>>> should be fixed there, or is the late data dropping an optional feature
>>>> of a runner?
>>>>
>>>> I'm strongly in favor of the first option, and I think it is likely that
>>>> all real-world runners would probably adhere to that (I didn't check
>>>> that, though).
>>>>
>>>> Opinions?
>>>>
>>>>   Jan




Re: Dropping late data in DirectRunner

2020-01-03 Thread Jan Lukavský
I did write a test that tested if data is dropped in a plain stateful 
DoFn. I did this as part of validating that PR [1] didn't drop more data 
when using @RequiresTimeSortedInput than it would without this 
annotation. This test failed, and I haven't committed it yet.


The test was basically as follows:

 - use TestStream to generate three elements with timestamps 2, 1 and 0

 - between elements with timestamp 1 and 0 move watermark to 1

 - use allowed lateness of zero

 - use stateful dofn that just emits arbitrary data for each input element

 - use Count.globally to count outputs

The outcome was that the stateful DoFn using @RequiresTimeSortedInput
output 2 elements; without the annotation it output 3 elements. I think
the correct result would be 2 elements in this case. The difference is
caused by the annotation (currently) having its own logic for dropping
data, which could be removed if we agree that the data should be dropped
in all cases.
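
The scenario above can be sketched as a small toy model. This is not Beam code and uses no Beam API: the names `Element`, `AdvanceWatermark`, and `count_outputs` are hypothetical, and the windowing is an assumption (fixed windows of size 1, since the message does not say which windowing the test used). An element is treated as droppable when the end of its window plus the allowed lateness is at or behind the current watermark.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Element:
    timestamp: int


@dataclass(frozen=True)
class AdvanceWatermark:
    to: int


Event = Union[Element, AdvanceWatermark]


def window_end(timestamp: int, size: int) -> int:
    """Exclusive end of the fixed window that contains `timestamp`."""
    return (timestamp // size + 1) * size


def count_outputs(events: List[Event], window_size: int = 1,
                  allowed_lateness: int = 0, drop_late: bool = True) -> int:
    """Count elements that reach the (simulated) stateful DoFn.

    Assumption: an element is droppable once its window end plus the
    allowed lateness is <= the watermark seen so far.
    """
    watermark = float("-inf")
    emitted = 0
    for event in events:
        if isinstance(event, AdvanceWatermark):
            # Watermarks only move forward.
            watermark = max(watermark, event.to)
            continue
        expired = window_end(event.timestamp, window_size) + allowed_lateness <= watermark
        if drop_late and expired:
            continue  # late element is dropped before the DoFn sees it
        emitted += 1
    return emitted


# Timestamps 2, 1, then watermark -> 1, then timestamp 0 (as in the test above).
EVENTS = [Element(2), Element(1), AdvanceWatermark(1), Element(0)]

with_dropping = count_outputs(EVENTS, drop_late=True)
without_dropping = count_outputs(EVENTS, drop_late=False)
print(with_dropping, without_dropping)
```

Under these assumptions the model yields 2 outputs when late data is dropped and 3 when it is not, which matches the two outcomes reported for the test.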


On 1/3/20 11:23 PM, Kenneth Knowles wrote:
Did you write such a @Category(ValidatesRunner.class) test? I believe 
the Java direct runner does drop late data, for both GBK and stateful 
ParDo.


Stateful ParDo is implemented on top of GBK: 
https://github.com/apache/beam/blob/64262a61402fad67d9ad8a66eaf6322593d3b5dc/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java#L172


And GroupByKey, via DirectGroupByKey, via DirectGroupAlsoByWindow, 
does drop late data: 
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/direct-java/src/main/java/org/apache/beam/runners/direct/GroupAlsoByWindowEvaluatorFactory.java#L220


I'm not sure why it has its own code, since ReduceFnRunner also drops 
late data, and it does use ReduceFnRunner (the same code path all 
Java-based runners use).


Kenn


On Fri, Jan 3, 2020 at 1:02 PM Jan Lukavský wrote:


Yes, the non-reliability of late data dropping in distributed
runner is understood. But this is even where DirectRunner can play
its role, because only there it is actually possible to emulate
and test specific watermark conditions. Question regarding this
for the java DirectRunner - should we completely drop
LateDataDroppingDoFnRunner and delegate the late data dropping to
StatefulDoFnRunner? Seems logical to me, as if we agree that late
data should always be dropped, then there would be no "valid" use of
StatefulDoFnRunner without the late data dropping functionality.

On 1/3/20 9:32 PM, Robert Bradshaw wrote:

I agree, in fact we just recently enabled late data dropping to
the direct runner in Python to be able to develop better tests
for Dataflow.

It should be noted, however, that in a distributed runner (absent
the quiescence of TestStream) that one can't *count* on late data
being dropped at a certain point, and in fact (due to delays in
fully propagating the watermark) late data can even become
on-time, so the promises about what happens behind the
watermark are necessarily a bit loose.

On Fri, Jan 3, 2020 at 9:15 AM Luke Cwik <lc...@google.com> wrote:

I agree that the DirectRunner should drop late data. Late
data dropping is optional but the DirectRunner is used by
many for testing and we should have the same behaviour they
would get on other runners or users may be surprised.

On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský <je...@seznam.cz> wrote:

Hi,

I just found out that DirectRunner is apparently not using
LateDataDroppingDoFnRunner, which means that it doesn't drop late data
in cases where there is no GBK operation involved (dropping in GBK seems
to be correct). There is apparently no @Category(ValidatesRunner) test
for that behavior (because DirectRunner would fail it), so the question
is - should late data dropping be considered part of model (of which
DirectRunner should be a canonical implementation) and therefore that
should be fixed there, or is the late data dropping an optional feature
of a runner?

I'm strongly in favor of the first option, and I think it is likely that
all real-world runners would probably adhere to that (I didn't check
that, though).

Opinions?

  Jan



Re: Dropping late data in DirectRunner

2020-01-03 Thread Kenneth Knowles
Did you write such a @Category(ValidatesRunner.class) test? I believe the
Java direct runner does drop late data, for both GBK and stateful ParDo.

Stateful ParDo is implemented on top of GBK:
https://github.com/apache/beam/blob/64262a61402fad67d9ad8a66eaf6322593d3b5dc/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java#L172

And GroupByKey, via DirectGroupByKey, via DirectGroupAlsoByWindow, does
drop late data:
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/direct-java/src/main/java/org/apache/beam/runners/direct/GroupAlsoByWindowEvaluatorFactory.java#L220

I'm not sure why it has its own code, since ReduceFnRunner also drops late
data, and it does use ReduceFnRunner (the same code path all Java-based
runners use).

Kenn


On Fri, Jan 3, 2020 at 1:02 PM Jan Lukavský  wrote:

> Yes, the non-reliability of late data dropping in distributed runner is
> understood. But this is even where DirectRunner can play its role, because
> only there it is actually possible to emulate and test specific watermark
> conditions. Question regarding this for the java DirectRunner - should we
> completely drop LateDataDroppingDoFnRunner and delegate the late data
> dropping to StatefulDoFnRunner? Seems logical to me, as if we agree that
> late data should always be dropped, then there would be no "valid" use of
> StatefulDoFnRunner without the late data dropping functionality.
> On 1/3/20 9:32 PM, Robert Bradshaw wrote:
>
> I agree, in fact we just recently enabled late data dropping to the direct
> runner in Python to be able to develop better tests for Dataflow.
>
> It should be noted, however, that in a distributed runner (absent the
> quiescence of TestStream) that one can't *count* on late data being dropped
> at a certain point, and in fact (due to delays in fully propagating the
> watermark) late data can even become on-time, so the promises about what
> happens behind the watermark are necessarily a bit loose.
>
> On Fri, Jan 3, 2020 at 9:15 AM Luke Cwik  wrote:
>
>> I agree that the DirectRunner should drop late data. Late data dropping
>> is optional but the DirectRunner is used by many for testing and we should
>> have the same behaviour they would get on other runners or users may be
>> surprised.
>>
>> On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský  wrote:
>>
>>> Hi,
>>>
>>> I just found out that DirectRunner is apparently not using
>>> LateDataDroppingDoFnRunner, which means that it doesn't drop late data
>>> in cases where there is no GBK operation involved (dropping in GBK seems
>>> to be correct). There is apparently no @Category(ValidatesRunner) test
>>> for that behavior (because DirectRunner would fail it), so the question
>>> is - should late data dropping be considered part of model (of which
>>> DirectRunner should be a canonical implementation) and therefore that
>>> should be fixed there, or is the late data dropping an optional feature
>>> of a runner?
>>>
>>> I'm strongly in favor of the first option, and I think it is likely that
>>> all real-world runners would probably adhere to that (I didn't check
>>> that, though).
>>>
>>> Opinions?
>>>
>>>   Jan
>>>
>>>


Re: Edit access to Wiki

2020-01-03 Thread Kirill Kozlov
Thank you!

On Fri, Jan 3, 2020 at 10:39 AM Luke Cwik  wrote:

> I have added you. Happy editing.
>
> On Fri, Jan 3, 2020 at 10:31 AM Kirill Kozlov 
> wrote:
>
>> Hello everyone!
>>
>> I was hoping to add a design doc for SQL push-down [1] to the Wiki page
>> [2], but I need edit access.
>> What is the process for obtaining edit access?
>> My wiki username is: Kirill Kozlov
>>
>> [1]
>> https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
>> [2] https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>>
>


Re: Dropping late data in DirectRunner

2020-01-03 Thread Jan Lukavský
Yes, the non-reliability of late data dropping in distributed runner is 
understood. But this is even where DirectRunner can play its role, 
because only there it is actually possible to emulate and test specific 
watermark conditions. Question regarding this for the java DirectRunner 
- should we completely drop LateDataDroppingDoFnRunner and delegate the
late data dropping to StatefulDoFnRunner? Seems logical to me, as if we
agree that late data should always be dropped, then there would be no
"valid" use of StatefulDoFnRunner without the late data dropping 
functionality.


On 1/3/20 9:32 PM, Robert Bradshaw wrote:
I agree, in fact we just recently enabled late data dropping to the 
direct runner in Python to be able to develop better tests for Dataflow.


It should be noted, however, that in a distributed runner (absent the 
quiescence of TestStream) that one can't *count* on late data being
dropped at a certain point, and in fact (due to delays in fully 
propagating the watermark) late data can even become on-time, so the 
promises about what happens behind the watermark are necessarily a bit 
loose.


On Fri, Jan 3, 2020 at 9:15 AM Luke Cwik wrote:


I agree that the DirectRunner should drop late data. Late data
dropping is optional but the DirectRunner is used by many for
testing and we should have the same behaviour they would get on
other runners or users may be surprised.

On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský <je...@seznam.cz> wrote:

Hi,

I just found out that DirectRunner is apparently not using
LateDataDroppingDoFnRunner, which means that it doesn't drop late data
in cases where there is no GBK operation involved (dropping in GBK seems
to be correct). There is apparently no @Category(ValidatesRunner) test
for that behavior (because DirectRunner would fail it), so the question
is - should late data dropping be considered part of model (of which
DirectRunner should be a canonical implementation) and therefore that
should be fixed there, or is the late data dropping an optional feature
of a runner?

I'm strongly in favor of the first option, and I think it is likely that
all real-world runners would probably adhere to that (I didn't check
that, though).

Opinions?

  Jan



Re: Dropping late data in DirectRunner

2020-01-03 Thread Robert Bradshaw
I agree, in fact we just recently enabled late data dropping to the direct
runner in Python to be able to develop better tests for Dataflow.

It should be noted, however, that in a distributed runner (absent the
quiescence of TestStream) that one can't *count* on late data being dropped
at a certain point, and in fact (due to delays in fully propagating the
watermark) late data can even become on-time, so the promises about what
happens behind the watermark are necessarily a bit loose.
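
To illustrate the point about watermark propagation, here is a toy sketch (not Beam code; `is_droppable` and the fixed windows of size 1 are assumptions made only for this illustration). The same element is judged against an up-to-date watermark and against a stale one that a worker might still hold before the advance has fully propagated.

```python
def is_droppable(timestamp: int, observed_watermark: int,
                 window_size: int = 1, allowed_lateness: int = 0) -> bool:
    """Assumed rule: droppable once window end + lateness <= observed watermark."""
    window_end = (timestamp // window_size + 1) * window_size
    return window_end + allowed_lateness <= observed_watermark


upstream_watermark = 1      # the watermark has already advanced upstream
stale_local_watermark = 0   # a worker that has not yet seen the advance

element_ts = 0
dropped_with_fresh_view = is_droppable(element_ts, upstream_watermark)
dropped_with_stale_view = is_droppable(element_ts, stale_local_watermark)
print(dropped_with_fresh_view, dropped_with_stale_view)
```

In this model the element is dropped when judged against the fresh watermark but is accepted as on-time by the worker with the stale view, which is why a test can only pin the behavior down when the watermark is fully controlled, as with TestStream.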

On Fri, Jan 3, 2020 at 9:15 AM Luke Cwik  wrote:

> I agree that the DirectRunner should drop late data. Late data dropping is
> optional but the DirectRunner is used by many for testing and we should
> have the same behaviour they would get on other runners or users may be
> surprised.
>
> On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský  wrote:
>
>> Hi,
>>
>> I just found out that DirectRunner is apparently not using
>> LateDataDroppingDoFnRunner, which means that it doesn't drop late data
>> in cases where there is no GBK operation involved (dropping in GBK seems
>> to be correct). There is apparently no @Category(ValidatesRunner) test
>> for that behavior (because DirectRunner would fail it), so the question
>> is - should late data dropping be considered part of model (of which
>> DirectRunner should be a canonical implementation) and therefore that
>> should be fixed there, or is the late data dropping an optional feature
>> of a runner?
>>
>> I'm strongly in favor of the first option, and I think it is likely that
>> all real-world runners would probably adhere to that (I didn't check
>> that, though).
>>
>> Opinions?
>>
>>   Jan
>>
>>


Re: Edit access to Wiki

2020-01-03 Thread Luke Cwik
I have added you. Happy editing.

On Fri, Jan 3, 2020 at 10:31 AM Kirill Kozlov 
wrote:

> Hello everyone!
>
> I was hoping to add a design doc for SQL push-down [1] to the Wiki page
> [2], but I need edit access.
> What is the process for obtaining edit access?
> My wiki username is: Kirill Kozlov
>
> [1]
> https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
> [2] https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>


Edit access to Wiki

2020-01-03 Thread Kirill Kozlov
Hello everyone!

I was hoping to add a design doc for SQL push-down [1] to the Wiki page
[2], but I need edit access.
What is the process for obtaining edit access?
My wiki username is: Kirill Kozlov

[1]
https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
[2] https://cwiki.apache.org/confluence/display/BEAM/Design+Documents


Re: Dropping late data in DirectRunner

2020-01-03 Thread Luke Cwik
I agree that the DirectRunner should drop late data. Late data dropping is
optional but the DirectRunner is used by many for testing and we should
have the same behaviour they would get on other runners or users may be
surprised.

On Fri, Jan 3, 2020 at 3:33 AM Jan Lukavský  wrote:

> Hi,
>
> I just found out that DirectRunner is apparently not using
> LateDataDroppingDoFnRunner, which means that it doesn't drop late data
> in cases where there is no GBK operation involved (dropping in GBK seems
> to be correct). There is apparently no @Category(ValidatesRunner) test
> for that behavior (because DirectRunner would fail it), so the question
> is - should late data dropping be considered part of model (of which
> DirectRunner should be a canonical implementation) and therefore that
> should be fixed there, or is the late data dropping an optional feature
> of a runner?
>
> I'm strongly in favor of the first option, and I think it is likely that
> all real-world runners would probably adhere to that (I didn't check
> that, though).
>
> Opinions?
>
>   Jan
>
>


Re: Contributor permission for Beam Jira tickets

2020-01-03 Thread Ismaël Mejía
Done, welcome!

On Fri, Jan 3, 2020 at 1:56 AM Xia Bingfeng  wrote:

> Hi Ismaël,
>
> My JIRA id is xiabingfeng
>
>
> On Thu, Jan 2, 2020 at 4:37 PM Ismaël Mejía  wrote:
>
>> Hello, What is your JIRA id?
>>
>>
>> On Fri, Jan 3, 2020 at 12:38 AM Xia Bingfeng 
>> wrote:
>>
>>> Hi,
>>>
>>> Can someone add me as a contributor for Beam's Jira issue tracker? I
>>> plan to work on Nexmark (BEAM-4763) for Beam SamzaRunner.
>>>
>>> Thanks! Happy new year!
>>>
>>> Best,
>>> Bingfeng
>>>
>>> --
>>> Bingfeng Xia
>>> A la recherche de l'orange bleue.
>>>
>>
>
> --
> Bingfeng Xia
> A la recherche de l'orange bleue.
>


Dropping late data in DirectRunner

2020-01-03 Thread Jan Lukavský

Hi,

I just found out that DirectRunner is apparently not using 
LateDataDroppingDoFnRunner, which means that it doesn't drop late data 
in cases where there is no GBK operation involved (dropping in GBK seems 
to be correct). There is apparently no @Category(ValidatesRunner) test 
for that behavior (because DirectRunner would fail it), so the question 
is - should late data dropping be considered part of model (of which 
DirectRunner should be a canonical implementation) and therefore that 
should be fixed there, or is the late data dropping an optional feature 
of a runner?


I'm strongly in favor of the first option, and I think it is likely that 
all real-world runners would probably adhere to that (I didn't check 
that, though).


Opinions?

 Jan



Re: [ANNOUNCE] New committer: Kasia Kucharczyk

2020-01-03 Thread Kamil Wasilewski
Congrats Kasia, good job!

On Fri, Jan 3, 2020 at 8:22 AM Michał Walenia 
wrote:

> Congratulations, Kasia!
>
> On Thu, Jan 2, 2020 at 6:52 PM Valentyn Tymofieiev 
> wrote:
>
>> Congratulations, Kasia!
>>
>> On Thu, Jan 2, 2020 at 1:23 AM Katarzyna Kucharczyk <
>> ka.kucharc...@gmail.com> wrote:
>>
>>> Thank you everyone! I will try to do my best as a committer :)
>>>
>>> On Thu, Dec 26, 2019 at 7:08 PM Cyrus Maden  wrote:
>>>
>>>> Congrats Kasia!
>>>>
>>>> On Tue, Dec 24, 2019 at 7:07 PM Thomas Weise wrote:
>>>>
>>>>> Congratulations!
>>>>>
>>>>>
>>>>> On Mon, Dec 23, 2019 at 1:39 PM Udi Meiri wrote:
>>>>>
>>>>>> Congrats Kasia!
>>>>>>
>>>>>> On Mon, Dec 23, 2019 at 1:23 PM Kyle Weaver
>>>>>> wrote:
>>>>>>
>>>>>>> Congrats Kasia! And thanks for sharing, Pablo.
>>>>>>>
>>>>>>> On Mon, Dec 23, 2019 at 4:16 PM Pablo Estrada
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>> committer: Kasia Kucharczyk
>>>>>>>>
>>>>>>>> Kasia has contributed to Beam in many ways, including the
>>>>>>>> performance testing infrastructure, and has even spoken at events about
>>>>>>>> Beam.
>>>>>>>>
>>>>>>>> In consideration of Kasia's contributions, the Beam PMC trusts her
>>>>>>>> with the responsibilities of a Beam committer[1].
>>>>>>>>
>>>>>>>> Thanks for your contributions Kasia!
>>>>>>>>
>>>>>>>> Pablo, on behalf of the Apache Beam PMC.
>>>>>>>>
>>>>>>>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>
>>>>>>>
>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


Re: Jenkins jobs not running for my PR 10438

2020-01-03 Thread Michał Walenia
Hi,
I'm also affected by this - I touched my PRs opened before the holiday
break and no jobs were triggered. Do we know what breaks Jenkins/fixes it
when stuff like this happens?
Happy new year,
Michal

On Fri, Jan 3, 2020 at 1:42 AM Kai Jiang  wrote:

> Thanks Alan for checking this out! I closed PR 9903 and reopened it as
> pull/10493. It seems the new PR still did not trigger Jenkins jobs.
>
> On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold  wrote:
>
>> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>>
>> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold  wrote:
>>
>>> For PR 10427, I see
>>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
>>> For PR 9903, I see
>>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>>>
>>> Maybe the PR status is not being updated when the jobs run?
>>>
>>>
>>> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang  wrote:
>>>
>>>> same for https://github.com/apache/beam/pull/9903 as well
>>>>
>>>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath
>>>> wrote:
>>>>
>>>>> Seems like Jenkins tests are not being triggered for this PR as well:
>>>>> https://github.com/apache/beam/pull/10427
>>>>>
>>>>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki
>>>>> wrote:
>>>>>
>>>>>> Jenkins started working. Thank you for whoever fixed it.
>>>>>>
>>>>>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang
>>>>>> wrote:
>>>>>> >
>>>>>> > Same here. Even the phrase trigger doesn't work.
>>>>>> >
>>>>>> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik wrote:
>>>>>> >>
>>>>>> >> I'm also affected by this.
>>>>>> >>
>>>>>> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki wrote:
>>>>>> >>>
>>>>>> >>> Hi Beam developers,
>>>>>> >>>
>>>>>> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
>>>>>> >>> https://github.com/apache/beam/pull/10438
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> Regards,
>>>>>> >>> Tomo
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Tomo
>>>>>>
>>>>>

-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com

Unique Tech
Check out our projects!