Re: [ANNOUNCE] New committer: Kasia Kucharczyk

2020-01-02 Thread Michał Walenia
Congratulations, Kasia!

On Thu, Jan 2, 2020 at 6:52 PM Valentyn Tymofieiev 
wrote:

> Congratulations, Kasia!
>
> On Thu, Jan 2, 2020 at 1:23 AM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> Thank you everyone! I will try to do my best as a committer :)
>>
>> On Thu, Dec 26, 2019 at 7:08 PM Cyrus Maden  wrote:
>>
>>> Congrats Kasia!
>>>
>>> On Tue, Dec 24, 2019 at 7:07 PM Thomas Weise  wrote:
>>>
>>>> Congratulations!
>>>>
>>>>
>>>> On Mon, Dec 23, 2019 at 1:39 PM Udi Meiri wrote:
>>>>
>>>>> Congrats Kasia!
>>>>>
>>>>> On Mon, Dec 23, 2019 at 1:23 PM Kyle Weaver
>>>>> wrote:
>>>>>
>>>>>> Congrats Kasia! And thanks for sharing, Pablo.
>>>>>>
>>>>>> On Mon, Dec 23, 2019 at 4:16 PM Pablo Estrada
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>> committer: Kasia Kucharczyk
>>>>>>>
>>>>>>> Kasia has contributed to Beam in many ways, including the
>>>>>>> performance testing infrastructure, and has even spoken at events about
>>>>>>> Beam.
>>>>>>>
>>>>>>> In consideration of Kasia's contributions, the Beam PMC trusts her
>>>>>>> with the responsibilities of a Beam committer[1].
>>>>>>>
>>>>>>> Thanks for your contributions Kasia!
>>>>>>>
>>>>>>> Pablo, on behalf of the Apache Beam PMC.
>>>>>>>
>>>>>>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>
>>>>>>

-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com



BigQueryUtils improvements for Avro Bytes / Timestamp (millis)

2020-01-02 Thread Ryan Berti
Hello,

I just wanted to send a quick note to the mailing list letting you know I
opened two tickets:

https://issues.apache.org/jira/browse/BEAM-9051
https://issues.apache.org/jira/browse/BEAM-9052

In both cases, I ran into the missing functionality after using an
external library to generate Avro GenericRecords from Scala case classes.
These records could be written via ParquetIO without any issues, but when I
converted them to Beam Rows and then to BigQuery TableRows (via AvroUtils
and BigQueryUtils), BigQuery rejected the records.
I'll set up some unit tests and open PRs in the next week or so to add this
functionality.
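
For illustration only, here is a minimal sketch of the kind of value conversion involved. It is not the code from the tickets. It assumes that an Avro bytes value arrives as a ByteBuffer and a timestamp-millis value as epoch milliseconds in a long, and that BigQuery accepts a base64 string for BYTES and a "yyyy-MM-dd HH:mm:ss.SSS UTC" string for TIMESTAMP. The class and method names are hypothetical and are not part of AvroUtils or BigQueryUtils.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Base64;

// Hypothetical helper; not part of Beam's AvroUtils or BigQueryUtils.
public class AvroToBigQueryValues {
  // Assumed target format for a BigQuery TIMESTAMP string, always rendered in UTC.
  private static final DateTimeFormatter TIMESTAMP_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

  /** Converts an Avro timestamp-millis value (epoch milliseconds) to a timestamp string. */
  public static String timestampMillisToString(long epochMillis) {
    return TIMESTAMP_FORMAT.format(Instant.ofEpochMilli(epochMillis)) + " UTC";
  }

  /** Converts an Avro bytes value to a base64 string without consuming the buffer. */
  public static String bytesToBase64(ByteBuffer buffer) {
    ByteBuffer copy = buffer.duplicate(); // leave the caller's position untouched
    byte[] bytes = new byte[copy.remaining()];
    copy.get(bytes);
    return Base64.getEncoder().encodeToString(bytes);
  }

  public static void main(String[] args) {
    System.out.println(timestampMillisToString(1577923200000L));
    System.out.println(
        bytesToBase64(ByteBuffer.wrap("beam".getBytes(StandardCharsets.UTF_8))));
  }
}
```

Duplicating the buffer before reading keeps the original record reusable, which matters if the same GenericRecord is also written elsewhere (for example via ParquetIO).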

Thanks!
Ryan


Re: Contributor permission for Beam Jira tickets

2020-01-02 Thread Xia Bingfeng
Hi Ismaël,

My JIRA id is xiabingfeng


On Thu, Jan 2, 2020 at 4:37 PM Ismaël Mejía  wrote:

> Hello, What is your JIRA id?
>
>
> On Fri, Jan 3, 2020 at 12:38 AM Xia Bingfeng 
> wrote:
>
>> Hi,
>>
>> Can someone add me as a contributor for Beam's Jira issue tracker? I plan
>> to work on Nexmark (BEAM-4763) for Beam SamzaRunner.
>>
>> Thanks! Happy new year!
>>
>> Best,
>> Bingfeng
>>
>> --
>> Bingfeng Xia
>> A la recherche de l'orange bleue.
>>
>

-- 
Bingfeng Xia
A la recherche de l'orange bleue.


Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Robert Bradshaw
(Other than that everything looks fine.)

On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw  wrote:
>
> -1
>
> I'm having trouble verifying the signatures on the release artifacts.
> When I try to import the key from
> https://dist.apache.org/repos/dist/release/beam/KEYS I get
>
> pub   rsa4096 2019-10-22 [SC]
>   79552F5C2FD869A08E097F96841855FB73AFFC7F
> uid   [ unknown] Mikhail Gryzykhin (mikhail) 
> sub   rsa4096 2019-10-22 [E]
>
> which is not the key that these artifacts were signed with.
>
>
> On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
> >
> > +1
> >
> > On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev  
> > wrote:
> >>
> >> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using wheels) 
> >> and Batch Mobile Gaming examples (user score, hourly team score) on 
> >> Dataflow.
> >>
> >> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay  wrote:
> >>>
> >>> This vote needs at least one more PMC vote before it can be finalized. 
> >>> Could you please validate and vote?
> >>>
> >>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:
> 
>  +1, I validated the Java quickstarts for the runners and the issues I 
>  have brought up have been moved to a future release.
> 
>  On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay  wrote:
> >
> > +1, I validated the python2 quick starts using wheels. Thank you for 
> > pushing the release this far.
> >
> > On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles  wrote:
> >>
> >> I verified the Java quickstart on Dataflow manually.
> >>
> >> Kenn
> >>
> >> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun 
> >>  wrote:
> >>>
> >>> Thanks for driving this release, Mikhail!
> >>>
> >>> I have found there is an incorrect release version for release notes 
> >>> in PR[1], also left a question in PR[2].
> >>>
> >>> But I do not think it's the blocker of the release :)
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> [1] https://github.com/apache/beam/pull/10401
> >>> [2] https://github.com/apache/beam/pull/10402
> >>>
> >>>
> >>> Ahmet Altay  于2019年12月19日周四 上午3:31写道:
> 
>  I validated python quickstarts with python 2. Wheel files are
>  missing but they work otherwise. Once the wheel files are added I 
>  will add my vote.
> 
>  On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik  wrote:
> >
> > I verified the release and ran the quickstarts and found that 
> > release 2.16 broke Apache Nemo runner which is also an issue for 
> > 2.17.0 RC #2. It is caused by a backwards incompatible change in 
> > ParDo.MultiOutput where getSideInputs return value was changed from 
> > List to Map as part of https://github.com/apache/beam/pull/9275. I 
> > filed https://issues.apache.org/jira/browse/BEAM-8989 to track the 
> > issue.
> >
> > Should we re-add the method back in 2.17.0 renaming the newly added 
> > method to something else and also patch 2.16.0 with a minor change 
> > including the same fix (breaking 2.16.0 users who picked up the new 
> > method) or leave as is?
> 
> 
>  I suggest not fixing this for 2.17, because the issue already exists 
>  in 2.16 and there are two releases in parallel and it would be fine 
>  to fix this for 2.18 or 2.19.
> 
>  +Reuven Lax, who merged the mentioned PR.
> 
> >
> >
> > On Tue, Dec 17, 2019 at 12:13 PM Mikhail Gryzykhin 
> >  wrote:
> >>
> >> Hi everyone,
> >>
> >>
> >> Please review and vote on the release candidate #2 for the version 
> >> 2.17.0, as follows:
> >>
> >> [ ] +1, Approve the release
> >>
> >> [ ] -1, Do not approve the release (please provide specific 
> >> comments)
> >>
> >>
> >>
> >> The complete staging area is available for your review, which 
> >> includes:
> >>
> >> * JIRA release notes [1],
> >>
> >> * the official Apache source release to be deployed to 
> >> dist.apache.org [2], which is signed with the key with fingerprint 
> >> 53F72D4EEEF306D97736FE1065ABB07A8965E788
> >>
> >>  [3],
> >>
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >>
> >> * source code tag "v2.17.0-RC2" [5],
> >>
> >> * website pull request listing the release [6], publishing the API 
> >> reference manual [7], and the blog post [8].
> >>
> >> * Python artifacts are deployed along with the source release to 
> >> the dist.apache.org [2].
> >>
> >> * Validation sheet with a tab for 2.17.0 release to help with 
> >> validation [9].
> >>
> >> * 

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Robert Bradshaw
-1

I'm having trouble verifying the signatures on the release artifacts.
When I try to import the key from
https://dist.apache.org/repos/dist/release/beam/KEYS I get

pub   rsa4096 2019-10-22 [SC]
  79552F5C2FD869A08E097F96841855FB73AFFC7F
uid   [ unknown] Mikhail Gryzykhin (mikhail) 
sub   rsa4096 2019-10-22 [E]

which is not the key that these artifacts were signed with.
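
To make the mismatch concrete, here is a small sketch that compares the fingerprint from the vote announcement with the fingerprint of the key I imported. Both values are taken from this thread; the class and method names are hypothetical, and this is only a string comparison, not a substitute for running a real signature verification.

```java
import java.util.Locale;

// Hypothetical helper for comparing OpenPGP key fingerprints as text.
public class FingerprintCheck {
  /** Strips whitespace and upper-cases so differently formatted fingerprints compare equal. */
  public static String normalize(String fingerprint) {
    return fingerprint.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
  }

  /** Returns true only if both strings denote the same fingerprint. */
  public static boolean sameKey(String expected, String actual) {
    return normalize(expected).equals(normalize(actual));
  }

  public static void main(String[] args) {
    // Fingerprint stated in the vote announcement.
    String announced = "53F72D4EEEF306D97736FE1065ABB07A8965E788";
    // Fingerprint of the key imported from the KEYS file.
    String imported = "79552F5C2FD869A08E097F96841855FB73AFFC7F";
    System.out.println(sameKey(announced, imported)); // prints "false"
  }
}
```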


On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
>
> +1
>
> On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev  
> wrote:
>>
>> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using wheels) 
>> and Batch Mobile Gaming examples (user score, hourly team score) on Dataflow.
>>
>> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay  wrote:
>>>
>>> This vote needs at least one more PMC vote before it can be finalized. 
>>> Could you please validate and vote?
>>>
>>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:

 +1, I validated the Java quickstarts for the runners and the issues I have 
 brought up have been moved to a future release.

 On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay  wrote:
>
> +1, I validated the python2 quick starts using wheels. Thank you for 
> pushing the release this far.
>
> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles  wrote:
>>
>> I verified the Java quickstart on Dataflow manually.
>>
>> Kenn
>>
>> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun  
>> wrote:
>>>
>>> Thanks for driving this release, Mikhail!
>>>
>>> I have found there is an incorrect release version for release notes in 
>>> PR[1], also left a question in PR[2].
>>>
>>> But I do not think it's the blocker of the release :)
>>>
>>> Best,
>>> Jincheng
>>>
>>> [1] https://github.com/apache/beam/pull/10401
>>> [2] https://github.com/apache/beam/pull/10402
>>>
>>>
>>> Ahmet Altay  于2019年12月19日周四 上午3:31写道:

 I validated python quickstarts with python 2. Wheel files are missing
 but they work otherwise. Once the wheel files are added I will add my 
 vote.

 On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik  wrote:
>
> I verified the release and ran the quickstarts and found that release 
> 2.16 broke Apache Nemo runner which is also an issue for 2.17.0 RC 
> #2. It is caused by a backwards incompatible change in 
> ParDo.MultiOutput where getSideInputs return value was changed from 
> List to Map as part of https://github.com/apache/beam/pull/9275. I 
> filed https://issues.apache.org/jira/browse/BEAM-8989 to track the 
> issue.
>
> Should we re-add the method back in 2.17.0 renaming the newly added 
> method to something else and also patch 2.16.0 with a minor change 
> including the same fix (breaking 2.16.0 users who picked up the new 
> method) or leave as is?


 I suggest not fixing this for 2.17, because the issue already exists 
 in 2.16 and there are two releases in parallel and it would be fine to 
 fix this for 2.18 or 2.19.

 +Reuven Lax, who merged the mentioned PR.

>
>
> On Tue, Dec 17, 2019 at 12:13 PM Mikhail Gryzykhin 
>  wrote:
>>
>> Hi everyone,
>>
>>
>> Please review and vote on the release candidate #2 for the version 
>> 2.17.0, as follows:
>>
>> [ ] +1, Approve the release
>>
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>>
>> The complete staging area is available for your review, which 
>> includes:
>>
>> * JIRA release notes [1],
>>
>> * the official Apache source release to be deployed to 
>> dist.apache.org [2], which is signed with the key with fingerprint 
>> 53F72D4EEEF306D97736FE1065ABB07A8965E788
>>
>>  [3],
>>
>> * all artifacts to be deployed to the Maven Central Repository [4],
>>
>> * source code tag "v2.17.0-RC2" [5],
>>
>> * website pull request listing the release [6], publishing the API 
>> reference manual [7], and the blog post [8].
>>
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org [2].
>>
>> * Validation sheet with a tab for 2.17.0 release to help with 
>> validation [9].
>>
>> * Docker images published to Docker Hub [10].
>>
>>
>> The vote will be open for at least 72 hours. It is adopted by 
>> majority approval, with at least 3 PMC affirmative votes.
>>
>>
>> Thanks,
>>
>> --Mikhail
>>
>>
>> [1] 
>> 

Re: Jenkins jobs not running for my PR 10438

2020-01-02 Thread Kai Jiang
Thanks, Alan, for checking this out! I closed PR 9903 and reopened it as
pull/10493. The new PR still did not trigger Jenkins jobs.

On Thu, Jan 2, 2020 at 2:55 PM Alan Myrvold  wrote:

> Oh, the PR 9903 run is quite old; I don't see a recent one yet.
>
> On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold  wrote:
>
>> For PR 10427, I see
>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
>> For PR 9903, I see
>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>>
>> Maybe the PR status is not being updated when the jobs run?
>>
>>
>> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang  wrote:
>>
>>> same for https://github.com/apache/beam/pull/9903 as well
>>>
>>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath 
>>> wrote:
>>>
 Seems like Jenkins tests are not being triggered for this PR as well:
 https://github.com/apache/beam/pull/10427

 On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki  wrote:

> Jenkins started working. Thank you to whoever fixed it.
>
> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang 
> wrote:
> >
> > Same here. Even the phrase trigger doesn't work.
> >
> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik  wrote:
> >>
> >> I'm also affected by this.
> >>
> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki 
> wrote:
> >>>
> >>> Hi Beam developers,
> >>>
> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
> >>> https://github.com/apache/beam/pull/10438
> >>>
> >>> --
> >>> Regards,
> >>> Tomo
>
>
>
> --
> Regards,
> Tomo
>



Re: Contributor permission for Beam Jira tickets

2020-01-02 Thread Ismaël Mejía
Hello, What is your JIRA id?


On Fri, Jan 3, 2020 at 12:38 AM Xia Bingfeng  wrote:

> Hi,
>
> Can someone add me as a contributor for Beam's Jira issue tracker? I plan
> to work on Nexmark (BEAM-4763) for Beam SamzaRunner.
>
> Thanks! Happy new year!
>
> Best,
> Bingfeng
>
> --
> Bingfeng Xia
> A la recherche de l'orange bleue.
>


Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Reuven Lax
+1

On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev 
wrote:

> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using wheels)
> and Batch Mobile Gaming examples (user score, hourly team score) on
> Dataflow.
>
> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay  wrote:
>
>> This vote needs at least one more PMC vote before it can be finalized.
>> Could you please validate and vote?
>>
>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:
>>
>>> +1, I validated the Java quickstarts for the runners and the issues I
>>> have brought up have been moved to a future release.
>>>
>>> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay  wrote:
>>>
 +1, I validated the python2 quick starts using wheels. Thank you for
 pushing the release this far.

 On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles  wrote:

> I verified the Java quickstart on Dataflow manually.
>
> Kenn
>
> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun 
> wrote:
>
>> Thanks for driving this release, Mikhail!
>>
>> I have found there is an incorrect release version for release notes
>> in PR[1], also left a question in PR[2].
>>
>> But I do not think it's the blocker of the release :)
>>
>> Best,
>> Jincheng
>>
>> [1] https://github.com/apache/beam/pull/10401
>> [2] https://github.com/apache/beam/pull/10402
>>
>>
>> Ahmet Altay  于2019年12月19日周四 上午3:31写道:
>>
>>> I validated python quickstarts with python 2. Wheel files are
>>> missing but they work otherwise. Once the wheel files are added I will 
>>> add
>>> my vote.
>>>
>>> On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik  wrote:
>>>
 I verified the release and ran the quickstarts and found that
 release 2.16 broke Apache Nemo runner which is also an issue for 
 2.17.0 RC
 #2. It is caused by a backwards incompatible change in 
 ParDo.MultiOutput
 where getSideInputs return value was changed from List to Map as part 
 of
 https://github.com/apache/beam/pull/9275. I filed
 https://issues.apache.org/jira/browse/BEAM-8989 to track the issue.

 Should we re-add the method back in 2.17.0 renaming the newly added
 method to something else and also patch 2.16.0 with a minor change
 including the same fix (breaking 2.16.0 users who picked up the new 
 method)
 or leave as is?

>>>
>>> I suggest not fixing this for 2.17, because the issue already exists
>>> in 2.16 and there are two releases in parallel and it would be fine to 
>>> fix
>>> this for 2.18 or 2.19.
>>>
>>> +Reuven Lax , who merged the mentioned PR.
>>>
>>>

 On Tue, Dec 17, 2019 at 12:13 PM Mikhail Gryzykhin <
 mig...@google.com> wrote:

> Hi everyone,
>
>
> Please review and vote on the release candidate #2 for the version
> 2.17.0, as follows:
>
> [ ] +1, Approve the release
>
> [ ] -1, Do not approve the release (please provide specific
> comments)
>
>
> The complete staging area is available for your review, which
> includes:
>
> * JIRA release notes [1],
>
> * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with
> fingerprint 53F72D4EEEF306D97736FE1065ABB07A8965E788
>
>  [3],
>
> * all artifacts to be deployed to the Maven Central Repository [4],
>
> * source code tag "v2.17.0-RC2" [5],
>
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
>
> * Python artifacts are deployed along with the source release to
> the dist.apache.org [2].
>
> * Validation sheet with a tab for 2.17.0 release to help with
> validation [9].
>
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
>
> --Mikhail
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345970=12319527
>
> [2] https://dist.apache.org/repos/dist/dev/beam/2.17.0/
>
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1087/
>
> [5] https://github.com/apache/beam/tree/v2.17.0-RC2
>
> [6] https://github.com/apache/beam/pull/10401
>
> [7] https://github.com/apache/beam-site/pull/594
>
> [8] https://github.com/apache/beam/pull/10402

Contributor permission for Beam Jira tickets

2020-01-02 Thread Xia Bingfeng
Hi,

Can someone add me as a contributor for Beam's Jira issue tracker? I plan
to work on Nexmark (BEAM-4763) for Beam SamzaRunner.

Thanks! Happy new year!

Best,
Bingfeng

-- 
Bingfeng Xia
A la recherche de l'orange bleue.


Re: External transform API in Java SDK

2020-01-02 Thread Heejong Lee
If we pass in TypeDescriptor objects instead of relying on Java type
information that only the compiler sees, we could check the coders returned
by expansion against the given type descriptors at pipeline construction
time. That would help prevent pipelines from failing with a
ClassCastException in runners. I've created the Jira
ticket: https://issues.apache.org/jira/browse/BEAM-9048
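
As a rough sketch of the idea, not an implementation against the real Beam API: the check could compare the declared output type with the type the returned coder encodes, and fail at construction time on a mismatch. StubCoder and checkCompatible below are hypothetical stand-ins for Beam's Coder and for whatever method External would expose, and plain Class objects stand in for TypeDescriptor.

```java
// Hypothetical sketch; StubCoder stands in for a coder returned by the expansion service.
public class OutputTypeCheck {
  /** Minimal stand-in for a coder that knows which Java type it encodes. */
  public static final class StubCoder {
    public final Class<?> encodedType;

    public StubCoder(Class<?> encodedType) {
      this.encodedType = encodedType;
    }
  }

  /**
   * Fails fast, at construction time, if values produced by the coder could not be
   * assigned to the type the user declared for the output elements.
   */
  public static void checkCompatible(Class<?> declaredType, StubCoder coder) {
    if (!declaredType.isAssignableFrom(coder.encodedType)) {
      throw new IllegalArgumentException(
          "Declared output type "
              + declaredType.getName()
              + " is not compatible with coder type "
              + coder.encodedType.getName());
    }
  }

  public static void main(String[] args) {
    // Compatible: a String coder satisfies a declared CharSequence output.
    checkCompatible(CharSequence.class, new StubCoder(String.class));
    try {
      // Incompatible: this would otherwise surface later as a ClassCastException.
      checkCompatible(Integer.class, new StubCoder(String.class));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the design is only that the failure moves from the runner to pipeline construction; how generic types such as KV would be compared is left open here.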

On Mon, Dec 30, 2019 at 10:27 AM Luke Cwik  wrote:

>
>
> On Mon, Dec 23, 2019 at 12:20 PM Heejong Lee  wrote:
>
>>
>>
>> On Fri, Dec 20, 2019 at 11:38 AM Luke Cwik  wrote:
>>
>>> What do side inputs look like?
>>>
>>
>> A user needs to first pass PCollections for side inputs into the external
>> transform in addition to ordinary input PCollections and define
>> PCollectionViews inside the external transform something like:
>>
>> PCollectionTuple pTuple =
>> PCollectionTuple.of("main1", main1)
>> .and("main2", main2)
>> .and("side", side)
>> .apply(External.of(...).withMultiOutputs());
>>
>> public static class TestTransform extends PTransform<PCollectionTuple, PCollectionTuple> {
>>   @Override
>>   public PCollectionTuple expand(PCollectionTuple input) {
>>     PCollectionView sideView =
>>         input.get("side").apply(View.asSingleton());
>>     PCollection<String> main =
>>         PCollectionList.of(input.get("main1"))
>>             .and(input.get("main2"))
>>             .apply(Flatten.pCollections())
>>             .apply(
>>                 ParDo.of(
>>                         new DoFn<String, String>() {
>>                           @ProcessElement
>>                           public void processElement(
>>                               @Element String x,
>>                               OutputReceiver<String> out,
>>                               DoFn.ProcessContext c) {
>>                             out.output(x + c.sideInput(sideView));
>>                           }
>>                         })
>>                     .withSideInputs(sideView));
>>
>>
>>
>>> On Thu, Dec 19, 2019 at 4:39 PM Heejong Lee  wrote:
>>>
 I wanted to know if anybody has any comment on external transform API
 for Java SDK.

 `External.of()` can create external transform for Java SDK. Depending
 on input and output types, two additional methods are provided:
 `withMultiOutputs()` which specifies the type of PCollection and
 `withOutputType()` which specifies the type of output element. Some
 examples are:

 PCollection col =
 testPipeline
 .apply(Create.of("1", "2", "3"))
 .apply(External.of(*...*));

 This is okay without additional methods since 1) input and output types
 of external transform can be inferred 2) output PCollection is singular.

>>>
>>> How does the type/coder at runtime get inferred (doesn't java's type
>>> erasure get rid of this information)?
>>>
>>
>>>
 PCollectionTuple pTuple =
 testPipeline
 .apply(Create.of(1, 2, 3, 4, 5, 6))
 .apply(
 External.of(*...*).withMultiOutputs());

 This requires `withMultiOutputs()` since output PCollection is
 PCollectionTuple.

>>>
>>> Shouldn't this require a mapping from "output" name to coder/type
>>> variable to be specified as an argument to withMultiOutputs?
>>>
>>>
 PCollection pCol =
 testPipeline
 .apply(Create.of("1", "2", "2", "3", "3", "3"))
 .apply(
 External.of(...)
 .>withOutputType())
 .apply(
 "toString",
 MapElements.into(TypeDescriptors.strings()).via(
 x -> String.format("%s->%s", x.getKey(), x.getValue())));

 This requires `withOutputType()` since the output element type cannot
 be inferred from method chaining. I think some users may find it awkward to
 call a method with only a type parameter and empty parentheses. Without
 `withOutputType()`, the output element type will be java.lang.Object,
 which might still be forcibly cast to KV.

>>>
>>> How does the output type get preserved in this case (since Java's type
>>> erasure would remove > after compilation and coder
>>> inference in my opinion should be broken and or choosing something generic
>>> like serializable)?
>>>
>>
>> The expansion service is responsible for using cross-language compatible
>> coders in the returning expanded transforms and these are the coders used
>> in the runtime. Type information annotated by additional methods here is
>> for compile-time type safety of external transforms.
>>
>
> Note that *.>withOutputType()* could be changed to
> *.withOutputType()* and we would get a *PCollection*
> since *withOutputType* doesn't actually do anything at runtime and is
> just to make types align during compilation.
>
> Is there a way to ensure that the output type is actually compatible with
> the coder that was returned after expansion (this would likely require you
> to pass in typing information into *withOutputType*, see
> 

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Valentyn Tymofieiev
+1. Validated Batch and Streaming quickstarts on Python 3.7 (using wheels)
and Batch Mobile Gaming examples (user score, hourly team score) on
Dataflow.

On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay  wrote:

> This vote needs at least one more PMC vote before it can be finalized.
> Could you please validate and vote?
>
> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:
>
>> +1, I validated the Java quickstarts for the runners and the issues I
>> have brought up have been moved to a future release.
>>
>> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay  wrote:
>>
>>> +1, I validated the python2 quick starts using wheels. Thank you for
>>> pushing the release this far.
>>>
>>> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles  wrote:
>>>
 I verified the Java quickstart on Dataflow manually.

 Kenn

 On Wed, Dec 18, 2019 at 5:58 PM jincheng sun 
 wrote:

> Thanks for driving this release, Mikhail!
>
> I have found there is an incorrect release version for release notes
> in PR[1], also left a question in PR[2].
>
> But I do not think it's the blocker of the release :)
>
> Best,
> Jincheng
>
> [1] https://github.com/apache/beam/pull/10401
> [2] https://github.com/apache/beam/pull/10402
>
>
> Ahmet Altay  于2019年12月19日周四 上午3:31写道:
>
>> I validated python quickstarts with python 2. Wheel files are missing
>> but they work otherwise. Once the wheel files are added I will add my 
>> vote.
>>
>> On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik  wrote:
>>
>>> I verified the release and ran the quickstarts and found that
>>> release 2.16 broke Apache Nemo runner which is also an issue for 2.17.0 
>>> RC
>>> #2. It is caused by a backwards incompatible change in ParDo.MultiOutput
>>> where getSideInputs return value was changed from List to Map as part of
>>> https://github.com/apache/beam/pull/9275. I filed
>>> https://issues.apache.org/jira/browse/BEAM-8989 to track the issue.
>>>
>>> Should we re-add the method back in 2.17.0 renaming the newly added
>>> method to something else and also patch 2.16.0 with a minor change
>>> including the same fix (breaking 2.16.0 users who picked up the new 
>>> method)
>>> or leave as is?
>>>
>>
>> I suggest not fixing this for 2.17, because the issue already exists
>> in 2.16 and there are two releases in parallel and it would be fine to 
>> fix
>> this for 2.18 or 2.19.
>>
>> +Reuven Lax , who merged the mentioned PR.
>>
>>
>>>
>>> On Tue, Dec 17, 2019 at 12:13 PM Mikhail Gryzykhin <
>>> mig...@google.com> wrote:
>>>
 Hi everyone,


 Please review and vote on the release candidate #2 for the version
 2.17.0, as follows:

 [ ] +1, Approve the release

 [ ] -1, Do not approve the release (please provide specific
 comments)


 The complete staging area is available for your review, which
 includes:

 * JIRA release notes [1],

 * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 53F72D4EEEF306D97736FE1065ABB07A8965E788

  [3],

 * all artifacts to be deployed to the Maven Central Repository [4],

 * source code tag "v2.17.0-RC2" [5],

 * website pull request listing the release [6], publishing the API
 reference manual [7], and the blog post [8].

 * Python artifacts are deployed along with the source release to
 the dist.apache.org [2].

 * Validation sheet with a tab for 2.17.0 release to help with
 validation [9].

 * Docker images published to Docker Hub [10].

 The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.

 Thanks,

 --Mikhail

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345970=12319527

 [2] https://dist.apache.org/repos/dist/dev/beam/2.17.0/

 [3] https://dist.apache.org/repos/dist/release/beam/KEYS

 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1087/

 [5] https://github.com/apache/beam/tree/v2.17.0-RC2

 [6] https://github.com/apache/beam/pull/10401

 [7] https://github.com/apache/beam-site/pull/594

 [8] https://github.com/apache/beam/pull/10402

 [9]
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=803858785

 [10] https://hub.docker.com/u/apachebeam




Re: Jenkins jobs not running for my PR 10438

2020-01-02 Thread Alan Myrvold
Oh, the PR 9903 run is quite old; I don't see a recent one yet.

On Thu, Jan 2, 2020 at 2:48 PM Alan Myrvold  wrote:

> For PR 10427, I see
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
> For PR 9903, I see
> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/
>
> Maybe the PR status is not being updated when the jobs run?
>
>
> On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang  wrote:
>
>> same for https://github.com/apache/beam/pull/9903 as well
>>
>> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath 
>> wrote:
>>
>>> Seems like Jenkins tests are not being triggered for this PR as well:
>>> https://github.com/apache/beam/pull/10427
>>>
>>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki  wrote:
>>>
 Jenkins started working. Thank you to whoever fixed it.

 On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang 
 wrote:
 >
 > Same here. Even the phrase trigger doesn't work.
 >
 > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik  wrote:
 >>
 >> I'm also affected by this.
 >>
 >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki 
 wrote:
 >>>
 >>> Hi Beam developers,
 >>>
 >>> Does anybody know why my PR does not trigger Jenkins jobs today?
 >>> https://github.com/apache/beam/pull/10438
 >>>
 >>> --
 >>> Regards,
 >>> Tomo



 --
 Regards,
 Tomo

>>>


Re: Jenkins jobs not running for my PR 10438

2020-01-02 Thread Alan Myrvold
For PR 10427, I see
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1593/
For PR 9903, I see
https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR/22/

Maybe the PR status is not being updated when the jobs run?


On Thu, Jan 2, 2020 at 2:37 PM Kai Jiang  wrote:

> same for https://github.com/apache/beam/pull/9903 as well
>
> On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath 
> wrote:
>
>> Seems like Jenkins tests are not being triggered for this PR as well:
>> https://github.com/apache/beam/pull/10427
>>
>> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki  wrote:
>>
>>> Jenkins started working. Thank you to whoever fixed it.
>>>
>>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang  wrote:
>>> >
>>> > Same here. Even the phrase trigger doesn't work.
>>> >
>>> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik  wrote:
>>> >>
>>> >> I'm also affected by this.
>>> >>
>>> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki 
>>> wrote:
>>> >>>
>>> >>> Hi Beam developers,
>>> >>>
>>> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
>>> >>> https://github.com/apache/beam/pull/10438
>>> >>>
>>> >>> --
>>> >>> Regards,
>>> >>> Tomo
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-01-02 Thread Kai Jiang
same for https://github.com/apache/beam/pull/9903 as well

On Thu, Jan 2, 2020 at 1:40 PM Chamikara Jayalath 
wrote:

> Seems like Jenkins tests are not being triggered for this PR as well:
> https://github.com/apache/beam/pull/10427
>
> On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki  wrote:
>
>> Jenkins started working. Thank you to whoever fixed it.
>>
>> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang  wrote:
>> >
>> > Same here. Even the phrase trigger doesn't work.
>> >
>> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik  wrote:
>> >>
>> >> I'm also affected by this.
>> >>
>> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki 
>> wrote:
>> >>>
>> >>> Hi Beam developers,
>> >>>
>> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
>> >>> https://github.com/apache/beam/pull/10438
>> >>>
>> >>> --
>> >>> Regards,
>> >>> Tomo
>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: Jenkins jobs not running for my PR 10438

2020-01-02 Thread Chamikara Jayalath
Seems like Jenkins tests are not being triggered for this PR as well:
https://github.com/apache/beam/pull/10427

On Fri, Dec 20, 2019 at 2:16 PM Tomo Suzuki  wrote:

> Jenkins started working. Thank you to whoever fixed it.
>
> On Fri, Dec 20, 2019 at 1:42 PM Boyuan Zhang  wrote:
> >
> > Same here. Even the phrase trigger doesn't work.
> >
> > On Fri, Dec 20, 2019 at 10:16 AM Luke Cwik  wrote:
> >>
> >> I'm also affected by this.
> >>
> >> On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki 
> wrote:
> >>>
> >>> Hi Beam developers,
> >>>
> >>> Does anybody know why my PR does not trigger Jenkins jobs today?
> >>> https://github.com/apache/beam/pull/10438
> >>>
> >>> --
> >>> Regards,
> >>> Tomo
>
>
>
> --
> Regards,
> Tomo
>


Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-02 Thread Ahmet Altay
This vote needs at least one more PMC vote before it can be finalized.
Could you please validate and vote?
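
For reference, the rule in the vote announcement (majority approval with at least 3 PMC affirmative votes) can be sketched as the check below. This is a minimal sketch under assumptions: it treats "majority" as more binding +1 than binding -1 votes, and the counts passed in are hypothetical, since the thread does not say which votes are binding. The class and method names are illustrative only.

```java
// Hypothetical tally helper; not an official Apache voting tool.
public class VoteTally {
  private static final int REQUIRED_BINDING_APPROVALS = 3;

  /** Assumed rule: at least 3 binding +1 votes and more binding +1 than binding -1. */
  public static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
    return bindingPlusOnes >= REQUIRED_BINDING_APPROVALS && bindingPlusOnes > bindingMinusOnes;
  }

  /** Number of additional binding +1 votes needed to reach the minimum. */
  public static int bindingVotesStillNeeded(int bindingPlusOnes) {
    return Math.max(0, REQUIRED_BINDING_APPROVALS - bindingPlusOnes);
  }

  public static void main(String[] args) {
    // Hypothetical state: two binding approvals so far, no binding vetoes.
    System.out.println(passes(2, 0)); // prints "false"
    System.out.println(bindingVotesStillNeeded(2)); // prints "1"
  }
}
```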

On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:

> +1, I validated the Java quickstarts for the runners and the issues I have
> brought up have been moved to a future release.
>
> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay  wrote:
>
>> +1, I validated the python2 quick starts using wheels. Thank you for
>> pushing the release this far.
>>
>> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles  wrote:
>>
>>> I verified the Java quickstart on Dataflow manually.
>>>
>>> Kenn
>>>
>>> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun 
>>> wrote:
>>>
 Thanks for driving this release, Mikhail!

 I have found there is an incorrect release version for release notes in
 PR[1], also left a question in PR[2].

 But I do not think it's the blocker of the release :)

 Best,
 Jincheng

 [1] https://github.com/apache/beam/pull/10401
 [2] https://github.com/apache/beam/pull/10402


 On Thu, Dec 19, 2019 at 3:31 AM Ahmet Altay  wrote:

> I validated the Python quickstarts with Python 2. The wheel files are
> missing, but the quickstarts work otherwise. Once the wheel files are
> added I will add my vote.
>
> On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik  wrote:
>
>> I verified the release and ran the quickstarts and found that release
>> 2.16 broke the Apache Nemo runner, which is also an issue for 2.17.0 RC #2.
>> It is caused by a backwards incompatible change in ParDo.MultiOutput where
>> the getSideInputs return value was changed from List to Map as part of
>> https://github.com/apache/beam/pull/9275. I filed
>> https://issues.apache.org/jira/browse/BEAM-8989 to track the issue.
>>
>> Should we re-add the method in 2.17.0, renaming the newly added
>> method to something else, and also patch 2.16.0 with a minor change
>> including the same fix (breaking 2.16.0 users who picked up the new
>> method), or leave it as is?
>>
>
> I suggest not fixing this for 2.17, because the issue already exists
> in 2.16 and two releases are in progress in parallel. It would be fine
> to fix this in 2.18 or 2.19.
>
> +Reuven Lax , who merged the mentioned PR.
>
>
>>
>> On Tue, Dec 17, 2019 at 12:13 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>>
>>> Please review and vote on the release candidate #2 for the version
>>> 2.17.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>>
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>>
>>> * JIRA release notes [1],
>>>
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> 53F72D4EEEF306D97736FE1065ABB07A8965E788
>>>
>>>  [3],
>>>
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>
>>> * source code tag "v2.17.0-RC2" [5],
>>>
>>> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>>
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>>
>>> * Validation sheet with a tab for 2.17.0 release to help with
>>> validation [9].
>>>
>>> * Docker images published to Docker Hub [10].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>>
>>> --Mikhail
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345970=12319527
>>>
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.17.0/
>>>
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1087/
>>>
>>> [5] https://github.com/apache/beam/tree/v2.17.0-RC2
>>>
>>> [6] https://github.com/apache/beam/pull/10401
>>>
>>> [7] https://github.com/apache/beam-site/pull/594
>>>
>>> [8] https://github.com/apache/beam/pull/10402
>>>
>>> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=803858785
>>>
>>> [10] https://hub.docker.com/u/apachebeam
>>>
>>>
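
The fix proposed in this thread for BEAM-8989 (restore the original accessor and give the map-returning one a different name) can be sketched as a Python analogy. The real API is the Java class ParDo.MultiOutput; the class and method names below are hypothetical and only illustrate the shape of the idea, not Beam's code.

```python
class MultiOutput:
    """Hypothetical stand-in for a transform that carries side inputs."""

    def __init__(self, side_inputs):
        # Stored internally as a name -> view mapping.
        self._side_inputs = dict(side_inputs)

    def get_side_inputs(self):
        # Original contract restored: callers written against the old
        # API still receive a list.
        return list(self._side_inputs.values())

    def get_side_inputs_map(self):
        # Map-returning accessor under a new, hypothetical name, so it
        # no longer replaces the method older callers depend on.
        return dict(self._side_inputs)


transform = MultiOutput({"rates": "rates_view", "limits": "limits_view"})
old_style = transform.get_side_inputs()      # a list, as before
new_style = transform.get_side_inputs_map()  # a dict, for newer callers
```

The trade-off raised in the thread still applies: callers who already adopted the map-returning method under its original name would have to switch to the new name.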


Re: [ANNOUNCE] New committer: Kasia Kucharczyk

2020-01-02 Thread Valentyn Tymofieiev
Congratulations, Kasia!

On Thu, Jan 2, 2020 at 1:23 AM Katarzyna Kucharczyk 
wrote:

> Thank you everyone! I will try to do my best as a committer :)
>
> On Thu, Dec 26, 2019 at 7:08 PM Cyrus Maden  wrote:
>
>> Congrats Kasia!
>>
>> On Tue, Dec 24, 2019 at 7:07 PM Thomas Weise  wrote:
>>
>>> Congratulations!
>>>
>>>
>>> On Mon, Dec 23, 2019 at 1:39 PM Udi Meiri  wrote:
>>>
 Congrats Kasia!

 On Mon, Dec 23, 2019 at 1:23 PM Kyle Weaver 
 wrote:

> Congrats Kasia! And thanks for sharing, Pablo.
>
> On Mon, Dec 23, 2019 at 4:16 PM Pablo Estrada 
> wrote:
>
>> Hi everyone,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Kasia Kucharczyk
>>
>> Kasia has contributed to Beam in many ways, including the performance
>> testing infrastructure, and has even spoken at events about Beam.
>>
>> In consideration of Kasia's contributions, the Beam PMC trusts her
>> with the responsibilities of a Beam committer[1].
>>
>> Thanks for your contributions Kasia!
>>
>> Pablo, on behalf of the Apache Beam PMC.
>>
>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>


Re: Python versions for Beam development

2020-01-02 Thread Luke Cwik
I believe it is Python 2.7, 3.5, 3.6, 3.7 as of right now.
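
A minimal sketch of a local check against the versions named above. The set reflects only what this message says as of January 2020 and is an assumption for any later date; the function name is made up for illustration.

```python
import sys

# Versions named in this thread (January 2020). Treat this set as an
# assumption and check the Beam documentation for the current list.
SUPPORTED_VERSIONS = {(2, 7), (3, 5), (3, 6), (3, 7)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is in SUPPORTED_VERSIONS."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "listed" if is_supported() else "not listed"
    print("Python {}.{} is {} in this thread".format(major, minor, status))
```

Running the file directly prints whether the current interpreter matches one of the listed versions; it does not say anything about what ./gradlew check itself requires.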

On Thu, Jan 2, 2020 at 8:33 AM Elliotte Rusty Harold 
wrote:

> Apropos of https://github.com/apache/beam/pull/10366: which version or
> versions of Python are required to successfully compile and run the
> Python parts of Beam, i.e. ./gradlew check?
>
> From looking at a few recent PRs, it seems at least Python 3.7 is
> required, but I'm not a Python dev so I'm sure I'm missing things.
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>


Re: Performance drops in Python PortableRunner tests

2020-01-02 Thread Kamil Wasilewski
Robert, you can find the pipeline of this particular test here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py
.

The documentation for running this kind of test, including how to set up a
Flink cluster, is on CWIKI:
https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide-TestsofCoreApacheBeamOperations
.
Hope this helps.


On Fri, Dec 20, 2019 at 7:10 PM Pablo Estrada  wrote:

> The Jenkins jobs for the Flink load tests:
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
>
> The documentation for the test explains how to run it on each runner:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py#L17
>
> I assume that standing up the Flink cluster should be done separately.
>
> LMK if that helps Robert.
> -P.
>
> On Fri, Dec 20, 2019 at 9:59 AM Robert Bradshaw 
> wrote:
>
>> Yes, it is possible that this had an influence--Reads are now all
>> implemented as SDFs and Creates involve a reshuffle to better
>> redistribute data. This much of a change is quite surprising. Where is
>> the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10
>> iterations | Batch" and how does one run it?
>>
>> On Fri, Dec 20, 2019 at 6:50 AM Kamil Wasilewski
>>  wrote:
>> >
>> > Hi all,
>> >
>> > We have a couple of Python load tests running on Flink in which we are
>> testing the performance of ParDo, GroupByKey, CoGroupByKey and Combine
>> operations.
>> >
>> > Recently, I've discovered that the runtime of all those tests rose
>> significantly. It happened between the 6th and 7th of December (the tests
>> are running daily). Here are the dashboards where you can see the results:
>> >
>> >
>> https://apache-beam-testing.appspot.com/explore?dashboard=5649695233802240
>> >
>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792
>> >
>> https://apache-beam-testing.appspot.com/explore?dashboard=5698549949923328
>> >
>> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
>> >
>> > I've seen that in that period we submitted some changes to the core,
>> including Read transform. Do you think this might have influenced the
>> results?
>> >
>> > Thanks,
>> > Kamil
>>
>


Re: [ANNOUNCE] New committer: Kasia Kucharczyk

2020-01-02 Thread Katarzyna Kucharczyk
Thank you everyone! I will try to do my best as a committer :)

On Thu, Dec 26, 2019 at 7:08 PM Cyrus Maden  wrote:

> Congrats Kasia!
>
> On Tue, Dec 24, 2019 at 7:07 PM Thomas Weise  wrote:
>
>> Congratulations!
>>
>>
>> On Mon, Dec 23, 2019 at 1:39 PM Udi Meiri  wrote:
>>
>>> Congrats Kasia!
>>>
>>> On Mon, Dec 23, 2019 at 1:23 PM Kyle Weaver  wrote:
>>>
 Congrats Kasia! And thanks for sharing, Pablo.

 On Mon, Dec 23, 2019 at 4:16 PM Pablo Estrada 
 wrote:

> Hi everyone,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Kasia Kucharczyk
>
> Kasia has contributed to Beam in many ways, including the performance
> testing infrastructure, and has even spoken at events about Beam.
>
> In consideration of Kasia's contributions, the Beam PMC trusts her
> with the responsibilities of a Beam committer[1].
>
> Thanks for your contributions Kasia!
>
> Pablo, on behalf of the Apache Beam PMC.
>
> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>