Re: Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
This is failing in two of my PRs now. It looks like this tool already has dealt with these errors by adding retries. But it only retries 3 times per URL. Here is a PR which changes it from 3 to 9 retries. https://github.com/apache/beam/pull/12130 Would it be possible to merge this? Hopefully this
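A minimal sketch of the per-URL retry behaviour described above, assuming a plain loop with a configurable attempt count. The names `fetch_with_retries`, `fetch`, `max_retries`, and `delay_seconds` are illustrative and are not taken from pull_licenses_java.py; the actual change is in the linked PR.

```python
import time
from typing import Callable


def fetch_with_retries(url: str,
                       fetch: Callable[[str], bytes],
                       max_retries: int = 9,
                       delay_seconds: float = 0.0) -> bytes:
    """Call fetch(url) up to max_retries times, then give up with an error."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # assumption: a real tool would catch narrower errors
            last_error = error
            # Only wait if another attempt is still coming.
            if attempt < max_retries:
                time.sleep(delay_seconds)
    raise RuntimeError(
        "giving up on %s after %d attempts" % (url, max_retries)) from last_error
```

Raising the limit from 3 to 9 only helps if the failures are transient; a URL that 404s permanently will still fail, just more slowly.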

Re: Contributor permission for Beam Jira tickets

2020-06-29 Thread Kenneth Knowles
Welcome! I've added you to the role and assigned the ticket to you. On Mon, Jun 29, 2020 at 9:00 PM Almeida, Julius wrote: > Hi, > > > > This is Julius from Intuit. > > I would like to update AwsModule in beam to add assume role functionality. > > Can someone add me as a contributor for Beam's

Contributor permission for Beam Jira tickets

2020-06-29 Thread Almeida, Julius
Hi, This is Julius from Intuit. I would like to update AwsModule in beam to add assume role functionality. Can someone add me as a contributor for Beam's Jira issue tracker? I would like to contribute to this Jira : https://issues.apache.org/jira/browse/BEAM-10335 Jira username : jalmeida

Re: Unsure why Warning shows up for unmodified file on Java PR

2020-06-29 Thread Yichi Zhang
It looks like quite a few java precommit runs timed out on :sdks:java:io:rabbitmq:test On Mon, Jun 29, 2020 at 5:57 PM Alex Amato wrote: > PR: https://github.com/apache/beam/pull/12083 > > Java ("Run Java PreCommit") is failing - >

Unsure why Warning shows up for unmodified file on Java PR

2020-06-29 Thread Alex Amato
PR: https://github.com/apache/beam/pull/12083 Java ("Run Java PreCommit") is failing - https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12011/ When I dug into the console log, I found the error details: https://screenshot.googleplex.com/EJkhH9Bq8en *15:14:55*

Re: Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
My mistake, it looks like this is the failure: https://issues.apache.org/jira/browse/BEAM-10381 I'll keep running it locally to see if it passes and whether the flake theory makes sense. On Mon, Jun 29, 2020 at 5:38 PM Ahmet Altay wrote: > It might be a flake? I restarted the "Run

Re: Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Ahmet Altay
It might be a flake? I restarted the "Run Python2_PVR_Flink PreCommit" test. Is the JIRA link correct? It does not look directly related. On Mon, Jun 29, 2020 at 5:34 PM Alex Amato wrote: > I thought this was a bit odd as this PR doesn't change java code or deps. > > Details in JIRA: > >

Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
I thought this was a bit odd as this PR doesn't change java code or deps. Details in JIRA: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-10308?filter=allopenissues 404s trying to download this file:

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Akshay Iyangar
Hi, as a use case, we have records being fetched from Kinesis as well as an S3 (bounded) source in a unified pipeline, which is eventually flattened into a single projection/output for processing the data. But we usually end up not needing a lot of task slots / parallelism for processing data

[Proposal] Supporting Python Annotation Typehints on PTransform

2020-06-29 Thread Saavan Nanavati
Hi all! Currently, in the Python SDK, we don't support annotation-style type hints for PTransforms. This email includes a proposal to support PEP 484 annotations on PTransform's expand() function, and would allow you to write something like the following: class MapStrToInt(beam.PTransform): def
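A rough sketch of how PEP 484 annotations on an expand() method can be read back at runtime. The preview above is cut off after "def", so the signature shown here is an assumption, and `PCollection`, `PTransform`, and `expand_type_hints` are stand-ins written for this example rather than Beam's own classes or API.

```python
from typing import Generic, TypeVar, get_type_hints

T = TypeVar("T")


class PCollection(Generic[T]):
    """Stand-in for beam.PCollection; not the real Beam class."""


class PTransform:
    """Stand-in for beam.PTransform; not the real Beam class."""

    def expand(self, pcoll):
        raise NotImplementedError


class MapStrToInt(PTransform):
    # Assumed signature: input elements are str, output elements are int.
    def expand(self, pcoll: PCollection[str]) -> PCollection[int]:
        return pcoll  # placeholder body; a real transform would apply a Map


def expand_type_hints(transform: PTransform):
    """Return (input_hint, output_hint) read from expand()'s annotations."""
    hints = get_type_hints(type(transform).expand)
    output_hint = hints.pop("return", None)
    # The first remaining annotated parameter is treated as the input.
    input_hint = next(iter(hints.values()), None)
    return input_hint, output_hint
```

Calling `expand_type_hints(MapStrToInt())` yields the two annotated types, which is the information a decorator-free type-hint mechanism would need to extract.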

Re: Composable DoFn IOs Connection Reuse

2020-06-29 Thread Siyuan Chen
-- Best regards, Siyuan On Mon, Jun 29, 2020 at 1:06 PM Kenneth Knowles wrote: > Great doc. > > The extended API makes sense. I like how it removes a knob. The question > that I have to ask is whether there is a core model change here or can we > avoid it. Defining "shard" as a scope of state

Re: [ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Austin Bennett
Congratulations, @Aizhamal Nurmamat kyzy ! On Mon, Jun 29, 2020 at 2:32 PM Valentyn Tymofieiev wrote: > Congratulations and big thank you for all the hard work on Beam, Aizhamal! > > On Mon, Jun 29, 2020 at 9:56 AM Kenneth Knowles wrote: > >> Please join me and the rest of the Beam PMC in

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread amit kumar
Looks like https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#operator-level Regards, Amit On Mon, Jun 29, 2020 at 12:59 PM Kenneth Knowles wrote: > This exact issue has been discussed before, though I can't find the older > threads. Basically, specifying parallelism is

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-29 Thread Brian Hulette
Sorry for jumping into this late and casting a vote against the consensus... but I think I'd prefer standardizing on a pattern like PCollection rather than PCollection. That approach clearly separates the parameters that are allowed to vary across a ReadAll (the ones defined in

Re: [ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Valentyn Tymofieiev
Congratulations and big thank you for all the hard work on Beam, Aizhamal! On Mon, Jun 29, 2020 at 9:56 AM Kenneth Knowles wrote: > Please join me and the rest of the Beam PMC in welcoming a new committer: > Aizhamal Nurmamat kyzy > > Over the last 15 months or so, Aizhamal has driven many

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-29 Thread Kenneth Knowles
Eugene, let me see if I understand correctly. Your initial email suggested we consider adopting the "dynamic write" style for ReadAll transforms. Supposing Read had three fields x, y, and z, and you have a PCollection with all the info you need. (Ismaël's) "Read" way: Use a PTransform to

Re: Composable DoFn IOs Connection Reuse

2020-06-29 Thread Kenneth Knowles
Great doc. The extended API makes sense. I like how it removes a knob. The question that I have to ask is whether there is a core model change here or can we avoid it. Defining "shard" as a scope of state within which execution is observably serial, today the model has key+window sharding always.

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Kenneth Knowles
This exact issue has been discussed before, though I can't find the older threads. Basically, specifying parallelism is a workaround (aka a cost), not a feature (aka a benefit). Sometimes you have to pay that cost as it is the only solution currently understood or implemented. It depends on what

Re: [ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Gris Cuevas
Congrats and Welcome Aizhamal!!! Well deserved and very thankful for all you have done for the Beam community :) On 2020/06/29 16:56:04, Kenneth Knowles wrote: > Please join me and the rest of the Beam PMC in welcoming a new committer: > Aizhamal Nurmamat kyzy > > Over the last 15 months

Re: [ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Ahmet Altay
Congratulations Aizhamal. This is a great list of accomplishments. I am particularly proud of how Beam webinars during the pandemic brought the community even closer. On Mon, Jun 29, 2020 at 10:14 AM Kyle Weaver wrote: > Thanks for all your contributions Aizhamal :) > > On Mon, Jun 29, 2020 at

Re: [ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Kyle Weaver
Thanks for all your contributions Aizhamal :) On Mon, Jun 29, 2020 at 9:56 AM Kenneth Knowles wrote: > Please join me and the rest of the Beam PMC in welcoming a new committer: > Aizhamal Nurmamat kyzy > > Over the last 15 months or so, Aizhamal has driven many efforts in the > Beam community

[ANNOUNCE] New committer: Aizhamal Nurmamat kyzy

2020-06-29 Thread Kenneth Knowles
Please join me and the rest of the Beam PMC in welcoming a new committer: Aizhamal Nurmamat kyzy Over the last 15 months or so, Aizhamal has driven many efforts in the Beam community and contributed to others. Aizhamal started by helping with the Beam newsletter [1] then continued by contributing

Re: Composable DoFn IOs Connection Reuse

2020-06-29 Thread Luke Cwik
On Fri, Jun 26, 2020 at 3:45 PM Tyson Hamilton wrote: > Nice doc by the way, it's concise. Thanks for sharing and I'm excited to > see this feature, particularly the PCollection variant that would have > been useful for the Cloud AI transforms recently introduced. > > On Fri, Jun 26, 2020 at

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-29 Thread Eugene Kirpichov
I'd like to raise one more time the question of consistency between dynamic reads and dynamic writes, per my email at the beginning of the thread. If the community prefers ReadAll to read from Read, then should dynamicWrite's write to Write? On Mon, Jun 29, 2020 at 8:57 AM Boyuan Zhang wrote: >

Re: Canceling Jenkins builds when the update to PR makes prior build irrelevant

2020-06-29 Thread Tobiasz Kędzierski
Hi, I agree with Ahmet that even in that shape it should improve the queue length. Both _Commit/_Phrase cross-cancelling and a "cancel all" phrase seem to require much effort, and I doubt it's worth doing. I will turn on `Cancel build on update` in ghprb-plugin on ci-beam.apache.org tomorrow

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Luke Cwik
Check out this thread[1] about adding "runner determined sharding" as a general concept. This could be used to enhance the reshuffle implementation significantly and might remove the need for per transform parallelism from that specific use case and likely from most others. 1:

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-29 Thread Boyuan Zhang
It seems like most of us agree on the idea that ReadAll should read from Read. I'm going to update the Kafka ReadAll with the same pattern. Thanks for all your help! On Fri, Jun 26, 2020 at 12:12 PM Chamikara Jayalath wrote: > > > On Fri, Jun 26, 2020 at 11:49 AM Luke Cwik wrote: > >> I would

Beam Dependency Check Report (2020-06-29)

2020-06-29 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
google-cloud-datastore | 1.7.4 | 1.12.0

Re: Individual Parallelism support for Flink Runner

2020-06-29 Thread Maximilian Michels
We could allow parameterizing transforms by using transform identifiers from the pipeline, e.g.

    options = ['--parameterize=MyTransform;parallelism=5']
    with Pipeline.create(PipelineOptions(options)) as p:
      p | Create(1, 2, 3) | 'MyTransform' >> ParDo(..)

Those hints should always be
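A small sketch of how a flag in the proposed `--parameterize=<transform>;<key>=<value>` shape could be parsed into per-transform hints. The flag is only a suggestion in this thread and is not an existing Beam option; `parse_parameterize` and the multi-pair form are assumptions made for illustration.

```python
from typing import Dict, List


def parse_parameterize(options: List[str]) -> Dict[str, Dict[str, str]]:
    """Collect '--parameterize=<transform>;<key>=<value>;...' flags per transform."""
    prefix = "--parameterize="
    hints: Dict[str, Dict[str, str]] = {}
    for option in options:
        if not option.startswith(prefix):
            continue  # unrelated pipeline options are ignored
        transform_name, *pairs = option[len(prefix):].split(";")
        transform_hints = hints.setdefault(transform_name, {})
        for pair in pairs:
            key, _, value = pair.partition("=")
            transform_hints[key] = value  # values stay strings; the runner would interpret them
    return hints
```

Keying the hints by transform name keeps them outside the pipeline definition, so a runner that does not understand a hint can simply skip it.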