Try Beam Katas Today

2020-05-12 Thread Damon Douglas
Hello Everyone, If you don't already know, there are helpful instructional tools for learning the Apache Beam SDKs called Beam Katas hosted on https://stepik.org. Similar to traditional Kata, they are meant to be repeated as practice. Before practicing the

Re: GSoD with Apache Beam

2020-05-12 Thread Aizhamal Nurmamat kyzy
Hi Sylvia, As Ahmet mentioned, you may start by exploring the 2 project ideas that we shared [1], and see which one you like better. On the same page, you will find links to the old/related documentation that we are looking forward to updating/improving with tech writers' help. We also specified the

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-12 Thread Ahmet Altay
- I reviewed the diff output with Nam's explanations. The change looks minimal. Large diffs are primarily coming from index and redirect files. Code blocks have differences but the content is seemingly preserved. IIUC, the source of truth is snippet files anyway. (It would be good to get one more

Google Season of Docs

2020-05-12 Thread Harshit Dixit
Hi Aizhamal, I am looking to apply to Apache Beam for the Season of Docs. The project really looks innovative to me and I would like to be a part of it. I hope we can connect soon to further discuss the same.

Re: Python Precommit significantly flaky

2020-05-12 Thread Brian Hulette
I think the previously mentioned issues are resolved now, but the python precommit still has some flakes. I filed: https://issues.apache.org/jira/browse/BEAM-9975 - PortableRunnerTest flake "ParseError: Unexpected type for Value message." An example of this is

[Season of Docs] Acceptance into Season of Docs 2020

2020-05-12 Thread Season of Docs
Congratulations! We’ve received your alternative administrator/mentor registration(s) and your organization’s application for Season of Docs has been accepted. The list of accepted organizations is on the Season of Docs website. We

Re: [DISCUSS] finishBundle once per window

2020-05-12 Thread Reuven Lax
It sounds like consensus is to do this in finishBundle, but require specifying output timestamps. I'll alter the PR appropriately. Reuven On Mon, May 11, 2020 at 5:06 PM Robert Bradshaw wrote: > StartBundle pre-dated setUp, which makes it less useful than before. With > DoFn re-use, however,

Contributor permission for Beam Jira tickets

2020-05-12 Thread Omar Ismail
Hi Folks, My name is Omar, and I work at Google in a Support role. I am trying to get my hands dirty with some dev work, and would like to start contributing to the Beam SDK. I recently opened an issue, BEAM-9964, and would like to work on it. Can someone add me as a contributor

Re: [DISCUSS] Dealing with @Ignored tests

2020-05-12 Thread Mikhail Gryzykhin
I wonder if we can add a graph to the community metrics showing ignored tests by language/project/overall. That can be useful to see focus areas. On Tue, May 12, 2020 at 12:28 PM Jan Lukavský wrote: > +1, visualizing the number of ignored tests in a graph seems useful. Even > better with some slices

Contributor Permission for Beam Jira Tickets

2020-05-12 Thread Omar Ismail
Hi Folks, My name is Omar, and I work at Google in a support role. I want to get my hands dirty doing some dev work, and would love to contribute to the Beam SDK! I opened my first JIRA today, BEAM-9964, and would like it assigned to me. Can someone please add me as a contributor for Beam's

Re: [DISCUSS] Dealing with @Ignored tests

2020-05-12 Thread Jan Lukavský
+1, visualizing the number of ignored tests in a graph seems useful. Even better with some slices (e.g. per runner, module, ...). On 5/12/20 8:02 PM, Ahmet Altay wrote: +1 to generate a report instead of removing these tests. A report like this could help us with prioritization. It is easier
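A report of ignored tests sliced by module, as suggested in this thread, could be produced along the following lines. This is only a minimal sketch, not an existing Beam tool: it assumes that ignored tests are marked with JUnit's `@Ignore` annotation at the start of a line and that the first two path components identify the module. The names `count_ignored`, `module_of`, and `SAMPLE_SOURCES`, as well as the sample paths and file contents, are hypothetical.

```python
import re
from collections import Counter

# Matches "@Ignore" or "@Ignore(...)" at the start of a line (after optional
# whitespace). The word boundary keeps identifiers such as "@Ignored" out.
IGNORE_PATTERN = re.compile(r"^\s*@Ignore\b", re.MULTILINE)


def module_of(path, depth=2):
    """Return the first `depth` path components, used here as the module name."""
    return "/".join(path.split("/")[:depth])


def count_ignored(sources, depth=2):
    """Count @Ignore annotations per module.

    `sources` maps a file path to that file's text. Modules without any
    ignored test are left out of the result.
    """
    counts = Counter()
    for path, text in sources.items():
        hits = len(IGNORE_PATTERN.findall(text))
        if hits:
            counts[module_of(path, depth)] += hits
    return counts


# Hypothetical sample input; real input would be read from the source tree.
SAMPLE_SOURCES = {
    "runners/flink/src/test/FooTest.java":
        '@Ignore\n@Test\npublic void a() {}\n  @Ignore("reason")\n@Test\npublic void b() {}\n',
    "runners/spark/src/test/BarTest.java":
        "@Ignore\n@Test\npublic void c() {}\n",
    "sdks/java/src/test/BazTest.java":
        "@Test\npublic void d() {}\n",
}

if __name__ == "__main__":
    for module, n in sorted(count_ignored(SAMPLE_SOURCES).items()):
        print(f"{module}: {n}")
```

Changing `depth` gives coarser or finer slices (for example `depth=1` groups everything under `runners` or `sdks`), and the resulting counts could be fed into whatever dashboard hosts the community metrics.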

Re: GSoD with Apache Beam

2020-05-12 Thread Ahmet Altay
Welcome. There was this list of projects ( https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs) if that helps. +Aizhamal Nurmamat kyzy could probably point you in the right direction. Ahmet On Tue, May 12, 2020 at 1:30 AM Sylvia Mittal wrote: > Hey, > > I am Sylvia, a

Re: [DISCUSS] Dealing with @Ignored tests

2020-05-12 Thread Ahmet Altay
+1 to generate a report instead of removing these tests. A report like this could help us with prioritization. It is easier to address issues when we can quantify how much of a problem it is. I am curious what we can do to incentivize reducing the number of flaky/ignored tests? A report itself

Re: Running NexMark Tests

2020-05-12 Thread Maximilian Michels
A heads-up if anybody else sees this, we have removed the flag: https://jira.apache.org/jira/browse/BEAM-9900 Further contributions are very welcome :) -Max On 11.05.20 17:05, Sruthi Sree Kumar wrote: > I have opened a PR with the documentation change. >

Re: Parallelism in Combine.GroupedValues

2020-05-12 Thread Luke Cwik
There isn't just one accumulator. There are multiple accumulators that are used to support a parallel combine. Feel free to open up a PR to help improve the javadoc. Yes, these combines are converted to a lifted combine when the runner is able to do so. On Tue, May 12, 2020 at 9:14 AM rahul

Re: Proposal to join Apache Beam as a Technical Writer for Season of Docs

2020-05-12 Thread Sarthak Khandelwal
PS: I take part in the Project: Deployment of a Flink and Spark Clusters with Portable Beam On Tue, 12 May, 2020, 11:30 AM Sarthak Khandelwal, < sarthakkhandelwal032...@gmail.com> wrote: > Hello there, > I am Sarthak Khandelwal, currently pursuing a B.Tech at Medicaps University, > India. I

Proposal to join Apache Beam as a Technical Writer for Season of Docs

2020-05-12 Thread Sarthak Khandelwal
Hello there, I am Sarthak Khandelwal, currently pursuing a B.Tech at Medicaps University, India. I want to join Apache Beam as a Technical Writer for Google Season of Docs. I am joining Beam because, being an ML practitioner, I know the difference between batch and streaming data and also have a quite

Re: Parallelism in Combine.GroupedValues

2020-05-12 Thread rahul patwari
Hi Luke, I should have been clearer with my question. Sorry, my bad. I wanted to ask: How can combining happen in parallel by using only *one accumulator instance*? It has been explicitly specified in CombineFn.apply()[4] that mergeAccumulators() will not be called. A single accumulator

Re: Runner dependent sharding for dynamic destinations in FileIO

2020-05-12 Thread Luke Cwik
How/why are you trying to make it runner dependent? On Fri, May 8, 2020 at 2:20 PM amit kumar wrote: > Hi Everyone, > > We use FileIO's writeDynamic to write dynamically to separate groups > based on an attribute's value in the input PCollection. > I wanted to check if there is a way to make

Re: Support for AWS SDK v2 and enhanced fanout in KinesisIO

2020-05-12 Thread Luke Cwik
You'll want to reach out to the users@ thread for getting feedback on deprecation. On Mon, May 11, 2020 at 9:53 AM Alexey Romanenko wrote: > > Thanks Ismaël for recalling this thread, I think we should start to > make some effort to deprecate the AWS SDK V1 IOs that are already >

Re: TextIO. Writing late files

2020-05-12 Thread Jose Manuel
Hi, I would like to clarify that while TextIO is writing, all the data is in the files (shards). The loss happens when file names emitted by getPerDestinationOutputFilenames are processed by a window. I have created a pipeline to reproduce the scenario in which some filenames are lost after the

Re: Parallelism in Combine.GroupedValues

2020-05-12 Thread Luke Cwik
There is more than one instance of an accumulator being used, and then those accumulators are merged using the mergeAccumulators method. Two examples of when combining happens in parallel are when the withFewKeys hint is used on the combiner or when there is partial combining[1] happening on the mapper
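The idea described in this reply (several accumulators built independently and then merged) can be illustrated with a small sketch in plain Python. It does not use the Beam SDK: `MeanFn` is a hypothetical stand-in that only mirrors the method names of a CombineFn, and the sharding in `combine_in_shards` is an assumption made for illustration, not how any particular runner partitions values.

```python
class MeanFn:
    """Hypothetical stand-in for a CombineFn computing a mean (not the Beam class)."""

    def create_accumulator(self):
        # An accumulator is a (running total, element count) pair.
        return (0.0, 0)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accumulators):
        # Merging is just element-wise addition of the partial results.
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")


def combine_sequential(fn, values):
    """Combine all values with a single accumulator; no merge is needed."""
    acc = fn.create_accumulator()
    for value in values:
        acc = fn.add_input(acc, value)
    return fn.extract_output(acc)


def combine_in_shards(fn, values, num_shards):
    """Build one accumulator per shard, then merge them.

    Each shard's loop is independent of the others, so the shards could be
    processed by different workers; here they simply run one after another.
    """
    shards = [values[i::num_shards] for i in range(num_shards)]
    partials = []
    for shard in shards:
        acc = fn.create_accumulator()
        for value in shard:
            acc = fn.add_input(acc, value)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(combine_sequential(MeanFn(), data))
    print(combine_in_shards(MeanFn(), data, num_shards=3))
```

Both paths give the same mean for the same input, which is the property that lets a runner choose between a single accumulator and several merged ones.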

Parallelism in Combine.GroupedValues

2020-05-12 Thread rahul patwari
Hi, In the Javadoc for Combine.GroupedValues[1], it has been described that *combining the values associated with a single key can happen in parallel*. The logic to combine values associated with a key can be provided by CombineFnWithContext (or) CombineFn. Both CombineFnWithContext.apply()[2]

GSoD with Apache Beam

2020-05-12 Thread Sylvia Mittal
Hey, I am Sylvia, a final year Computer Science student at the Indian Institute Of Technology, Mandi. I am interested in applying for GSoD with Apache Beam. I found the projects very interesting and useful. I would be very happy to contribute to the organization with the technical documentation.

Re: Jenkins jobs not running for my PR 10438

2020-05-12 Thread Ismaël Mejía
done On Tue, May 12, 2020 at 4:04 AM rahul patwari wrote: > > Hi, > > Can you please trigger pre-commit checks for > https://github.com/apache/beam/pull/11581 > > Thanks, > Rahul > > On Tue, May 12, 2020 at 7:12 AM Ahmet Altay wrote: >> >> Done for both Yoshiki and Tomo's PRs. >> >> On Mon,