Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Chad Dombrova
I saw this and was particularly excited about the new support for "external" transforms in portable runners like python (i.e. the ability to use the Java KafkaIO transforms, with presumably more to come in the future). While the release notes are useful, I will say that it takes a lot of time and

Re: [ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Sergei Sokolenko
and now the news is on the twitterwebs https://twitter.com/datancoffee/status/1137160729386074113 On Fri, Jun 7, 2019 at 5:52 PM Reza Rokni wrote: > +1 on the pattern Tim! > > Please raise a Jira with the label pipeline-patterns, details are here: >

Re: [ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Reza Rokni
+1 on the pattern Tim! Please raise a Jira with the label pipeline-patterns, details are here: https://beam.apache.org/documentation/patterns/overview/#contributing-a-pattern On Sat, 8 Jun 2019 at 05:04, Tim Robertson wrote: > This is great. Thanks Pablo and all > > I've seen several folk

Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Kyle Weaver
Awesome! Thanks for leading the release Ankur. On Fri, Jun 7, 2019 at 2:57 PM Ankur Goenka wrote: > The Apache Beam team is pleased to announce the release of version 2.13.0! > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines, including

Re: [ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Tim Robertson
This is great. Thanks Pablo and all. I've seen several folks struggle with writing Avro to dynamic locations, which I think might be a good addition. If you agree I'll offer a PR unless someone gets there first - I have an example here:

[ANNOUNCEMENT] Common Pipeline Patterns - new section in the documentation + contributions welcome

2019-06-07 Thread Pablo Estrada
Hello everyone, A group of community members has been working on gathering and providing common patterns for Beam pipelines. These are examples of how to perform certain operations, and useful ways of using Beam in your pipelines. Some of them relate to processing of files, use of side

Re: Design Proposal for Cost Estimation

2019-06-07 Thread Kenneth Knowles
Thanks for the doc. This is really clear and readable. It all looks like a good improvement, whatever the result of the various open threads. And nice bonus that you've pointed to more good reading material. Kenn On Fri, Jun 7, 2019 at 12:25 PM Alireza Samadian wrote: > Thank you so much. > >

Re: Help triaging Jira issues

2019-06-07 Thread Kenneth Knowles
Nice. I noticed the huge drop in untriaged issues. Both of those ideas for automation sound reasonable. I think the other things that are harder to optimize can probably be addressed by re-triaging stale bugs. We will probably find those that should have been closed and those that are just

Re: Design Proposal for Cost Estimation

2019-06-07 Thread Alireza Samadian
Thank you so much. Best, Alireza On Fri, Jun 7, 2019 at 11:48 AM Pablo Estrada wrote: > I've added you as a contributor! : ) > > On Fri, Jun 7, 2019 at 11:20 AM Alireza Samadian > wrote: > >> Hi, >> >> I am going to create Issues in Jira and start implementing row estimation >> of each source

Re: Design Proposal for Cost Estimation

2019-06-07 Thread Pablo Estrada
I've added you as a contributor! : ) On Fri, Jun 7, 2019 at 11:20 AM Alireza Samadian wrote: > Hi, > > I am going to create issues in Jira and start implementing row estimation > for each source separately. I would appreciate it if someone could give me > permission to assign Jira issues to myself.

Re: Plan for dropping python 2 support

2019-06-07 Thread Ahmet Altay
I agree with you. A more recent LTS release with python 2 support will be good. Cost of maintaining python 2 support is also fairly low (maybe zero actually besides keeping some pre-existing compatibility code). I believe we are referring to two separate things with support: - Supporting existing

Re: Design Proposal for Cost Estimation

2019-06-07 Thread Alireza Samadian
Hi, I am going to create issues in Jira and start implementing row estimation for each source separately. I would appreciate it if someone could give me permission to assign Jira issues to myself. My Jira id is riazela. Best, Alireza On Fri, May 31, 2019 at 3:54 PM Alireza Samadian wrote: > Dear

Re: [DISCUSS] Portability representation of schemas

2019-06-07 Thread Anton Kedin
The topic of schema registries probably does not block the design and implementation of logical types and portable schemas by themselves, however I think we should spend some time discussing it (probably in a separate thread) so that all SDKs have similar mechanisms for schema registration and

Re: Testing code in extensions against runner

2019-06-07 Thread Lukasz Cwik
We have currently been having every runner define and manage its own suite/tests, so yes, modifying flink_runner.gradle is currently the correct thing to do. There is a larger discussion about whether this is the right approach, since we would like to capture things like perf benchmarks and

Re: [Discuss] Ideas for Apache Beam presence in social media

2019-06-07 Thread Thomas Weise
Here is an idea of how this could be done: Create a JIRA ticket that will always remain open. Have folks append their suggested tweets as comments. Interested PMC members can watch that ticket. Thomas On Thu, Jun 6, 2019 at 10:41 AM Thomas Weise wrote: > Pinging individual PMC members doesn't

Re: Removing shading by default within BeamModulePlugin.groovy

2019-06-07 Thread Lukasz Cwik
I also noticed that the build takes significantly less time on my machine, several mins saved. On Fri, Jun 7, 2019 at 9:54 AM Lukasz Cwik wrote: > Guava was the only thing that we shaded everywhere but the original intent > was for us to shade more and more by default until we decided to do >

Re: Removing shading by default within BeamModulePlugin.groovy

2019-06-07 Thread Lukasz Cwik
Guava was the only thing that we shaded everywhere but the original intent was for us to shade more and more by default until we decided to do vendoring (which is a better solution). So yes, this really only removed shading of Guava, we still have shading in all these other places: model/*

Re: I'm thinking about new features, what do you think?

2019-06-07 Thread Lukasz Cwik
Even though we don't support iteration, one could have a known upper bound and "unroll" the loop to a fixed number of iterations statically before the pipeline is run, but I agree with Eugene on his other points. On Fri, Jun 7, 2019 at 3:59 AM Robert Burke wrote: > I'm not sure I understand
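To make the unrolling idea concrete: if an upper bound on the number of rounds is known up front, an iterative algorithm can be written as a fixed chain of stages built before any data is seen. The sketch below is plain Python, not Beam code. `make_round`, `build_unrolled_pipeline`, and `run` are hypothetical names, and min-label propagation for connected components is used only as an example of an iterative algorithm relevant to this thread. It assumes an undirected graph stored as a symmetric adjacency dict in which every neighbour is also a key.

```python
from functools import reduce
from typing import Callable, Dict, List, Set

Graph = Dict[int, Set[int]]  # assumed symmetric: b in g[a] iff a in g[b]
Labels = Dict[int, int]
Stage = Callable[[Labels], Labels]


def make_round(graph: Graph) -> Stage:
    """One propagation round (hypothetical helper): each node adopts the
    smallest label among itself and its neighbours."""
    def round_(labels: Labels) -> Labels:
        return {
            node: min([labels[node]] + [labels[n] for n in graph[node]])
            for node in graph
        }
    return round_


def build_unrolled_pipeline(graph: Graph, max_rounds: int) -> List[Stage]:
    # The loop runs once, while the "pipeline" is being built. The number of
    # stages is fixed here and does not depend on how the data converges.
    return [make_round(graph) for _ in range(max_rounds)]


def run(stages: List[Stage], labels: Labels) -> Labels:
    # Apply the stages in order, feeding each one the previous output.
    return reduce(lambda acc, stage: stage(acc), stages, labels)
```

For the path 1-2-3-4 with initial labels `{n: n}`, three rounds give every node label 1, while two rounds leave node 4 with label 2. That is the trade-off of unrolling: the bound must be at least the diameter of the largest component, or the result is silently incomplete. In a Beam pipeline, each stage would presumably become a transform applied inside a construction-time `for` loop, with each round re-joining the current labels against the edge list.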

Re: [DISCUSS] Portability representation of schemas

2019-06-07 Thread Robert Burke
Wouldn't SDK-specific types always be under the "coders" component instead of the logical type listing? Offhand, having a separate normalized listing of logical schema types in the pipeline components message seems about right. Then they're unambiguous, but can also either refer to

Re: I'm thinking about new features, what do you think?

2019-06-07 Thread Robert Burke
I'm not sure I understand the desired properties of GroupByMultiKey. Offhand, am I right in interpreting GroupByMultiKey as essentially forming a graph of the keys based on the MultiKeys nodes, with the number of resulting iterables determined by the components of the graph? If that's the case then,

Re: I'm thinking about new features, what do you think?

2019-06-07 Thread Eugene Kirpichov
It looks like you want to take a PCollection of lists of items of the same type (but not necessarily of the same length - in your example you pad them to the same length but that's unnecessary), induce an undirected graph on them where there's an edge between XS and YS if they have an element in
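Read this way, the proposed semantics put two lists in the same group when they share an element, either directly or through a chain of other lists. This matches the graph interpretation in this thread (Robert's "components of the graph" and Jan's transitive-closure remark). The sketch below is a single-machine illustration using a union-find structure. `group_by_multi_key` is a hypothetical name, not an existing Beam transform, and skipping empty lists is an assumption of the sketch.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def group_by_multi_key(rows: Iterable[Sequence[T]]) -> List[List[Sequence[T]]]:
    """Group lists that are transitively connected by shared elements.

    Hypothetical in-memory illustration of the proposed GroupByMultiKey
    semantics; not a Beam API. Groups come out in order of their first row.
    """
    rows = list(rows)
    parent: Dict[T, T] = {}

    def find(x: T) -> T:
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: T, b: T) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for row in rows:
        for item in row:
            parent.setdefault(item, item)
        # All elements of one row belong to the same connected component.
        for item in row[1:]:
            union(row[0], item)

    groups: Dict[T, List[Sequence[T]]] = {}
    for row in rows:
        if not row:
            continue  # assumption: empty lists share nothing and are dropped
        groups.setdefault(find(row[0]), []).append(row)
    return list(groups.values())
```

For the input `[["a", "b"], ["c"], ["b", "d"], ["e", "c"], ["f"]]`, `["a", "b"]` and `["b", "d"]` form one group through the shared `"b"`, `["c"]` and `["e", "c"]` form another, and `["f"]` stays alone. This in-memory version sidesteps the hard part raised in the thread: a distributed equivalent needs an iterative connected-components computation, and Beam does not support iteration directly.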

Re: I'm thinking about new features, what do you think?

2019-06-07 Thread Jan Lukavský
Hi, that sounds interesting, but it seems to be computationally intensive and might not scale well, if I understand it correctly. It looks like it needs a transitive closure, am I right? Jan On 6/7/19 11:17 AM, i.am.moai wrote: Hello everyone, nice to meet you. I am Naoki Hyu (日宇尚記).

Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

2019-06-07 Thread Maximilian Michels
Sounds like a good idea. I think the same can be done for Flink; Flink's and Spark's APIs are similar to a large degree. Here also a link to the transforms: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/ -Max On 04.06.19 03:20, Ahmet Altay wrote: Thank you

Re: [PROPOSAL] Prepare for LTS bugfix release 2.7.1

2019-06-07 Thread Maximilian Michels
Created an up-to-date version of the Flink backports for 2.7.1: https://github.com/apache/beam/pull/8787 Some of the Gradle task names have changed which makes testing via Jenkins hard. Will have to run them manually before merging. -Max On 06.06.19 17:41, Kenneth Knowles wrote: Hi all,

I'm thinking about new features, what do you think?

2019-06-07 Thread i.am.moai
Hello everyone, nice to meet you. I am Naoki Hyu (日宇尚記), a developer living in Tokyo. Scala and Python are my favorite languages and the ones I use most often. I have no experience with OSS development, but as I use Dataflow at work, I want to contribute to the development of Beam. In fact, there is a feature I want

Re: Plan for dropping python 2 support

2019-06-07 Thread Robert Bradshaw
I don't think the second release with robust/recommended Python 3 support should be the last release with Python 2 support--that is simply not enough time for people to migrate. (Look at how long it took us...) It does make a lot of sense to at least have one LTS release with support for both.

Re: Removing shading by default within BeamModulePlugin.groovy

2019-06-07 Thread Ismaël Mejía
This is fantastic. I took a look at the PR and did not see anything that jumped out at me, and I also validated it with two external projects against today’s snapshots (after the merge) without issues so far. Great that we finally tackled this, thanks Luke! I have one minor comment because the title of the

Re: Help triaging Jira issues

2019-06-07 Thread Ismaël Mejía
I took a look and reduced the untriaged issues to around 100. I noticed, however, some patterns that are producing more untriaged issues than we should have. Those can probably be automated (if JIRA has ways to do it): 1. Issues created and assigned on creation can be marked as open. 2. Once an

Re: @RequireTimeSortedInput design draft

2019-06-07 Thread Jan Lukavský
Hi Reza, interesting suggestions, thanks. When you mentioned join, I recalled an older issue (which apparently has not yet been transferred to Beam's JIRA) [1]. Is this in any way related to what you are implementing? Would you like to make your implementation accessible via Euphoria DSL [2]? Jan