Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

2019-05-29 Thread Reza Rokni
+1. I think there will be at least two layers of this. Layer 1 - Using primitives: I do join, GBK, aggregation... with system X this way; what is the canonical equivalent in Beam? Layer 2 - Patterns: I read and join unbounded and bounded data in system X this way; what is the canonical

Request to add me as a contributor.

2019-05-29 Thread Akshay Iyangar
Hello everyone, my name is Akshay Iyangar and I use the Beam repo extensively. There is a small patch that I would like to push upstream (https://issues.apache.org/jira/browse/BEAM-7442). I'm working on this issue and hope to become a contributor on Beam's JIRA issue tracker so that I can

[DISCUSS] Cookbooks for users with knowledge in other frameworks

2019-05-29 Thread Ahmet Altay
Hi all, Inspired by the user asking about a Spark feature in Beam [1] in the release thread, I searched the user@ list and noticed a few instances of people asking questions like "I can do X in Spark, how can I do that in Beam?" Would it make sense to add documentation to explain how certain

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Kenneth Knowles
+1 pending good enough tooling (I can't quite tell - seems there are some issues?) On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk < ka.kucharc...@gmail.com> wrote: > What else do we actually gain? My guess is faster PR review iteration. We > will skip some of the conversations about code style. >

Re: Timer support in Flink

2019-05-29 Thread Maximilian Michels
Hi Reza, The detailed view of the capability matrix states: "The Flink Runner supports timers in non-merging windows." That is still the case. Other than that, timers should be working fine. It makes very heavy use of Event.Time timers and has to do some manual DoFn cache work to get

Question about building Go SDK

2019-05-29 Thread Kengo Seki
Hi Beam developers, I tried to build the Go SDK on the master branch, but encountered the following error.
```
$ ./gradlew :sdks:go:resolveBuildDependencies
(snip)
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':sdks:go:resolveBuildDependencies'.
>
```

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Maximilian Michels
I think the question is whether it can be configured in a way that fits our current linter's style. I don't think it is feasible to reformat the entire Python SDK. Reformatted lines don't allow quick access to the Git history. This effect is still visible in the Java SDK. However, I have the feeling

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Ismaël Mejía
> I think the question is if it can be configured in a way to fit our > current linter's style. I don't think it is feasible to reformat the > entire Python SDK. It cannot be configured to do what we actually do, because Black is configurable only to support the standard Python code style

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Ismaël Mejía
> My concerns are: > - The product is clearly marked as beta with a big warning. > - It looks like mostly a single person project. For the same reason I also > strongly prefer not using a fork for a specific setting. Fork will only have > less people looking at it. I suppose the project is

Re: Timer support in Flink

2019-05-29 Thread Reza Rokni
Thanx Max! Reza On Wed, 29 May 2019, 16:38 Maximilian Michels, wrote: > Hi Reza, > > The detailed view of the capability matrix states: "The Flink Runner > supports timers in non-merging windows." > > That is still the case. Other than that, timers should be working fine. > > > It makes very

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Robert Bradshaw
Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line). This is probably why the 2-space fork exists. However, we don't conform to that either; we use 2 spaces for indentation,

Re: Question about building Go SDK

2019-05-29 Thread Kengo Seki
It was my fault, probably my gradle cache was broken. Running gradlew succeeded after removing ~/.gradle. I'm sorry for bothering you. Kengo Seki On Wed, May 29, 2019 at 6:45 PM Kengo Seki wrote: > > Hi Beam developers, > > I tried to build Go SDK on the master branch, but encountered the >

Re: Definition of Unified model

2019-05-29 Thread Robert Bradshaw
On Tue, May 28, 2019 at 12:18 PM Jan Lukavský wrote: > > As I understood it, Kenn was supporting the idea that sequence metadata > is preferable over FIFO. I was trying to point out that it should even > provide the same functionality as FIFO, plus one more important property: > reproducibility and

Re: Hazelcast Jet Runner

2019-05-29 Thread Reza Rokni
Hi, Over 800 usages under Java; it might be worth doing a few PRs... I'd also suggest we use a very light review process: in the first round go for low-hanging fruit; if anyone gives a -1 on a change, we leave that for round two. Thoughts? Cheers Reza On Wed, 29 May 2019 at 12:05, Kenneth Knowles

Re: Shuffling on apache beam

2019-05-29 Thread pasquale . bonito
Hi Reza, using GlobalWindow with triggering I was able to reduce hotspot issues and get satisfactory performance for BigTable updates. Unfortunately, latency when getting messages from PubSub remains around 1.5s, which is too much considering our NFRs (non-functional requirements). This is the code I use to get the messages:

Re: Definition of Unified model

2019-05-29 Thread Robert Bradshaw
On Tue, May 28, 2019 at 2:34 PM Reuven Lax wrote: > > Sequence metadata does have the disadvantage that users can no longer use the > types coming from the source. You must create a new type that contains a > sequence number (unless Beam provides this). Yes. Well, the source would have a
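The trade-off discussed here (sequence metadata instead of relying on FIFO delivery, at the cost of wrapping the source type) can be illustrated with a minimal, runner-agnostic Python sketch. `Sequenced`, `ReorderBuffer` and `replay` are hypothetical names, not Beam APIs, and the sketch assumes gap-free sequence numbers starting at 0 for a single key or partition; in a real pipeline this logic would presumably live in per-key state.

```python
from dataclasses import dataclass
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Sequenced(Generic[T]):
    """Hypothetical wrapper type: the source value plus its sequence number."""
    seq: int
    value: T


class ReorderBuffer(Generic[T]):
    """Releases values in sequence order, whatever order they arrive in.

    Assumes sequence numbers are gap-free and start at `start`.
    """

    def __init__(self, start: int = 0) -> None:
        self._next = start                 # next sequence number to emit
        self._pending: Dict[int, T] = {}   # out-of-order values waiting

    def add(self, element: Sequenced[T]) -> List[T]:
        # Ignore duplicates that were already emitted or are already buffered.
        if element.seq < self._next or element.seq in self._pending:
            return []
        self._pending[element.seq] = element.value
        released: List[T] = []
        # Emit the longest contiguous run starting at the next expected number.
        while self._next in self._pending:
            released.append(self._pending.pop(self._next))
            self._next += 1
        return released


def replay(elements: List[Sequenced[T]]) -> List[T]:
    """Feeds elements through a fresh buffer and collects everything released."""
    buffer: ReorderBuffer[T] = ReorderBuffer()
    out: List[T] = []
    for element in elements:
        out.extend(buffer.add(element))
    return out


if __name__ == "__main__":
    arrival = [Sequenced(2, "c"), Sequenced(0, "a"), Sequenced(1, "b")]
    print(replay(arrival))  # ['a', 'b', 'c']
```

Because the output depends only on the sequence numbers, every arrival order yields the same result, which is the reproducibility argument made in the thread; the price is the extra wrapper type.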

Re: [VOTE] Release 2.13.0, release candidate #1

2019-05-29 Thread Thomas Weise
Added: https://github.com/apache/beam/pull/8714 On Tue, May 28, 2019 at 4:03 PM Ankur Goenka wrote: > Open cherry pick PRs for spark runner > https://github.com/apache/beam/pull/8705 > https://github.com/apache/beam/pull/8706 > > On Tue, May 28, 2019 at 3:42 PM Valentyn Tymofieiev > wrote: >

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Thomas Weise
Thanks Ismaël for bringing this up. Dealing with Python lint issues has been painful and unproductive. For the Java side it became ./gradlew spotlessApply and there is no looking back. To get there it was also necessary to reformat some of the code, but the net benefits are evident. Even if we have

Re: [VOTE] Release 2.13.0, release candidate #1

2019-05-29 Thread Ahmet Altay
We have quite a few cherry-pick requests. Are they all for major/blocking issues? Have we uncovered issues in release validation that are normally missed by our daily tests? On Wed, May 29, 2019 at 10:20 AM Thomas Weise wrote: > Added: https://github.com/apache/beam/pull/8714 > > > On Tue,

Re: Definition of Unified model

2019-05-29 Thread Jan Lukavský
> Offsets within a file, unordered between files seems exactly analogous with offsets within a partition, unordered between partitions, right? Not exactly. The key difference is that partitions in streaming stores are defined (on purpose, and with key impact on this discussion) as

Re: Question about building Go SDK

2019-05-29 Thread Lukasz Cwik
I have gotten the same error on other dependencies. Gogradle sometimes fails to pull down the dependencies. Retrying ./gradlew :sdks:go:resolveBuildDependencies in the past did allow me to fetch the dependencies. Others who focus only on the Go SDK work using the standard Go development process by

Re: Shuffling on apache beam

2019-05-29 Thread Pablo Estrada
If you add a stateful DoFn to your pipeline, you'll force Beam to shuffle data to their corresponding worker per key. I am not sure what the latency cost of doing this is (as the messages still need to be shuffled). But it may help you accomplish this without adding windowing+triggering. -P. On
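The reason a stateful DoFn implies a shuffle is that per-key state only works if every element with a given key reaches the same worker. The Python sketch below illustrates that routing idea only; the SHA-256-based assignment and the names `worker_for_key` and `shuffle_by_key` are assumptions for illustration, not how Beam or any particular runner actually partitions keys.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def worker_for_key(key: str, num_workers: int) -> int:
    """Deterministically maps a key to a worker index (hypothetical scheme)."""
    # A stable hash (unlike Python's per-process salted hash()) keeps the
    # key-to-worker mapping identical across processes and runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def shuffle_by_key(
    records: Iterable[Tuple[str, int]], num_workers: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Groups records by the worker that owns their key."""
    partitions: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        partitions[worker_for_key(key, num_workers)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    records = [("user-a", 1), ("user-b", 2), ("user-a", 3), ("user-c", 4)]
    for worker, items in sorted(shuffle_by_key(records, num_workers=3).items()):
        print(worker, items)
```

Every record for `user-a` lands in one partition, so a per-key state cell can live on a single worker; moving records there is the shuffle whose latency cost is in question.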

Re: Definition of Unified model

2019-05-29 Thread Lukasz Cwik
Expanding the dimensionality could be the basis for loops within the graph since loops could be modeled as (time, loop iteration #, nested loop iteration #, nested nested loop iteration #, ...) On Tue, May 28, 2019 at 12:10 PM Jan Lukavský wrote: > Could this be solved by "expanding the
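Lukasz's suggestion can be made concrete by treating a timestamp as a tuple whose first component is time and whose later components count loop iterations. In this minimal Python sketch the helper names and the component-wise (product) partial order are assumptions borrowed from loop-aware dataflow systems such as timely dataflow; nothing here is an existing Beam API.

```python
from typing import Tuple

# (time, loop iteration, nested loop iteration, ...)
Timestamp = Tuple[int, ...]


def enter_loop(ts: Timestamp) -> Timestamp:
    """Entering a (nested) loop appends an iteration counter starting at 0."""
    return ts + (0,)


def next_iteration(ts: Timestamp) -> Timestamp:
    """Going around the innermost loop once increments its counter."""
    return ts[:-1] + (ts[-1] + 1,)


def leave_loop(ts: Timestamp) -> Timestamp:
    """Leaving the innermost loop drops its counter."""
    return ts[:-1]


def less_equal(a: Timestamp, b: Timestamp) -> bool:
    """Component-wise partial order: a <= b only if every coordinate is <=."""
    if len(a) != len(b):
        raise ValueError("timestamps must have the same loop depth")
    return all(x <= y for x, y in zip(a, b))


if __name__ == "__main__":
    t = enter_loop((10,))                   # (10, 0)
    t = next_iteration(next_iteration(t))   # (10, 2)
    print(t, leave_loop(t))                 # (10, 2) (10,)
    # (10, 2) and (11, 1) are incomparable under the product order.
    print(less_equal((10, 2), (11, 1)), less_equal((11, 1), (10, 2)))  # False False
```

A lexicographic order would be a different, total choice; the sketch uses the product order only for illustration.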

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Ahmet Altay
Thank you all for doing the work. The demo PR looks decent. I still have 3 major concerns. (1) I prefer not to maintain a fork with our own customization for a tool. It will add to our maintenance costs. (2) Even the small change touches about 20k lines of code. (3) On the beta note, the specific

Re: [VOTE] Release 2.13.0, release candidate #1

2019-05-29 Thread Valentyn Tymofieiev
Hi Vasiullah, I am not aware of such a function. I suggest that you start a new thread on the u...@beam.apache.org mailing list for this question and describe your use case there. On Wed, May 29, 2019 at 2:39 PM Vasiullah syed wrote: > Hello All, > > > Can anyone please help me out

Re: Question about building Go SDK

2019-05-29 Thread Robert Burke
Not a bother at all! The Gradle-based build for the Go SDK is brittle, and there are a few issues around it. Notably, it doesn't really line up with how end users will typically acquire dependencies, using the regular go toolchain. As Luke says, if you're working with or *on* the SDK, one

Re: [DISCUSS] Autoformat python code with Black

2019-05-29 Thread Katarzyna Kucharczyk
I think all of the mentioned arguments are reasonable. As Ismaël said, this all seems to be a tradeoff, but maybe one worth considering. Maybe we should think about why it's worth introducing a tool which will make us strictly use PEP8. First of all, it should help with good coding practices. For example if

Re: [VOTE] Release 2.13.0, release candidate #1

2019-05-29 Thread Ankur Goenka
Thanks Valentyn for providing the fix for the BQ blocker. All other cherry-picks mentioned in the mail thread are also done. I will start RC2 now. Thanks, Ankur On Wed, May 29, 2019 at 2:47 PM Valentyn Tymofieiev wrote: > Hi Vasiullah, > > I am not aware of such function. I suggest that you

Re: Proposal: Portability SDKHarness Docker Image Release with Beam Version Release.

2019-05-29 Thread Ankur Goenka
I agree; I think there are a few things which have to be thought through as part of the portable image release.
* Where to host the images. We can of course have an alias for the image which can point to a different location, but the hosting location has to be sorted out.
* Validation process for the