Re: Key encodings for state requests

2019-11-05 Thread Kenneth Knowles
Specifically, "We have no way of telling from the Runner side, if a length prefix has been used or not." seems false. The runner has all the information since length prefix is a model coder. Didn't we agree that all coders should be self-delimiting in runner/SDK interactions, requiring

Re: What code runs in PreCommits?

2019-11-05 Thread Kenneth Knowles
It is a good question, and the answer is good to remember. TL;DR it runs against the merge commit from the moment you last pushed. You can learn the answer by inspection of Jenkins logs and some knowledge of GitHub. See

What code runs in PreCommits?

2019-11-05 Thread Pablo Estrada
Hi, this may be a dumb question. Let's imagine a hypothetical case, where I open a pull request against master. I wrote the change on top of COMMIT#11, so: My branch: COMMIT#11 -> MyCommit Let's suppose that master has received a bunch of new commits (and a fix on COMMIT#12), so it looks like

Re: [Discuss] Beam mascot

2019-11-05 Thread Aizhamal Nurmamat kyzy
Aww.. that Hoover beaver is cute. But then lemur is also "taken" [1] and the owl too [2]. Personally, I don't think it matters much which mascots are taken, as long as the project is not too close in the same space as Beam. Also, it's good to just get all ideas out. We should still consider

Re: Key encodings for state requests

2019-11-05 Thread Luke Cwik
+1 to what Robert said. On Tue, Nov 5, 2019 at 2:36 PM Robert Bradshaw wrote: > The Coder used for State/Timers in a StatefulDoFn is pulled out of the > input PCollection. If a Runner needs to partition by this coder, it > should ensure the coder of this PCollection matches with the Coder >

Re: RFC: python static typing PR

2019-11-05 Thread Robert Bradshaw
Sounds like we have consensus. Let's move forward. I'll follow up with the discussions on the PRs themselves. On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw wrote: > > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova wrote: > > > >> Do you believe that a future mypy plugin could replace pipeline

Re: Key encodings for state requests

2019-11-05 Thread Robert Bradshaw
The Coder used for State/Timers in a StatefulDoFn is pulled out of the input PCollection. If a Runner needs to partition by this coder, it should ensure the coder of this PCollection matches with the Coder used to create the serialized bytes that are used for partitioning (whether or not this is

Key encodings for state requests

2019-11-05 Thread Maximilian Michels
Hi, I wanted to get your opinion on something that I have been struggling with. It is about the coders for state requests in portable pipelines. In contrast to "classic" Beam, the Runner is not guaranteed to know which coder is used by the SDK. If the SDK happens to use a standard coder
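A minimal sketch of why the length prefix matters when state keys are compared as raw bytes. It assumes a length prefix is a base-128 varint byte count followed by the payload; the function names and the sample key are hypothetical illustrations, not Beam APIs, and the exact wire format should be treated as an assumption.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (assumed format)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def length_prefix(payload: bytes) -> bytes:
    """Make an encoded value self-delimiting by prepending its length."""
    return encode_varint(len(payload)) + payload


def read_length_prefixed(buf: bytes, pos: int = 0):
    """Read one length-prefixed value; return (payload, next position)."""
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    end = pos + length
    return buf[pos:end], end


# Hypothetical key as the SDK might encode it without a prefix.
raw_key = b"user-42"
prefixed_key = length_prefix(raw_key)

# The two byte strings differ, so a runner that partitions on one form
# and receives state requests keyed by the other would not match them.
keys_match = raw_key == prefixed_key
```

With the prefix, two keys written back to back can be split again without knowing the inner coder, which is the sense in which a length-prefixed encoding is self-delimiting.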

Re: Embedding expansion service for cross language in the runner

2019-11-05 Thread Robert Bradshaw
On Tue, Nov 5, 2019 at 10:32 AM Hai Lu wrote: > > Starting the expansion service in the job server is helpful. But having to > expose the port number and to include the address in the > beam.ExternalTransform is still a hassle. Giving a hard-coded port number > might be the only solution right

Re: Custom Windowing and TimestampCombiner

2019-11-05 Thread Reuven Lax
I'm not sure if it's currently legal. However the watermark is generally defined to be monotonic, so if it was allowed it would result in late data in the pipeline. On Tue, Nov 5, 2019 at 10:29 AM Aaron Dixon wrote: > Thanks Reuven, > > So is my conclusion correct? That it is illegal for any

Re: Embedding expansion service for cross language in the runner

2019-11-05 Thread Hai Lu
Starting the expansion service in the job server is helpful. But having to expose the port number and to include the address in the beam.ExternalTransform is still a hassle. Giving a hard-coded port number might be the only solution right now but it's not a very clean solution in our case.

Re: Custom Windowing and TimestampCombiner

2019-11-05 Thread Aaron Dixon
Thanks Reuven, So is my conclusion correct? That it is illegal for any custom window function (+ combiner policy) to merge in a way that would regress the watermark? What do Runners (eg Dataflow) do if this occurs? Does the API obligate runners to fail, or can insanity ensue? :) On Tue, Nov 5,

Re: Python Precommit duration pushing 2 hours

2019-11-05 Thread David Cavazos
+1 to moving the GCP tests outside of core. If there are issues that only show up on GCP tests but not in core, it might be an indication that there needs to be another test in core covering that, but I think that should be pretty rare. On Mon, Nov 4, 2019 at 8:33 PM Kenneth Knowles wrote: > +1

Re: Custom Windowing and TimestampCombiner

2019-11-05 Thread Reuven Lax
On Tue, Nov 5, 2019 at 8:07 AM Aaron Dixon wrote: > I noticed that if I use TimestampCombiner/EARLIEST for session windows > that the watermark appears to get held up for sessions that never "close" > (or that extend for a long time). > Correct - because the watermark is then being held up by

Custom Windowing and TimestampCombiner

2019-11-05 Thread Aaron Dixon
I noticed that if I use TimestampCombiner/EARLIEST for session windows that the watermark appears to get held up for sessions that never "close" (or that extend for a long time). But if I use default (TimestampCombiner/END_OF_WINDOW) the watermark doesn't get held. Does this mean that the
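The behaviour described here can be illustrated with a simplified model, not Beam's implementation. It assumes the output watermark is the minimum of the input watermark and any per-window hold, that EARLIEST holds at the earliest element timestamp in the session, and that END_OF_WINDOW holds at the end of the merged session window. The names `GAP`, `session_hold`, and `output_watermark` are hypothetical.

```python
GAP = 10  # hypothetical session gap duration, in arbitrary time units


def session_hold(timestamps, combiner):
    """Return the watermark hold for one merged session window (simplified)."""
    if combiner == "EARLIEST":
        # Output timestamp is the earliest element, so the hold stays there.
        return min(timestamps)
    if combiner == "END_OF_WINDOW":
        # Output timestamp is the end of the merged window, which moves
        # forward every time the session is extended.
        return max(timestamps) + GAP
    raise ValueError(f"unknown combiner: {combiner}")


def output_watermark(input_watermark, holds):
    """The output watermark cannot pass the input watermark or any hold."""
    return min([input_watermark, *holds])


# A session that keeps being extended before the gap expires.
session = [0, 5, 12, 19]
input_wm = 20

earliest_wm = output_watermark(input_wm, [session_hold(session, "EARLIEST")])
end_wm = output_watermark(input_wm, [session_hold(session, "END_OF_WINDOW")])
```

In this model the EARLIEST hold pins the output watermark at the first element of the session for as long as the session stays open, while the END_OF_WINDOW hold is always ahead of the input watermark and so never holds it back.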

Jenkins workflow improvement question

2019-11-05 Thread Michał Walenia
Hi all, As those of you that work on Jenkins jobs know, they can be a pain to work with. Even simple changes are painful to run in the PR because of the seed job - it reloads all the jobs sequentially and runs for over 10 minutes. If someone else runs it against another branch - tough luck, you

Re: [Discuss] Beam mascot

2019-11-05 Thread Maximilian Michels
Quick update: The mentioned designer has gotten back to me and offered to sketch something by the end of the week. I've pointed him to this thread and the existing logo material: https://beam.apache.org/community/logos/ [I don't want to interrupt the discussion in any way, I just think

[VOTE] @RequiresTimeSortedInput stateful DoFn annotation

2019-11-05 Thread Jan Lukavský
Hi, I'd like to open a vote on accepting the design document [1] as a basis for implementing the @RequiresTimeSortedInput annotation for stateful DoFns. The associated JIRA [2] and PR [3] contain only a subset of the whole functionality (allowed lateness is ignored and there is no possibility to specify a UDF for

Re: [Discuss] Beam mascot

2019-11-05 Thread Maximilian Michels
How about fireflies in the Beam light rays? ;) Feels like "Beam" would go well with an animal that has glowing bright eyes such as a lemur. I love the lemur idea because it has almost orange eyes. Thanks for starting this Aizhamal! I've recently talked to a designer who is somewhat famous