Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-26 Thread rahul patwari
Thanks for your detailed explanation, Rui. As you said, the triggers for the PCollections should be compatible with the "Slowly Changing Lookup Cache" pattern. Rui, if this feature makes sense, can you please create a JIRA for it? I will start working on splitting BeamJoinRel.java to specific

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Ahmet Altay
To confirm, I manually validated the leader board on Python. It is working. On Fri, Jul 26, 2019 at 5:23 PM Yifan Zou wrote: > AFAIK, there should not be any special prerequisites for this. Things the > script does including: > 1. download the python rc in zip > 2. start virtualenv and install the

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Yifan Zou
AFAIK, there should not be any special prerequisites for this. The script does the following: 1. download the Python RC as a zip; 2. start a virtualenv and install the SDK; 3. verify the hash; 4. configure settings.xml and start a Java Pub/Sub message injector; 5. run the game examples and validate the results. Could you

Re: [PROPOSAL] Integrate BigQuery-compatible HyperLogLog algorithm into Beam

2019-07-26 Thread Robin Qiu
Quick update: the PR implementing this feature has been sent out: https://github.com/apache/beam/pull/9144. The design doc is also revamped to reflect the design decisions we have made. On Tue, Jun 25, 2019 at 2:05 PM Robin Qiu wrote: > Can you please add this to the design documents webpage.

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-26 Thread Rui Wang
> > PCollection mainStream = ... > *PCollectionView>>* lookupStream = ... // Note: > PCollectionView not PCollection. I have referred to PCollection before. And > *PCollectionView > should be of type Multimap*, to perform SideinputJoin. > PCollectionTuple tuple = PCollectionTuple.of(new

Beam metrics update

2019-07-26 Thread Mikhail Gryzykhin
Hello everybody, I'm working on improving deployment scripts for beam metrics site and going to do some updates over the weekend. This might bring site down for short periods of time. Please respond to this message if you require metrics dashboards

Neat Beam integration - Ananas Analytics Desktop

2019-07-26 Thread Kenneth Knowles
One colleague pointed me to this project and another tweeted about it [1]. I see it was mentioned on hacker news around that time [2]. Looks like a cool visual editor that produces Beam pipelines. Small # of contributors, ASL2 licensed, from a brief glance the code arrived on GitHub in a few large

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-26 Thread rahul patwari
Is this the flow that you are referring to: PCollection mainStream = ... *PCollectionView>>* lookupStream = ... // Note: PCollectionView not PCollection. I have referred to PCollection before. And *PCollectionView should be of type Multimap*, to perform SideinputJoin. PCollectionTuple tuple
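The side-input join flow above can be illustrated outside Beam with a plain Java sketch. The lookup side is materialized as a multimap (key to all values for that key), which plays the role of the multimap-typed `PCollectionView`, and each main-stream element probes it. The class `SideInputJoinSketch`, its `KV` type, the `String` keys and values, and the inner-join semantics are all assumptions for illustration. This is not Beam's `BeamJoinRel` code and uses no Beam API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Beam-free sketch of a side-input (lookup) join.
public class SideInputJoinSketch {

    // Hypothetical keyed element, standing in for an element of either stream.
    static final class KV {
        final String key;
        final String value;

        KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Materialize the lookup side as a multimap: key -> all values seen for that key.
    static Map<String, List<String>> toMultimap(List<KV> lookup) {
        Map<String, List<String>> view = new HashMap<>();
        for (KV kv : lookup) {
            view.computeIfAbsent(kv.key, k -> new ArrayList<>()).add(kv.value);
        }
        return view;
    }

    // Inner join: each main-stream element is paired with every matching lookup value;
    // elements whose key is absent from the view produce no output.
    static List<String> join(List<KV> mainStream, Map<String, List<String>> view) {
        List<String> out = new ArrayList<>();
        for (KV m : mainStream) {
            for (String v : view.getOrDefault(m.key, List.of())) {
                out.add(m.key + ":" + m.value + "+" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> view = toMultimap(List.of(
                new KV("a", "x1"), new KV("a", "x2"), new KV("b", "y1")));
        List<String> joined = join(List.of(new KV("a", "m1"), new KV("c", "m2")), view);
        System.out.println(joined); // prints [a:m1+x1, a:m1+x2]
    }
}
```

The sketch joins against a fixed snapshot of the lookup data. In the "Slowly Changing Lookup Cache" pattern, the view would instead be refreshed periodically, and each main-stream window would probe whichever snapshot its trigger makes available.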

Re: Choosing a coder for a class that contains a Row?

2019-07-26 Thread Reuven Lax
The metadata needed is already there - it's the encoding-position map in Schema. However the code needs to be written to examine an old schema and a new one in order to make the new schema encoding-compatible with the old one. This shouldn't be difficult to write. On Fri, Jul 26, 2019 at 10:21 AM
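One possible reading of the schema-evolution step described above is this: every field that already exists in the old schema keeps its old encoding position, and newly added fields get positions after the highest existing one. The following plain Java sketch models a schema only as a name-to-position map plus a list of new field names. `EncodingPositionsSketch` and `makeCompatible` are hypothetical names, and the code does not use Beam's `Schema` class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reconcile a new schema's encoding positions with an old one.
public class EncodingPositionsSketch {

    // Fields shared with the old schema keep their old positions; new fields are
    // appended after the largest old position, so old encodings keep their layout.
    static Map<String, Integer> makeCompatible(Map<String, Integer> oldPositions,
                                               List<String> newFields) {
        int next = 0;
        for (int p : oldPositions.values()) {
            next = Math.max(next, p + 1);
        }
        Map<String, Integer> result = new HashMap<>();
        for (String field : newFields) {
            Integer old = oldPositions.get(field);
            if (old != null) {
                result.put(field, old);    // existing field: reuse its old position
            } else {
                result.put(field, next++); // new field: append after all old positions
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> oldPositions = Map.of("id", 0, "name", 1);
        // Same fields in a different order (e.g. from POJO reflection) plus one new field.
        Map<String, Integer> positions =
                makeCompatible(oldPositions, List.of("name", "email", "id"));
        System.out.println(positions.get("id") + " " + positions.get("name") + " "
                + positions.get("email")); // prints "0 1 2"
    }
}
```

Positions of fields removed from the new schema are simply left unused here. A real implementation would also need to decide how removed fields are handled and how to decode old data that lacks the new fields.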

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Anton Kedin
Cool, will make the post and will update the release guide as well then On Fri, Jul 26, 2019 at 10:20 AM Chad Dombrova wrote: > I think the release guide needs to be updated to remove the optionality of >> blog creation and avoid confusion. Thanks for pointing that out. >> > > +1 > >

Re: Choosing a coder for a class that contains a Row?

2019-07-26 Thread Kenneth Knowles
The most challenging part, as I understand it, surrounds automatically inferred schemas from POJOs, where Java's nondeterministic iteration order, combined with a row's inherent ordering, means that even an identical pipeline will need some metadata to plumb the right fields to the right column

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Chad Dombrova
> > I think the release guide needs to be updated to remove the optionality of > blog creation and avoid confusion. Thanks for pointing that out. > +1

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Thomas Weise
A quick look over the JIRA release notes reveals new features and important improvements that merit an announcement. I think the release guide needs to be updated to remove the optionality of blog creation and avoid confusion. Thanks for pointing that out. On Fri, Jul 26, 2019 at 9:53 AM Anton

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Anton Kedin
Hi Thomas, I haven't made it. I read that step of the guide as optional ("..if needed for this particular release..."). I am not sure if anything specific needs to be announced or highlighted for 2.14. I can go over the closed Jiras and create a blog post if it's expected. Regards, Anton On Fri,

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Thomas Weise
Hi Anton, Thanks for working on the release. I don't find the release blog in https://github.com/apache/beam/pull/9157 or elsewhere? This should be part of the release candidate [1] and I wonder why we keep on missing it in RCs. Is there something that needs to be fixed in [1]? The reason why

Re: Sort Merge Bucket - Action Items

2019-07-26 Thread Kenneth Knowles
There is still considerable value in knowing data sources statically so you can do things like fetch sizes and other metadata and adjust pipeline shape. I would not expect to delete these, but to implement them on top of SDF while still giving them a clear URN and payload so runners can know that

Re: Collecting metrics in JobInvocation - BEAM-4775

2019-07-26 Thread Kenneth Knowles
Took a look at the code, too. It seems like a mismatch in a few ways - PipelineRunner::run is async already and returns while the job is still running - PipelineResult is a legacy name - it is really meant to be a handle to a running job - cancel() on a future is just not really related to

Collecting metrics in JobInvocation - BEAM-4775

2019-07-26 Thread Łukasz Gajowy
Hi all, I'm currently working on BEAM-4775. The goal here is to pass portable MetricResults over the RPC API to the PortableRunner (SDK) part and allow reading them there. The metrics can be collected from the pipeline result that is available in

Re: Sort Merge Bucket - Action Items

2019-07-26 Thread Robert Bradshaw
On Thu, Jul 25, 2019 at 11:09 PM Eugene Kirpichov wrote: > > Hi Gleb, > > Regarding the future of io.Read: ideally things would go as follows > - All runners support SDF at feature parity with Read (mostly this is just > the Dataflow runner's liquid sharding and size estimation for bounded >