Fwd: Launching a Portable Pipeline

2018-05-11 Thread Ankur Goenka
Hi, Recent effort on portability has introduced JobService and ArtifactService to the Beam stack along with the SDK. This has opened up a few questions around how we start a pipeline in a portable setup (with JobService). I am trying to document our approach to launching a portable pipeline and take

Re: Reproducible Environment for Jenkins Tests By Using Container

2018-05-11 Thread Henning Rohde
This is very cool! Added some comments in the doc. Thanks, Henning On Fri, May 11, 2018 at 3:26 PM Yifan Zou wrote: > Hello, > > I am working on creating a reproducible build environment for BEAM. The > goal is having a reproducible environment by using docker for Beam

Enabling ErrorProne analysis on sdks-java-core

2018-05-11 Thread Scott Wegner
I just wanted to give a heads-up on pr/5319 [1], which makes sdks-java-core ErrorProne-clean and upgrades ErrorProne analysis to produce build errors. ErrorProne [2] is another static analysis tool which hooks into the Java compilation process. Kenn added it as warnings during the Gradle

Reproducible Environment for Jenkins Tests By Using Container

2018-05-11 Thread Yifan Zou
Hello, I am working on creating a reproducible build environment for BEAM. The goal is having a reproducible environment by using docker for Beam build and test on Jenkins and contributors' local machines. More details are in the proposal:

Re: Documenting Metrics API?

2018-05-11 Thread Lukasz Cwik
The programming guide doesn't have the information you're looking for. Like Kenn says, it should have it but it currently doesn't. On Fri, May 11, 2018 at 1:02 PM Kenneth Knowles wrote: > I think the programming guide needs to have end-user documentation. > > Kenn > > On Fri, May

Re: Documenting Metrics API?

2018-05-11 Thread Kenneth Knowles
I think the programming guide needs to have end-user documentation. Kenn On Fri, May 11, 2018 at 12:58 PM Lukasz Cwik wrote: > Are you speaking about metrics related to portability? If so, Alex shared > this doc a while back: https://s.apache.org/beam-fn-api-metrics > >

Re: Documenting Metrics API?

2018-05-11 Thread Pablo Estrada
I'm speaking about instructions on how to use Metrics.counter / Metrics.distribution, etc when writing pipelines. Best -P. On Fri, May 11, 2018 at 12:58 PM Lukasz Cwik wrote: > Are you speaking about metrics related to portability? If so, Alex shared > this doc a while back:

Re: Documenting Metrics API?

2018-05-11 Thread Lukasz Cwik
Are you speaking about metrics related to portability? If so, Alex shared this doc a while back: https://s.apache.org/beam-fn-api-metrics Otherwise, I'm not aware of any metrics related documentation for Apache Beam on the website. On Fri, May 11, 2018 at 12:02 PM Pablo Estrada

Re: Tracking what works with portability

2018-05-11 Thread Henning Rohde
> For runners*SDK pairs that don't have a batch/streaming distinction how about collapsing the columns? There is also often a difference in whether we've actually tried them or whether there are regression tests. Once we have a clearer (= greener and bluer) picture, I'm fine with collapsing some

Re: Graal instead of docker?

2018-05-11 Thread Kenneth Knowles
Romain, You probably did not mean to, but I think this message crosses outside the expected code of conduct. On Fri, May 11, 2018 at 11:48 AM Romain Manni-Bucau wrote: > > Also beam community is java - dont answer it is python or go without > checking ;). Not sure adding

Re: Graal instead of docker?

2018-05-11 Thread Eugene Kirpichov
On Fri, May 11, 2018 at 11:48 AM Romain Manni-Bucau wrote: > > > On Fri, May 11, 2018 at 18:15, Andrew Pilloud wrote: > >> Json and Protobuf aren't the same thing. Json is for exchanging >> unstructured data, Protobuf is for exchanging structured data.

Re: Graal instead of docker?

2018-05-11 Thread Reuven Lax
Romain, if we are specifically discussing the use of protocol buffers and gRPC, this is the result of community discussion on the dev list back in 2016. Many options were considered: JSON, Thrift, Kryo, and proto among them. The decision that protocol buffers and gRPC were the best solutions for

Re: Tracking what works with portability

2018-05-11 Thread Henning Rohde
> Yea so I guess the column is more just "what works?" and not "what works with portability?" Yeah - the Direct runner column is just "what works". It's included, because direct runners are still relevant in the portable world and it's useful to see what is supported there in comparison with the

Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-05-11 Thread Matthias Baetens
Hey Eugene, Apologies for picking this up so late, but I could help uploading your video to the Beam channel. Are you able to send me the raw file and do you have sign-off to go ahead with sharing it on YouTube? Thanks. Matthias On Sat, 14 Apr 2018 at 21:45 Eugene Kirpichov

Re: Tracking what works with portability

2018-05-11 Thread Kenneth Knowles
On Fri, May 11, 2018 at 11:46 AM Lukasz Cwik wrote: > > On Fri, May 11, 2018 at 11:40 AM Kenneth Knowles wrote: > >> This is great. "The Beam Vision in a spreadsheet" and/or what the >> capability matrix wishes it always had been. >> >> - I don't know how to

Documenting Metrics API?

2018-05-11 Thread Pablo Estrada
Hello all, I could not find a place where the Beam Metrics API is well detailed on the Beam website. Is there a JIRA tracking this? Perhaps our use-case driven docs + java/pydoc cover it well enough, but I'm not sure that that's the case. Thanks -P -- Got feedback? go/pabloem-feedback

Re: Graal instead of docker?

2018-05-11 Thread Romain Manni-Bucau
On Fri, May 11, 2018 at 18:15, Andrew Pilloud wrote: > Json and Protobuf aren't the same thing. Json is for exchanging > unstructured data, Protobuf is for exchanging structured data. The point of > Portability is to define a protocol for exchanging structured messages >

Re: Tracking what works with portability

2018-05-11 Thread Lukasz Cwik
On Fri, May 11, 2018 at 11:40 AM Kenneth Knowles wrote: > This is great. "The Beam Vision in a spreadsheet" and/or what the > capability matrix wishes it always had been. > > - I don't know how to interpret the DirectRunner column. Is it that it > uses ye olde proto round trip?

Re: Tracking what works with portability

2018-05-11 Thread Kenneth Knowles
This is great. "The Beam Vision in a spreadsheet" and/or what the capability matrix wishes it always had been. - I don't know how to interpret the DirectRunner column. Is it that it uses ye olde proto round trip? Another level is that it actually directly links in the SDK harness as a dep and

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Lukasz Cwik
The iterator going out of scope is the idiomatic way that resources are freed for Java developers (hence the weak/phantom reference suggestion). Explicitly requiring users to deal with 'handles' (like file streams) leads to leaked resources. On Fri, May 11, 2018 at 10:55 AM Kenneth Knowles

Jenkins build is back to stable : beam_SeedJob #1673

2018-05-11 Thread Apache Jenkins Server
See

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Kenneth Knowles
Thanks Xinyu, I actually had first sketched out just what you wrote. But then I realized a few things: - usually an Iterable does not allocate resources, only its Iterators - if you consume the whole iterator, I hope the user would not have to do any extra work - you can also automatically

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Eugene Kirpichov
I'm not sure if this has been proposed in this thread, but if the common case is that users consume the whole iterator, then you can close resources at !hasNext(). And for cleanup of incompletely consumed iterators, rely on what Kenn suggested. Since you're making your own runner, you can add
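
The close-on-exhaustion idea in this message could be sketched roughly as follows, in plain Java with no Beam or RocksDB dependency. `AutoClosingIterator`, its `releaseResource` callback, and `isClosed()` are hypothetical names introduced here for illustration; they are not part of any Beam API, and the thread does not say the runner was implemented this way.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical wrapper (not a Beam class): releases a backing resource
 * as soon as the wrapped iterator reports that it is exhausted.
 */
final class AutoClosingIterator<T> implements Iterator<T>, AutoCloseable {
  private final Iterator<T> delegate;
  // Assumed cleanup hook, e.g. something that closes a native store iterator.
  private final Runnable releaseResource;
  private boolean closed = false;

  AutoClosingIterator(Iterator<T> delegate, Runnable releaseResource) {
    this.delegate = delegate;
    this.releaseResource = releaseResource;
  }

  @Override
  public boolean hasNext() {
    if (closed) {
      return false;
    }
    if (delegate.hasNext()) {
      return true;
    }
    // Common case from the thread: the user consumed everything,
    // so the resource is released here with no extra work on their side.
    close();
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return delegate.next();
  }

  /** Idempotent: the release hook runs at most once. */
  @Override
  public void close() {
    if (!closed) {
      closed = true;
      releaseResource.run();
    }
  }

  boolean isClosed() {
    return closed;
  }
}
```

An iterator that is abandoned before exhaustion is not covered by this sketch; that case would still need a fallback such as an explicit `close()` call or the reference-based cleanup discussed elsewhere in the thread.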

Tracking what works with portability

2018-05-11 Thread Henning Rohde
Hi everyone, While the portability framework moves forward, it is often hard to figure out exactly what is supported to work at any given time. There are still many irregularities, TODOs, bugs and small differences between batch and streaming and the portable SDK and runner implementations. For

Re: Graal instead of docker?

2018-05-11 Thread Andrew Pilloud
Json and Protobuf aren't the same thing. Json is for exchanging unstructured data, Protobuf is for exchanging structured data. The point of Portability is to define a protocol for exchanging structured messages across languages. What do you propose using on top of Json to define message structure?

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Lukasz Cwik
As alternatives to using weak/phantom references: * Can you configure RocksDb's memory usage/limits? * Inside the iterator, periodically close and re-open the RocksDb connection, seeking back to where the user was? * Use the ParDo/DoFn lifecycle and clean up after each processElement/finishBundle

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-11 Thread Xinyu Liu
Thanks for drafting the details about the two approaches, Kenn. Now I understand Luke's proposal better. The approach looks neat, but the uncertainty of *when* GC is going to kick in will make users' lives hard. If the user happens to configure a large JVM heap size, and since rocksDb uses off-heap

Jenkins build became unstable: beam_SeedJob #1672

2018-05-11 Thread Apache Jenkins Server
See

Re: Documenting Github PR jenkins trigger phrases

2018-05-11 Thread Jean-Baptiste Onofré
Agreed, I thought there was already a PR about that. Regards JB On 11/05/2018 16:11, Alexey Romanenko wrote: +1 to adding such a reference guide of Jenkins commands to the Testing guide. It should be extremely useful, especially for those who were not aware of this before. WBR, Alexey On 11 May

Re: Documenting Github PR jenkins trigger phrases

2018-05-11 Thread Scott Wegner
+1 to adding a doc for this, along with some PR conventions about when to use these. Some questions that I would love to see documented: * What tests are run automatically and when (pre-commits, all languages, on every commit) * What other test suites exist, and how should they be used

Re: Graal instead of docker?

2018-05-11 Thread Romain Manni-Bucau
On Wed, May 9, 2018 at 17:41, Eugene Kirpichov wrote: > > > On Wed, May 9, 2018 at 1:08 AM Romain Manni-Bucau > wrote: > >> >> >> On Wed, May 9, 2018 at 00:57, Henning Rohde wrote: >> >>> There are indeed lots of possibilities for

Re: Documenting Github PR jenkins trigger phrases

2018-05-11 Thread Alexey Romanenko
+1 to adding such a reference guide of Jenkins commands to the Testing guide. It should be extremely useful, especially for those who were not aware of this before. WBR, Alexey > On 11 May 2018, at 02:44, Ankur Goenka wrote: > > In my experience effect of white space in commit

Looking for contributors for Python 3 support

2018-05-11 Thread Robbe Sneyders
Hello everyone, We have started adding Python 3 support to Beam. It took a while to get the best approach sorted out, but the first PR [1] has been merged and we're ready to start working on additional subpackages in parallel. We would like to prevent regression as much as possible, so any help

Re: Can I get the contributor role?

2018-05-11 Thread Carlos Alonso
Thanks!! On Fri, May 11, 2018 at 1:44 PM Jean-Baptiste Onofré wrote: > Already done (by another guy certainly). > > Regards > JB > > On 05/11/2018 09:37 AM, Carlos Alonso wrote: > > Hi everyone!! > > > > I'm working on https://issues.apache.org/jira/browse/BEAM-4257 and I'd >

Re: Can I get the contributor role?

2018-05-11 Thread Jean-Baptiste Onofré
Already done (by another guy certainly). Regards JB On 05/11/2018 09:37 AM, Carlos Alonso wrote: > Hi everyone!! > > I'm working on https://issues.apache.org/jira/browse/BEAM-4257 and I'd like to > get the task assigned. > > Can a PMC for the project add me as a contributor and assign me the

Re: Can I get the contributor role?

2018-05-11 Thread Ismaël Mejía
Done and welcome! On Fri, May 11, 2018 at 9:38 AM Carlos Alonso wrote: > Hi everyone!! > I'm working on https://issues.apache.org/jira/browse/BEAM-4257 and I'd like to get the task assigned. > Can a PMC for the project add me as a contributor and assign me the ticket? >

Re: triggers in direct runner

2018-05-11 Thread Plajt, Vaclav
Hi Kenneth, thanks for the clarification. I was not aware of bundles. Now it makes sense. Vaclav From: Kenneth Knowles Sent: Thursday, May 10, 2018 4:57:34 PM To: dev Subject: Re: triggers in direct runner Hi Vaclav, Slightly stale but still

Re: Jackson serialisation of GenericJson subclasses

2018-05-11 Thread Tim Robertson
You're very welcome. Glad you have it sorted. On Fri, May 11, 2018 at 12:48 PM, Carlos Alonso wrote: > Hi Tim, many thanks for your help. It's definitely interesting, but > unfortunately not useful this time, I think, as those JsonTypeInfo and > JsonSubClasses annotations

Re: Jackson serialisation of GenericJson subclasses

2018-05-11 Thread Carlos Alonso
Hi Tim, many thanks for your help. It's definitely interesting, but unfortunately not useful this time, I think, as those JsonTypeInfo and JsonSubClasses annotations are on the base class, which, in my case, I don't own and even if I did, I don't think I could list all the subclasses GenericJson

Can I get the contributor role?

2018-05-11 Thread Carlos Alonso
Hi everyone!! I'm working on https://issues.apache.org/jira/browse/BEAM-4257 and I'd like to get the task assigned. Can a PMC for the project add me as a contributor and assign me the ticket? Thanks!