Re: SplittableDoFn

2018-08-30 Thread Lukasz Cwik
I came up with a proposal[1] for a progress model based solely on the backlog, in which splits are based upon the remaining backlog we want the SDK to split at. I also give recommendations to runner authors as to how an autoscaling system could work based upon the measured backlog. A lot

Re: jira search in chrome omnibox

2018-08-30 Thread Udi Meiri
Correction: this is the correct URL: https://issues.apache.org/jira/secure/QuickSearch.jspa?searchString=%s It uses smart querying. Ex: Searching for "beam open pubsub" will search for open bugs in project BEAM with the keyword "pubsub". On Tue, Aug 28, 2018 at 4:49 PM Valentyn Tymofieiev
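As a rough illustration of how the `%s` placeholder in that URL gets filled in, here is a minimal Python sketch. The helper name `quick_search_url` is hypothetical, and percent-encoding the terms with `urllib.parse.quote` is an assumption about what the browser does when it substitutes the typed text; only the URL template itself comes from the message above.

```python
from urllib.parse import quote

# URL template quoted in the message; %s is where the search terms go.
QUICK_SEARCH_TEMPLATE = (
    "https://issues.apache.org/jira/secure/QuickSearch.jspa?searchString=%s"
)


def quick_search_url(terms: str) -> str:
    """Return the quick-search URL for the given terms (hypothetical helper).

    The terms are percent-encoded before substitution, which is an
    assumption about how a browser fills in the placeholder.
    """
    return QUICK_SEARCH_TEMPLATE % quote(terms)


if __name__ == "__main__":
    # Example query taken from the message above.
    print(quick_search_url("beam open pubsub"))
```

In Chrome, the same template would be registered as a custom search engine, so the omnibox performs the substitution itself.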

Re: builds.apache.org refused connections since last night

2018-08-30 Thread Boyuan Zhang
Hey Thomas, I guess a committer can push changes directly into https://gitbox.apache.org/repos/asf?p=beam-site.git. Maybe you can give it a try with a committer's help. On Thu, Aug 30, 2018 at 10:34 AM Thomas Weise wrote: > While Jenkins is down, is there an alternative process to merge web site >

Re: builds.apache.org refused connections since last night

2018-08-30 Thread Thomas Weise
While Jenkins is down, is there an alternative process to merge web site changes? On Wed, Aug 29, 2018 at 9:19 AM Boyuan Zhang wrote: > Thank you Andrew! > > On Wed, Aug 29, 2018 at 9:17 AM Andrew Pilloud > wrote: > >> Down for me too. It sounds like the disk failed and it will be down for a

Re: Status of IntelliJ with Gradle

2018-08-30 Thread Maximilian Michels
Small update, it helps to add the following to the IntelliJ properties: Help -> Edit Custom Properties idea.max.intellisense.filesize=5000 This gets rid of the errors due to large generated source files, e.g. RunnerApi.java. -Max On 22.08.18 23:26, Kai Jiang wrote: I encountered same

Re: [QUESTION] retrial in sources by the runners

2018-08-30 Thread Lukasz Cwik
Runners are responsible for retry semantics; they should catch the failure and choose whether they want to retry or not. I think your reading of the code is correct. Some I/O layers do retry but that is more about attempting to continue processing within a bundle instead of failing and having the
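The "catch the failure and choose whether to retry" idea can be sketched generically. This is not code from any Beam runner: `run_with_retries`, `process_bundle`, and the default of four attempts are all hypothetical names and values chosen for illustration only.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(process_bundle: Callable[[], T], max_attempts: int = 4) -> T:
    """Call process_bundle, retrying on failure (hypothetical runner-side policy).

    The retry decision lives in the caller (the "runner"), not in the
    code that reads or processes the data.
    """
    last_error: Optional[Exception] = None
    for _attempt in range(max_attempts):
        try:
            return process_bundle()
        except Exception as error:  # the runner catches the failure...
            last_error = error      # ...and decides to try again
    # All attempts failed: surface the last failure to the caller.
    raise RuntimeError(f"bundle failed after {max_attempts} attempts") from last_error
```

A runner that chooses not to retry would correspond to `max_attempts=1` in this sketch, in which case the first failure is reported immediately.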

Re: Beam Schemas: current status

2018-08-30 Thread Reuven Lax
Max, Nested Pojos are fully supported, as are nested array/collection and map types (e.g. if your Pojo contains List). One limitation right now is that only mutable Pojos are supported. For example, the following Pojo would _not_ work, because the fields aren't mutable. public class Pojo {

Re: Beam Schemas: current status

2018-08-30 Thread Ismaël Mejía
Thanks Reuven for the excellent summary and thanks to everyone who worked on the Schema/SQL improvements. This is great for usability. I really like the idea of making user experience simpler, e.g. by automatically inferring Coders. Some questions: - Any plans to add similar improvements for

Re: Beam Schemas: current status

2018-08-30 Thread Reuven Lax
Andrew - the @Experimental tag simply means that we are free to change the interfaces without waiting for the next major Beam version. Once we are happy to freeze these interfaces, we can drop the tag. On Wed, Aug 29, 2018 at 1:48 PM Andrew Pilloud wrote: > The work you've done to generalize

Re: Beam Schemas: current status

2018-08-30 Thread Connell O'Callaghan
Nice work Reuven!!! On Thu, Aug 30, 2018 at 6:57 AM Jean-Baptiste Onofré wrote: > Nice feature, thanks Reuven ! > > I started to revamp the Spark runner with dataset, I will leverage this ! > > Regards > JB > > On 29/08/2018 07:40, Reuven Lax wrote: > > I wanted to send a quick note to the

Re: [Proposal] Creating a reproducible environment for Beam Jenkins Tests

2018-08-30 Thread Jean-Baptiste Onofré
Hi, That's interesting; however, it's really important to still be able to easily run tests locally, without any VM/Docker required. It should be activated by profile or so. Regards JB On 27/08/2018 19:53, Yifan Zou wrote: > Hi, > > I have a proposal for creating a reproducible environment for

Re: Beam Schemas: current status

2018-08-30 Thread Jean-Baptiste Onofré
Nice feature, thanks Reuven ! I started to revamp the Spark runner with dataset, I will leverage this ! Regards JB On 29/08/2018 07:40, Reuven Lax wrote: > I wanted to send a quick note to the community about the current status > of schema-aware PCollections in Beam. As some might remember we

Re: Beam Schemas: current status

2018-08-30 Thread Maximilian Michels
That's a cool feature. Are there any limitations for the schema inference apart from being a Pojo/Bean? Does it support nested Pojos, e.g. "wrapper.field"? -Max On 29.08.18 07:40, Reuven Lax wrote: I wanted to send a quick note to the community about the current status of schema-aware

Re: [Proposal] Creating a reproducible environment for Beam Jenkins Tests

2018-08-30 Thread Maximilian Michels
Hi Yifan, Thanks for the proposal. I like the idea of unifying test environments via Docker. It would be great if we could still easily run tests without Docker. Best, Max On 27.08.18 19:53, Yifan Zou wrote: Hi, I have a proposal for creating a reproducible environment for Jenkins tests

Re: Accessing attempted metrics from within a DoFn

2018-08-30 Thread Etienne Chauchot
Robin, I asked myself the same thing, and indeed there is no way of accessing the metrics from within the pipeline itself. The only access you can have is directly to the MetricCell like that (1), but it is runner facing, not user facing, so it does not help for your test. So I agree with you:

[QUESTION] retrial in sources by the runners

2018-08-30 Thread Etienne Chauchot
Hi all, I have a question concerning retrial of sources. I've looked at the code of direct runner and spark runner on bounded sources. As far as I can tell, if there is a failure in reading a record from the reader of the source, there will be no retrial from the runner, there will just be an

Re: Beam Schemas: current status

2018-08-30 Thread Etienne Chauchot
Very impressive, thanks for your work Reuven ! Etienne Le mardi 28 août 2018 à 22:40 -0700, Reuven Lax a écrit : > I wanted to send a quick note to the community about the current status of > schema-aware PCollections in Beam. As some > might remember we had a good discussion last year about

Re: delayed emit (timer) in py-beam?

2018-08-30 Thread Charles Chen
FYI: the reference DirectRunner implementation of the Python user state and timers API is out for review: https://github.com/apache/beam/pull/6304 On Mon, Jul 30, 2018 at 3:57 PM Austin Bennett wrote: > Fantastic; thanks, Charles! > > > > On Mon, Jul 30, 2018 at 3:49 PM, Charles Chen wrote: >

Re: Proposal for Beam Python User State and Timer APIs

2018-08-30 Thread Charles Chen
Another update: the reference DirectRunner implementation of the Python user state and timers API is out for review: https://github.com/apache/beam/pull/6304 On Mon, Jul 9, 2018 at 2:18 PM Charles Chen wrote: > An update: https://github.com/apache/beam/pull/5691 has been merged. I > hope to

[BEAM-960] Backoff in the DirectRunner if no work is available

2018-08-30 Thread Vojtech Janota
Hi beamers, I would like to contribute a fix for the following issue: - https://issues.apache.org/jira/browse/BEAM-690 The corresponding PR: - https://github.com/apache/beam/pull/6303 I tried to follow the approach suggested in the comments of the said ticket and any feedback is

Gradle tests parallelization

2018-08-30 Thread Etienne Chauchot
Hi everyone, To fix flaky tests, I was wondering how tests are currently parallelized in the Gradle build. Luke gave me the information I needed. I have transcribed our conversation in this (1) wiki page so that everyone can benefit from it. [1]: