Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-07-23 Thread Rakesh Kumar
Hi Brian/Robert, I am moving ahead with implementing ReadModifyWriteState. I will just repurpose this PR: https://github.com/apache/beam/pull/9067 . On Mon, Jul 15, 2019 at 7:47 PM Rakesh Kumar wrote: > Brian, > > I just want to follow up. Let me know if you are working on this. > Otherwise,
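For context on the naming discussion: ReadModifyWriteState (ValueState in Java) is a state cell that holds a single value, which a DoFn can read, overwrite, and clear. The minimal in-memory sketch below illustrates only those semantics. The class `InMemoryReadModifyWriteState` and the helper `count_element` are hypothetical names. This is not Beam's implementation, where the state is backed by the runner.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class InMemoryReadModifyWriteState(Generic[T]):
    """Hypothetical single-value state cell: read, overwrite, or clear."""

    def __init__(self) -> None:
        self._value: Optional[T] = None
        self._present = False

    def read(self) -> Optional[T]:
        # Returns None when nothing has been written, or after clear().
        return self._value if self._present else None

    def write(self, value: T) -> None:
        # Replaces any previous value; nothing is merged or appended.
        self._value = value
        self._present = True

    def clear(self) -> None:
        self._value = None
        self._present = False


def count_element(state: InMemoryReadModifyWriteState[int]) -> int:
    # The read-modify-write pattern: read the current value, derive a new one,
    # and write it back.
    current = state.read() or 0
    state.write(current + 1)
    return current + 1
```

Contrast this with a bag-style state, where each write appends. Here every write overwrites the single stored value, which is what the "read-modify-write" name emphasizes.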

Re: Write-through-cache in State logic

2019-07-23 Thread Rakesh Kumar
Thanks Robert, I stumbled on the JIRA that you created some time ago: https://jira.apache.org/jira/browse/BEAM-5428 . You also marked the code where changes are required:

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Udi Meiri
Another idea would be to put default bucket preferences in a .beamrc file so you don't have to remember to pass it every time (this could also contain other default flag values). On Tue, Jul 23, 2019 at 1:43 PM Robert Bradshaw wrote: > On Tue, Jul 23, 2019 at 10:26 PM Chamikara Jayalath >
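Udi's .beamrc idea amounts to layering two sources of defaults. Values come from a file in the user's home directory, and anything passed on the command line overrides them. The following is a hypothetical sketch: Beam has no .beamrc file, and the INI format with a `[defaults]` section, the `KNOWN_FLAGS` allow-list, and the helper names are all assumptions made for illustration.

```python
import argparse
import configparser
import os
from typing import List

# Hypothetical set of flags that a .beamrc file may provide defaults for.
KNOWN_FLAGS = ("runner", "project", "region", "temp_location")


def read_beamrc(path: str = "~/.beamrc") -> str:
    """Return the contents of the (hypothetical) .beamrc file, or "" if absent."""
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        return ""
    with open(full_path) as f:
        return f.read()


def parse_with_beamrc(argv: List[str], rc_text: str) -> argparse.Namespace:
    """Parse argv, taking [defaults] from rc_text for flags not given explicitly."""
    parser = argparse.ArgumentParser()
    for flag in KNOWN_FLAGS:
        parser.add_argument(f"--{flag}")

    config = configparser.ConfigParser()
    config.read_string(rc_text)
    if config.has_section("defaults"):
        # Ignore keys that do not correspond to a known flag.
        rc_defaults = {k: v for k, v in config.items("defaults") if k in KNOWN_FLAGS}
        # Parser-level defaults apply only when the flag is absent from argv.
        parser.set_defaults(**rc_defaults)
    return parser.parse_args(argv)
```

A main module might call `parse_with_beamrc(sys.argv[1:], read_beamrc())`. The explicit allow-list is one way to sidestep the concern raised later in this thread about the large number of SDK-specific flags. Only flags that are listed can be defaulted from the file.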

Re: Sort Merge Bucket - Action Items

2019-07-23 Thread Neville Li
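So I spent one afternoon trying some ideas for reusing the last few transforms of WriteFiles. WriteShardsIntoTempFilesFn extends DoFn<KV<*ShardedKey<Integer>*, Iterable<UserT>>, *FileResult*<DestinationT>> => GatherResults<ResultT> extends PTransform<PCollection<ResultT>, PCollection<List<ResultT>>> => FinalizeTempFileBundles extends PTransform<PCollection<List<*FileResult<DestinationT>*>>, WriteFilesResult<DestinationT>> I replaced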

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-23 Thread Austin Bennett
Pablo, I assigned https://issues.apache.org/jira/browse/BEAM-7607 to you, to make it even more likely that it is still around on the 25th :-) Cheers, Austin On Tue, Jul 23, 2019 at 11:24 AM Pablo Estrada wrote: > Hi all, > I've just realized that https://issues.apache.org/jira/browse/BEAM-7607 is

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Robert Bradshaw
On Tue, Jul 23, 2019 at 10:26 PM Chamikara Jayalath wrote: > > On Tue, Jul 23, 2019 at 1:10 PM Kyle Weaver wrote: >> >> I agree with David that at least clearer log statements should be added. >> >> Udi, that's an interesting idea, but I imagine the sheer number of existing >> flags (including

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Chamikara Jayalath
On Tue, Jul 23, 2019 at 1:10 PM Kyle Weaver wrote: > I agree with David that at least clearer log statements should be added. > > Udi, that's an interesting idea, but I imagine the sheer number of > existing flags (including many SDK-specific flags) would make it difficult > to implement. In

Re: Sort Merge Bucket - Action Items

2019-07-23 Thread Chamikara Jayalath
On Mon, Jul 22, 2019 at 1:41 PM Robert Bradshaw wrote: > On Mon, Jul 22, 2019 at 7:39 PM Eugene Kirpichov > wrote: > > > > On Mon, Jul 22, 2019 at 7:49 AM Robert Bradshaw > wrote: > >> > >> On Mon, Jul 22, 2019 at 4:04 PM Neville Li > wrote: > >> > > >> > Thanks Robert. Agree with the FileIO

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Kyle Weaver
I agree with David that at least clearer log statements should be added. Udi, that's an interesting idea, but I imagine the sheer number of existing flags (including many SDK-specific flags) would make it difficult to implement. In addition, uniform argument names wouldn't necessarily ensure

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Valentyn Tymofieiev
+1 to have a consistent experience across SDKs, and do bucket creation by default, specifically: - Temp locations should be optional. - Autocreation behavior should be documented. - The messages ("using bucket X", or "creating bucket X since temp_location is not specified") should be visible in
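The behavior listed above makes temp locations optional and puts a visible message about which bucket is used or created. It could look roughly like the following hypothetical Python sketch, which is not code from any Beam SDK. The helper names, the injected `bucket_exists`/`create_bucket` callables, and the `beam-temp-<project>-<region>` naming pattern are assumptions. The pattern only mirrors the one-bucket-per-project-and-region idea mentioned elsewhere in this thread.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("temp_location_sketch")


def default_bucket_name(project: str, region: str) -> str:
    # Hypothetical naming scheme: one bucket per (project, region) pair.
    # GCS bucket names must be lowercase.
    return f"beam-temp-{project}-{region}".lower()


def resolve_temp_location(
    temp_location: Optional[str],
    project: str,
    region: str,
    bucket_exists: Callable[[str], bool],
    create_bucket: Callable[[str], None],
) -> str:
    """Return a gs:// temp location, creating a default bucket only if needed."""
    if temp_location:
        logger.info("Using user-specified temp_location %s", temp_location)
        return temp_location
    bucket = default_bucket_name(project, region)
    if bucket_exists(bucket):
        logger.info("Using bucket %s since temp_location is not specified", bucket)
    else:
        # Logged at WARNING so that auto-creation is visible by default.
        logger.warning("Creating bucket %s since temp_location is not specified", bucket)
        create_bucket(bucket)
    return f"gs://{bucket}/temp"
```

Injecting the existence check and the creation call keeps the decision logic testable without GCS credentials. It also makes it easy to turn auto-creation into an opt-in, as suggested in this thread, by passing a `create_bucket` that raises instead.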

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Udi Meiri
The Java SDK creates one regional bucket per project and region combination. So it's not a

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-23 Thread Pablo Estrada
Hi all, I've just realized that https://issues.apache.org/jira/browse/BEAM-7607 is a single-line change - and we'd spend 40 minutes chitchatting, so I'll also be working on https://jira.apache.org/jira/browse/BEAM-7803, which is a Python issue (also for the BigQuery sink!). Thanks! -P. On Sat,

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread David Cavazos
I would go for #1 since it's a better user experience, especially for new users who don't understand every step involved in staging/deploying. It's just another (unnecessary) mental concept they don't have to be aware of. Anything that makes it closer to only providing the `--runner` flag without

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Chamikara Jayalath
Do we clean up auto-created GCS buckets? If there's no good way to clean up, I think it might be better to make this opt-in. Thanks, Cham On Tue, Jul 23, 2019 at 3:25 AM Robert Bradshaw wrote: > I think having a single, default, auto-created temporary bucket per > project for use in GCP (when

Re: Jenkins nodes disconnected?

2019-07-23 Thread Yifan Zou
That was a known issue, BEAM-7650. Basically, the disk was full. We should either fix this problem in the Python precommit, or, as Udi suggested, have a cron job that periodically frees disk space. I'll try to restore those broken agents.
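The periodic cleanup job suggested above could be a small script run from cron on each agent. This sketch is hypothetical. The workspace directory, the 90% threshold, and the policy of deleting the oldest top-level entries first are illustrative assumptions, not how the Beam Jenkins agents are configured. Sizes are apparent file sizes, so the amount freed is an estimate.

```python
import os
import shutil
from typing import List, Tuple

# (path, modification time, size in bytes)
Entry = Tuple[str, float, int]


def select_for_deletion(entries: List[Entry], used: int, total: int,
                        threshold: float) -> List[str]:
    """Pick the oldest entries whose combined size brings usage below threshold."""
    bytes_to_free = used - int(threshold * total)
    chosen = []
    for path, _mtime, size in sorted(entries, key=lambda e: e[1]):
        if bytes_to_free <= 0:
            break
        chosen.append(path)
        bytes_to_free -= size
    return chosen


def tree_size(path: str) -> int:
    """Apparent size of regular files under path; symlinks are not followed."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    size = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                size += os.path.getsize(file_path)
    return size


def free_disk_space(workspace: str, threshold: float = 0.9,
                    dry_run: bool = True) -> List[str]:
    """Remove the oldest top-level entries of workspace until estimated usage
    drops below threshold; with dry_run=True, only report what would go."""
    usage = shutil.disk_usage(workspace)
    entries = []
    for name in os.listdir(workspace):
        path = os.path.join(workspace, name)
        entries.append((path, os.lstat(path).st_mtime, tree_size(path)))
    chosen = select_for_deletion(entries, usage.used, usage.total, threshold)
    if not dry_run:
        for path in chosen:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    return chosen
```

Defaulting to `dry_run=True` makes an accidental manual invocation harmless. A cron entry would pass `dry_run=False` explicitly.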

Jenkins nodes disconnected?

2019-07-23 Thread Łukasz Gajowy
Hi, I noticed that 5 Jenkins nodes are disconnected [1]. This results in a very long task queue and long waits for jobs to complete. I have been waiting 42 minutes for the seed job to start (and still counting). Is anyone currently working on reconnecting the nodes? Why is this

Re: Write-through-cache in State logic

2019-07-23 Thread Robert Bradshaw
This is documented at https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m . Note that it requires participation of both the runner and the SDK (though there are no correctness issues if one or the other side does not understand the protocol,
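The point that caching needs cooperation from both the runner and the SDK, yet stays correct if either side ignores it, can be illustrated with a write-through cache. In this version the cache is consulted only when the runner supplies a cache token. This is a simplified, hypothetical sketch. The class names and the token rule (entries are scoped to a token, so a new token never sees old entries) are assumptions for illustration, and the linked design document is the authority on the actual protocol.

```python
from typing import Dict, Hashable, Optional, Tuple


class StateBackend:
    """Hypothetical stand-in for the runner-side state store; counts reads."""

    def __init__(self) -> None:
        self.data: Dict[Hashable, object] = {}
        self.reads = 0

    def get(self, key: Hashable) -> object:
        self.reads += 1
        return self.data.get(key)

    def put(self, key: Hashable, value: object) -> None:
        self.data[key] = value


class WriteThroughStateCache:
    """Caches state per (cache_token, key); every write also goes to the backend."""

    def __init__(self, backend: StateBackend) -> None:
        self._backend = backend
        # Unbounded for simplicity; a real cache would cap size and evict.
        self._cache: Dict[Tuple[str, Hashable], object] = {}

    def read(self, key: Hashable, cache_token: Optional[str]) -> object:
        if cache_token is None:
            # The runner did not opt in: always read through (correct, just slower).
            return self._backend.get(key)
        cache_key = (cache_token, key)
        if cache_key not in self._cache:
            self._cache[cache_key] = self._backend.get(key)
        return self._cache[cache_key]

    def write(self, key: Hashable, value: object, cache_token: Optional[str]) -> None:
        # Write-through: the backend is always updated, so it never goes stale.
        self._backend.put(key, value)
        if cache_token is not None:
            self._cache[(cache_token, key)] = value
```

Because writes always reach the backend, dropping the cache (or a runner that never sends a token) changes only performance, not results. This is the property described above.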

Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Robert Bradshaw
I think having a single, default, auto-created temporary bucket per project for use in GCP (when running on Dataflow, or running elsewhere but using GCS such as for this BQ load files example), though not ideal, is the best user experience. If we don't want to be automatically creating such things

Re: Jenkins failures / dependency downloads / gradle caching

2019-07-23 Thread Manu Zhang
Is the build cache enabled? If so, would temporarily disabling it help? On Tue, Jul 23, 2019 at 11:08 AM Kenneth Knowles wrote: > The PR that is causing me the most trouble is > https://github.com/apache/beam/pull/9071. But each merge to 2.7.1 took a > long time, and many of them I eventually just