Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Ruoyun Huang
Congratulations everyone! Well deserved! On Wed, May 1, 2019 at 8:38 PM Kenneth Knowles wrote: > Congrats! All well deserved! > > Kenn > > On Wed, May 1, 2019 at 8:09 PM Reza Rokni wrote: > >> Congratulations! >> >> On Thu, 2 May 2019 at 10:53, Connell O'Callaghan >> wrote: >> >>> Well done

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Kenneth Knowles
Congrats! All well deserved! Kenn On Wed, May 1, 2019 at 8:09 PM Reza Rokni wrote: > Congratulations! > > On Thu, 2 May 2019 at 10:53, Connell O'Callaghan > wrote: > >> Well done - congratulations to you all!!! Rose thank you for sharing this >> news!!! >> >> On Wed, May 1, 2019 at 19:45 Rose

Re: GSOC - Implement an S3 filesystem for Python SDK

2019-05-01 Thread Pasan Kamburugamuwa
Thanks a lot Jeff. I will follow this. On Thu, May 2, 2019 at 7:31 AM Jeff Klukas wrote: > For getting started reading data, there are some public S3 buckets on > which Amazon hosts data used in tutorials. For example, you should be able > to access s3://awssampledbuswest2 which is referenced

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Reza Rokni
Congratulations! On Thu, 2 May 2019 at 10:53, Connell O'Callaghan wrote: > Well done - congratulations to you all!!! Rose thank you for sharing this > news!!! > > On Wed, May 1, 2019 at 19:45 Rose Nguyen wrote: > >> Matthias Baetens, Lukazs Gajowy, Suneel Marthi, Maximilian Michels, Alex >>

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Connell O'Callaghan
Well done - congratulations to you all!!! Rose thank you for sharing this news!!! On Wed, May 1, 2019 at 19:45 Rose Nguyen wrote: > Matthias Baetens, Lukazs Gajowy, Suneel Marthi, Maximilian Michels, Alex > Van Boxel, and Thomas Weise: > > Thank you for your exceptional contributions to Apache

Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Rose Nguyen
Matthias Baetens, Lukazs Gajowy, Suneel Marthi, Maximilian Michels, Alex Van Boxel, and Thomas Weise: Thank you for your exceptional contributions to Apache Beam. I'm looking forward to seeing this project grow and for more folks to contribute and be recognized! Everyone can read more about this

Re: GSOC - Implement an S3 filesystem for Python SDK

2019-05-01 Thread Jeff Klukas
For getting started reading data, there are some public S3 buckets on which Amazon hosts data used in tutorials. For example, you should be able to access s3://awssampledbuswest2 which is referenced in Redshift tutorials [0]. Amazon also has a free tier for S3 for the first year an account is
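As a small first step toward the filesystem work discussed in this thread, a path such as the bucket above has to be split into a bucket name and an object key before any boto3 call can be made. A minimal sketch using only the standard library; `parse_s3_uri` and the object key `example/data.txt` are hypothetical names chosen for illustration, not part of Beam or boto3:

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3:// URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The key is the path without its leading slash; empty for a bare bucket.
    return parsed.netloc, parsed.path.lstrip("/")


# The key below is made up; only the bucket name comes from the thread.
bucket, key = parse_s3_uri("s3://awssampledbuswest2/example/data.txt")
print(bucket, key)
```

With boto3 installed, the bucket name could then be passed to a client call such as `list_objects_v2(Bucket=bucket)`. Whether this particular bucket permits listing is not something the sketch checks.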

GSOC - Implement an S3 filesystem for Python SDK

2019-05-01 Thread Pasan Kamburugamuwa
Hi all, I want to access an S3 bucket in order to get familiar with boto3. Can you help me with this process? Thank you

Fwd: Your application for Season of Docs 2019 was unsuccessful

2019-05-01 Thread Pablo Estrada
Hello all, as you may already know, unfortunately our application for Season of Docs was not successful. That's too bad : ) - but it's good that we were able to produce a couple of work items that can still be picked up by the community at some point. Thanks to everyone who helped here. Best -P.

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Lukasz Cwik
On Wed, May 1, 2019 at 11:09 AM Kenneth Knowles wrote: > On Wed, May 1, 2019 at 8:51 AM Reuven Lax wrote: > >> ValueState is not necessarily racy if you're doing a read-modify-write. >> It's only racy if you're doing something like writing last element seen. >> > > Race conditions are not

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-01 Thread Kenneth Knowles
Yes, new issues should have that status. And a correction: it is "Triage Needed" On Wed, May 1, 2019, 11:39 Pablo Estrada wrote: > Hi Kenn, > For my information... is the Needs Triage status automatically assigned to > new issues? Are users expected to give their issue the Needs Triage status >

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-01 Thread Pablo Estrada
Hi Kenn, For my information... is the Needs Triage status automatically assigned to new issues? Are users expected to give their issue the Needs Triage status when they create it? Thanks -P. On Wed, May 1, 2019 at 11:12 AM Kenneth Knowles wrote: > An update here: we have the new workflow in

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-01 Thread Kenneth Knowles
An update here: we have the new workflow in place. I have transitioned untriaged issues to the "Needs Triage" status so they are very easy to find, and removed the obsolete triaged label. Please help to triage! You can just look at all issues with the Needs Triage status and make sure it is in

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Kenneth Knowles
On Wed, May 1, 2019 at 8:51 AM Reuven Lax wrote: > ValueState is not necessarily racy if you're doing a read-modify-write. > It's only racy if you're doing something like writing last element seen. > Race conditions are not inherently a problem. They are neither necessary nor sufficient for

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Brian Hulette
I guess it depends on how we define LatestCombineFn. My assumption was LatestCombineFn meant "throw away everything else when a new element is written" i.e. latest means latest in processing time. Then a CombiningValueState(LatestCombineFn) would be the same as ValueState I think - and you could
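Under the reading of "latest" given in this message (latest in processing time), the equivalence can be sketched in plain Python. This is a toy model, not Beam's API: `LatestInProcessingTime`, `fold`, and `overwrite` are hypothetical names, and the combiner only mimics the usual accumulator methods.

```python
class LatestInProcessingTime:
    """Hypothetical combiner: keeps only the most recently added element."""

    def create_accumulator(self):
        return None

    def add_input(self, accumulator, element):
        # Throw away whatever was there before.
        return element

    def extract_output(self, accumulator):
        return accumulator


def fold(combiner, elements):
    """Feed elements to the combiner in arrival order and return the output."""
    acc = combiner.create_accumulator()
    for element in elements:
        acc = combiner.add_input(acc, element)
    return combiner.extract_output(acc)


def overwrite(elements):
    """What a plain value cell holds after each element is written to it."""
    value = None
    for element in elements:
        value = element
    return value


arrivals = ["a", "b", "c"]
print(fold(LatestInProcessingTime(), arrivals) == overwrite(arrivals))
```

Both functions return whatever arrived last, so the result depends on arrival order in exactly the same way.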

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Reuven Lax
ValueState is not necessarily racy if you're doing a read-modify-write. It's only racy if you're doing something like writing last element seen. While it's true that many read-modify-write patterns can be expressed via a combiner, having to write a full combiner to do something simple will be a
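The distinction drawn here can be illustrated with a toy model in plain Python. Nothing below is Beam's state API: `ToyValueState`, `SumCombiner`, and `ToyCombiningState` are hypothetical stand-ins, and a summing counter is just one assumed example of a read-modify-write pattern.

```python
class ToyValueState:
    """Hypothetical stand-in for a value state cell: one value, read and write."""

    def __init__(self):
        self._value = None

    def read(self):
        return self._value

    def write(self, value):
        self._value = value


class SumCombiner:
    """Hypothetical combiner; note how much scaffolding a simple sum needs."""

    def create_accumulator(self):
        return 0

    def add_input(self, accumulator, element):
        return accumulator + element

    def extract_output(self, accumulator):
        return accumulator


class ToyCombiningState:
    """Hypothetical stand-in for a combining state cell driven by a combiner."""

    def __init__(self, combiner):
        self._combiner = combiner
        self._acc = combiner.create_accumulator()

    def add(self, element):
        self._acc = self._combiner.add_input(self._acc, element)

    def read(self):
        return self._combiner.extract_output(self._acc)


def sum_with_value_state(elements):
    # Read-modify-write: each new value is derived from the previous one.
    state = ToyValueState()
    for element in elements:
        state.write((state.read() or 0) + element)
    return state.read()


def last_seen_with_value_state(elements):
    # Blind write: the final value depends only on which element arrived last.
    state = ToyValueState()
    for element in elements:
        state.write(element)
    return state.read()


def sum_with_combining_state(elements):
    state = ToyCombiningState(SumCombiner())
    for element in elements:
        state.add(element)
    return state.read()


for order in ([1, 2, 3], [3, 2, 1]):
    print(sum_with_value_state(order),
          sum_with_combining_state(order),
          last_seen_with_value_state(order))
```

In this toy the sum is the same for both arrival orders, while the "last element seen" value changes with the order. The combiner version gives the same sum but needs a separate class to do it.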

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Brian Hulette
> LatestCombineFn sounds to me like the worst possible world. It will almost always be racy and confusing. But isn't that Robert's point? ValueState is already racy and confusing, the LatestCombineFn just makes it explicit. On Wed, May 1, 2019 at 8:39 AM Reuven Lax wrote: > I also think that

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Reuven Lax
I also think that ValueState is very useful, for all the reasons mentioned in this thread. Also keep in mind that even for cases where CombiningState can be used, that will be much more cumbersome unless a preexisting combiner is already written. Writing a custom combiner is a lot of boilerplate

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Lukasz Cwik
Note, the example didn't support merging windows so I also ignored it. In the case of merging windows, your solution would depend on whether you needed to know from what window the enriched event was from. On Wed, May 1, 2019 at 8:30 AM Lukasz Cwik wrote: > Isn't a value state just a bag state

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Lukasz Cwik
Isn't a value state just a bag state with at most one element and the usage pattern would be? 1) value_state.get == bag_state.read.next() (both have to handle the case when neither have been set) 2) user logic on what to do with current state + additional information to produce new state 3)
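The idea in this message can be sketched as a thin wrapper, again in plain Python rather than Beam's API. `ToyBagState` and `ValueOverBag` are hypothetical names. The third step is cut off in this listing, so the clear-then-add write-back below is an assumption of the sketch, not a quotation of the message.

```python
class ToyBagState:
    """Hypothetical stand-in for a bag state cell."""

    def __init__(self):
        self._items = []

    def read(self):
        return iter(self._items)

    def add(self, item):
        self._items.append(item)

    def clear(self):
        self._items.clear()


class ValueOverBag:
    """Value-style access on top of a bag holding at most one element."""

    def __init__(self, bag):
        self._bag = bag

    def get(self, default=None):
        # Step 1: take the first (and only) element, or a default if unset.
        return next(self._bag.read(), default)

    def set(self, value):
        # Assumed write-back: clear first so the bag never holds two elements.
        self._bag.clear()
        self._bag.add(value)


bag = ToyBagState()
value = ValueOverBag(bag)
current = value.get(default=0)   # step 1: read current state (unset here)
value.set(current + 5)           # step 2: user logic produces the new state
value.set(value.get() * 2)
print(value.get(), len(list(bag.read())))
```

The `default` argument covers the case mentioned in step 1 where nothing has been set yet.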

Re: Custom shardingFn for FileIO

2019-05-01 Thread Jozef Vilcek
That coder is added extra as a re-map stage from "original" key to new ShardAwareKey ... But pipeline might get broken I guess. Very fair point. I am taking a second pass over this and will try to simplify it much more. On Wed, May 1, 2019 at 2:12 PM Reuven Lax wrote: > I haven't looked

Re: Custom shardingFn for FileIO

2019-05-01 Thread Reuven Lax
I haven't looked at the PR in depth yet, but it appears that someone running a pipeline today who then tries to update post this PR will have the coder change to DefaultShardKeyCoder, even if they haven't picked any custom function. Is that correct, or am I misreading things? Reuven On Tue, Apr

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-01 Thread Jozef Vilcek
On Tue, Apr 30, 2019 at 5:42 PM Kenneth Knowles wrote: > > > On Tue, Apr 30, 2019, 07:05 Reuven Lax wrote: > >> In that case, Robert's point is quite valid. The old Flink runner I >> believe had no knowledge of fusion, which was known to make it extremely >> slow. A lot of work went into making