Re: learning kata as root project

2019-05-24 Thread Kenneth Knowles
Agree that "just import learnings/kata/java" sounds right. Is there any build that checks the content so that changes in Beam do not break it? Kenn On Fri, May 17, 2019 at 2:19 AM Michael Luckey wrote: > Hi Henry, > > unfortunately I do not know how to give advice here. I would first need to >

Re: Environments for External Transforms

2019-05-24 Thread Lukasz Cwik
Dataflow has been doing something similar along this route, where it is trying to get rid of the driver program running on the user's machine. If you can get the expansion service to launch and run an environment to perform the expansion, you could also get it to create and submit a job as well

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-24 Thread Joana Filipa Bernardo Carrasqueira
Congratulations Pablo! Well deserved :D On Fri, May 17, 2019 at 3:14 PM Hannah Jiang wrote: > Congratulations, Pablo, you deserve it! > > *From: *Mark Liu > *Date: *Fri, May 17, 2019 at 2:45 PM > *To: * > > Congratulations, Pablo! >> >> *From: *Alexey Romanenko >> *Date: *Fri, May 17, 2019

Re: [Discuss] Ideas for Apache Beam presence in social media

2019-05-24 Thread Aizhamal Nurmamat kyzy
Hi everyone, I'd like to pilot this if that's okay by everyone. I'll set up a spreadsheet, write a blog post publicizing it, and perhaps send out a tweet. We can improve the process later with tools if necessary. Thanks all and have a great weekend! Aizhamal On Tue, May 21, 2019 at 8:37 PM

Re: [DISCUSS] Portability representation of schemas

2019-05-24 Thread Lukasz Cwik
Your reasoning about SchemaCoder really being a type coercion coder makes a lot of sense to me. On Fri, May 24, 2019 at 11:42 AM Brian Hulette wrote: > *tl;dr:* SchemaCoder represents a logical type with a base type of Row > and we should think about that. > > I'm a little concerned that the

Re: [Discuss] Ideas for Apache Beam presence in social media

2019-05-24 Thread Kenneth Knowles
Thanks for taking on this work! Kenn On Fri, May 24, 2019 at 2:52 PM Aizhamal Nurmamat kyzy wrote: > Hi everyone, > > I'd like to pilot this if that's okay by everyone. I'll set up a > spreadsheet, write a blog post publicizing it, and perhaps send out a > tweet. We can improve the process

Re: Definition of Unified model

2019-05-24 Thread Kenneth Knowles
I strongly prefer explicit sequence metadata over FIFO requirements, because: - FIFO is complex to specify: for example Dataflow has "per stage key-to-key" FIFO today, but it is not guaranteed to remain so (plus "stage" is not a portable concept, nor even guaranteed to remain a Dataflow concept)
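
A minimal sketch of what "explicit sequence metadata" could look like in practice, assuming each element carries a unique, gap-free integer sequence number. The function name `reorder_by_sequence` and the `(seq, value)` tuple shape are hypothetical and not part of any Beam API; the point is only that a consumer can restore order from the metadata without any FIFO guarantee from the transport.

```python
import heapq
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def reorder_by_sequence(
    elements: Iterable[Tuple[int, T]], first_seq: int = 0
) -> Iterator[T]:
    """Yield values in sequence order, whatever order they arrive in.

    Assumes sequence numbers are unique and have no gaps (illustrative
    assumption, not something stated in the thread).
    """
    pending = []  # min-heap of (seq, value), smallest seq on top
    next_seq = first_seq
    for seq, value in elements:
        heapq.heappush(pending, (seq, value))
        # Release every buffered element that is now next in line.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


if __name__ == "__main__":
    arrivals = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
    print(list(reorder_by_sequence(arrivals)))
```

Elements that arrive early are buffered until their predecessors show up, so the memory cost depends on how far out of order the input can be.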

Re: contributor permission for Beam Jira tickets

2019-05-24 Thread Lukasz Cwik
Welcome. I have added you as a contributor and assigned BEAM-7414 to you. On Fri, May 24, 2019 at 9:40 AM Nicolas Delsaux wrote: > Hi > > I've just submitted https://issues.apache.org/jira/browse/BEAM-7414 for > which I have a working solution. > > Could you please add me as contributor (in

Re: Hazelcast Jet Runner

2019-05-24 Thread Kenneth Knowles
My request was that the artifact be beam-runners-jet-experimental or beam-runners-experimental-jet so that a user was clearly opting in to experimental functionality, per the discussion. I try not to have a strong opinion about the mechanism. Probably the most natural thing to do is just configure

Re: Beam Summit Europe: speakers and schedule online!

2019-05-24 Thread Joana Filipa Bernardo Carrasqueira
Great! Thanks for spotting that. We'll update it on our page! On Fri, May 24, 2019 at 4:56 AM Aljoscha Krettek wrote: > You’re both right. The Kulturbrauerei Area is between Knaackstraße and > Schönhauser Allee and there are entrances on multiple sides. Schönhauser Allee > is the more prominent

Re: [DISCUSS] Portability representation of schemas

2019-05-24 Thread Brian Hulette
*tl;dr:* SchemaCoder represents a logical type with a base type of Row and we should think about that. I'm a little concerned that the current proposals for a portable representation don't actually fully represent Schemas. It seems to me that the current Java-only Schemas are made up of three

PubSubIT tests topic cleanup?

2019-05-24 Thread Pablo Estrada
I've found a bunch of topics created by PubSub integration tests - they don't seem to be getting cleaned up, perhaps? 614 name: projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSelectsPayloadContent- 614 name:

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Andrew Pilloud
This came up on the list before, in February: https://lists.apache.org/thread.html/38384d193e6f0af89f00a583e56cff93b18cfaebbf84e743eb900bc5@%3Cdev.beam.apache.org%3E We should be cleaning up topics, but it sounds like we aren't. Andrew On Fri, May 24, 2019 at 11:42 AM Pablo Estrada wrote: >

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Kenneth Knowles
Is there a Jira tracking this? Kenn On Fri, May 24, 2019, 11:50 Andrew Pilloud wrote: > This came up on the list before, in February: > > https://lists.apache.org/thread.html/38384d193e6f0af89f00a583e56cff93b18cfaebbf84e743eb900bc5@%3Cdev.beam.apache.org%3E > > We should be cleaning up

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Pablo Estrada
Seems like Mikhail created https://issues.apache.org/jira/browse/BEAM-6610 last time ^^' On Fri, May 24, 2019 at 11:58 AM Kenneth Knowles wrote: > Is there a Jira tracking this? > > Kenn > > On Fri, May 24, 2019, 11:50 Andrew Pilloud wrote: > >> This came up on the list before, in February:

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Mikhail Gryzykhin
Last time it was decided to clean up topics manually and postpone the fix. My estimate was that we need to clean up topics about every two months. I think we should clean up topics manually to mitigate the issue and prioritize a proper fix. On Fri, May 24, 2019, 12:00 Pablo Estrada wrote: > Seems like

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Andrew Pilloud
I believe it is https://issues.apache.org/jira/browse/BEAM-6610 On Fri, May 24, 2019 at 11:58 AM Kenneth Knowles wrote: > Is there a Jira tracking this? > > Kenn > > On Fri, May 24, 2019, 11:50 Andrew Pilloud wrote: > >> This came up on the list before, in February: >> >>

Re: Shuffling on apache beam

2019-05-24 Thread Reza Rokni
Hi, Have you explored the use of triggers with your use case? https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html Cheers Reza On Fri, 24 May 2019 at 14:14, pasquale.bon...@gmail.com < pasquale.bon...@gmail.com> wrote: > Hi Reuven, > I would

Re: Hazelcast Jet Runner

2019-05-24 Thread Jozsef Bartok
Hi Ismaël! Quoting Kenn (from PR-8410): "We discussed on list that it would be better to have new things always start as experimental in a way that clearly distinguishes them from the core." Rgds On Thu, May 23, 2019 at 10:44 PM Ismaël Mejía wrote: >

Re: Shuffling on apache beam

2019-05-24 Thread pasquale . bonito
Hi Reuven, I would like to know if it is possible to guarantee that records are processed by the same thread/task based on a key, as probably happens in a combine/stateful operation, without adding the delay of a window. This could increase the efficiency of caching and reduce some race conditions
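
To make the question concrete, here is a minimal sketch of key-based routing, assuming a fixed number of workers and string keys. The helpers `worker_for_key` and `route` are hypothetical illustrations, not Beam APIs, and this is not a description of how any Beam runner actually shuffles data; it only shows the property being asked for, namely that every record with a given key lands on the same worker.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def worker_for_key(key: str, num_workers: int) -> int:
    """Map a key to a worker index using a hash that is stable across runs."""
    # zlib.crc32 is deterministic, unlike the built-in hash() for str,
    # which is randomized per process by default.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def route(
    records: Iterable[Tuple[str, int]], num_workers: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Group records by the worker that owns their key, keeping arrival order."""
    assignments: Dict[int, List[Tuple[str, int]]] = {
        w: [] for w in range(num_workers)
    }
    for key, value in records:
        assignments[worker_for_key(key, num_workers)].append((key, value))
    return assignments


if __name__ == "__main__":
    routed = route([("a", 1), ("b", 2), ("a", 3)], num_workers=3)
    for worker, items in routed.items():
        print(worker, items)
```

Because a single worker sees all records for a key, a per-key cache on that worker stays warm and two workers never update the same key concurrently.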

Re: Shuffling on apache beam

2019-05-24 Thread Reza Rokni
PS You can also make use of the GlobalWindow with a stateful DoFn. On Fri, 24 May 2019 at 15:13, Reza Rokni wrote: > Hi, > > Have you explored the use of triggers with your use case? > > > https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html > >

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Aljoscha Krettek
This is quite interesting! The Flink Table API (relational and SQL) has an implementation for the type of join you mention in the example. We call it Temporal Table Join, and it works on something we call Temporal Tables:

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Jan Lukavský
Hi, absolutely +1 to add this to the model, but does this imply that MapState can be dropped (or backed by this)? It can have different insert or delete time complexity (O(1) instead of O(log n)). Jan -- Original e-mail -- From: Aljoscha Krettek To: dev@beam.apache.org Date: 24.
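
The trade-off raised here can be sketched as follows, assuming a sorted map that supports range fetch and range delete in addition to single-key writes. The class `SortedMapSketch` and its method names are hypothetical and are not the API proposed in the thread. In this simple version the position of a key is found in O(log n) with `bisect`, but inserting into a Python list still shifts elements; a balanced tree would be needed for true O(log n) inserts, while a plain hash map gives O(1) inserts with no ordered range operations.

```python
import bisect
from typing import Any, Dict, List, Tuple


class SortedMapSketch:
    """Toy sorted map: a sorted key list plus a dict for the values."""

    def __init__(self) -> None:
        self._keys: List[int] = []        # always kept in ascending order
        self._values: Dict[int, Any] = {}  # key -> value

    def put(self, key: int, value: Any) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # binary search, then list insert
        self._values[key] = value

    def get_range(self, lo: int, hi: int) -> List[Tuple[int, Any]]:
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]

    def delete_range(self, lo: int, hi: int) -> None:
        """Remove every entry with lo <= key < hi."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        for k in self._keys[start:end]:
            del self._values[k]
        del self._keys[start:end]


if __name__ == "__main__":
    m = SortedMapSketch()
    for ts, v in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
        m.put(ts, v)
    print(m.get_range(10, 30))
    m.delete_range(10, 30)
    print(m.get_range(0, 100))
```

A hash-backed MapState could be emulated on top of this structure, at the cost of the slower single-key writes noted above.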

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Robert Bradshaw
On Fri, May 24, 2019 at 5:32 AM Reuven Lax wrote: > > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay wrote: >> >> >> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik wrote: >>> >>> >>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang wrote: > > A few obvious problems with this code: > 1.

Re: Beam Summit Europe: speakers and schedule online!

2019-05-24 Thread Suneel Marthi
Kulturbrauerei is on Schönhauser Allee - you have the address wrong on the event page. On Thu, May 23, 2019 at 4:58 PM Joana Filipa Bernardo Carrasqueira < joanafil...@google.com> wrote: > Hi all! > > Looking forward to the conversations about Beam and to meet new people in > the community! > >

Re: Beam Summit Europe: speakers and schedule online!

2019-05-24 Thread Aljoscha Krettek
You’re both right. The Kulturbrauerei Area is between Knaackstraße and Schönhauser Allee and there are entrances on multiple sides. Schönhauser Allee is the more prominent street, though. Btw, I live on Schönhauser Allee. :-) > On 24. May 2019, at 13:48, Suneel Marthi wrote: > > Kulturbrauerei

Re: Hazelcast Jet Runner

2019-05-24 Thread Ismaël Mejía
I see, thanks Jozsef. Marking things as Experimental was discussed, but we never agreed on doing this at the directory level. We can cover the same ground by putting an annotation in the classes (in particular the JetRunner and JetPipelineOptions classes which are the real public interface, or in

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
Some great comments! *Aljoscha*: absolutely this would have to be implemented by runners to be efficient. We can of course provide a default (inefficient) implementation, but ideally runners would provide better ones. *Jan* Exactly. I think MapState can be dropped or backed by this. E.g.

Re: Quota: In use IP-adresses

2019-05-24 Thread Valentyn Tymofieiev
I did this for a few other resources recently (CPU, Disk). If this keeps being a problem we can lower test parallelism. On Thu, May 23, 2019, 3:48 PM Mikhail Gryzykhin wrote: > Hello everybody, > > Some of our jobs fail with 1/0 in use IP-addresses quota exception. > > Seems that we spin-up too

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
Can you explain how fetching and deleting ranges of keys would work with this data structure? On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik wrote: > Reuven, for the example, I assume that we never want to store more than 2 > values at a given sort key prefix, and if we do then we will create a

Re: Join the Beam Community Request Email

2019-05-24 Thread Lukasz Cwik
Welcome Zhang, I have added you as a contributor to the Apache Beam JIRA. I would suggest you take a look at the contribution guide[1] to learn how to get started. If I understand correctly, you're interested in translating several documents found on the Beam website; if so, Melissa would be a good

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
For the API that you proposed, the map key is always "void" and the sort key == user key. So in my example of key: dummy value key.000: token, (0001, value4) key.001: token, (0010, value1), (0011, value2) key.01: token key.1: token, (1011, value3) you would have: "void": dummy value "void".000:

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
In my look for, it should have said "void". instead of "key". when explaining how to do it. On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik wrote: > For the API that you proposed, the map key is always "void" and the sort > key == user key. So in my example of > key: dummy value > key.000: token,

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Rui Wang
> > > *Jan* Exactly. I think MapState can be dropped or backed by this. E.g. > >> >> Regarding performance, I have a concern about dropping MapState or using SortedMapState to back it. Although SortedMapState provides an API to remove a single key, I would imagine its implementation in runners will different

contributor permission for Beam Jira tickets

2019-05-24 Thread Nicolas Delsaux
Hi, I've just submitted https://issues.apache.org/jira/browse/BEAM-7414 for which I have a working solution. Could you please add me as a contributor (in order for this issue to be assigned to me)? My Jira user handle is "riduidel". Thanks!

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Reuven Lax
On Fri, May 24, 2019 at 9:36 AM Rui Wang wrote: > >> *Jan* Exactly. I think MapState can be dropped or backed by this. E.g. >> >>> >>> Regarding performance, I have a concern about dropping MapState or using > SortedMapState to back it. Although SortedMapState provides an API to remove a > single key, I

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Lukasz Cwik
Reuven, for the example, I assume that we never want to store more than 2 values at a given sort key prefix, and if we do then we will create a new longer prefix splitting up the values based upon the sort key. Tuple representation in examples below is (key, sort key, value) and . is a character

Re: Quota: In use IP-adresses

2019-05-24 Thread Udi Meiri
We're running up against this limit: "Quota 'IN_USE_ADDRESSES' exceeded. Limit: 750.0 in region us-central1." On Fri, May 24, 2019 at 8:36 AM Valentyn Tymofieiev wrote: > I did this for a few other resources recently (CPU, Disk). If this keeps > being a problem we can lower test parallelism. >

Re: Quota: In use IP-adresses

2019-05-24 Thread Udi Meiri
I opened a support request to increase the quota. On Fri, May 24, 2019 at 9:59 AM Udi Meiri wrote: > We're running up against this limit: "Quota 'IN_USE_ADDRESSES' exceeded. > Limit: 750.0 in region us-central1." > > On Fri, May 24, 2019 at 8:36 AM Valentyn Tymofieiev > wrote: > >> I did this

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Kenneth Knowles
On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: > Some great comments! > > *Aljoscha*: absolutely this would have to be implemented by runners to be > efficient. We can of course provide a default (inefficient) implementation, > but ideally runners would provide better ones. > > *Jan* Exactly.

Re: DISCUSS: Sorted MapState API

2019-05-24 Thread Kenneth Knowles
On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles wrote: > > > On Fri, May 24, 2019 at 8:14 AM Reuven Lax wrote: > >> Some great comments! >> >> *Aljoscha*: absolutely this would have to be implemented by runners to >> be efficient. We can of course provide a default (inefficient) >>