Re: [ANNOUNCEMENT] New Beam chair: Kenneth Knowles

2018-09-19 Thread Amit Sela
Well deserved! Congrats Kenn. On Wed, Sep 19, 2018 at 4:25 PM Kai Jiang wrote: > Congrats, Kenn! > ᐧ > > On Wed, Sep 19, 2018 at 1:23 PM Alan Myrvold wrote: > >> Congrats, Kenn. >> >> On Wed, Sep 19, 2018 at 1:08 PM Maximilian Michels >> wrote: >> >>> Congrats! >>> >>> On 19.09.18 22:07,

Re: A personal update

2017-12-13 Thread Amit Sela
Congrats and welcome back! can't wait.. On Wed, Dec 13, 2017 at 1:30 PM Mingmin Xu wrote: > Welcome back and best wishes for your new phase! > > On Wed, Dec 13, 2017 at 10:05 AM, Raghu Angadi wrote: > >> Great to have you back Davor! New venture sounds

Re: [VOTE] Choose the "new" Spark runner

2017-11-19 Thread Amit Sela
[X] Use Spark 2 Only Branch On Sun, Nov 19, 2017, 02:46 Reuven Lax wrote: > [ ] Use Spark 1 & Spark 2 Support Branch > [X] Use Spark 2 Only Branch > > On Sat, Nov 18, 2017 at 1:54 AM, Ben Sidhom > wrote: > > > [ ] Use Spark 1 & Spark 2

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Amit Sela
+1 for dropping Spark 1 support. I don't think we have enough users to justify supporting both, and its been a long time since this idea originally came-up (when Spark2 wasn't stable) and now Spark 2 is standard in all Hadoop distros. As for switching to the Dataframe API, as long as Spark 2

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Amit Sela
Congratulations, well deserved! On Fri, Aug 11, 2017 at 1:53 PM Jesse Anderson wrote: > Welcome! > > On Fri, Aug 11, 2017, 10:43 AM Ted Yu wrote: > > > Congratulations to Ahmet and Aviem. > > > > On Fri, Aug 11, 2017 at 10:40 AM, Davor Bonaci

Low availability on my end in the coming 3 weeks

2017-04-12 Thread Amit Sela
Hi everyone, I will be traveling and moving in the next ~3 weeks so I will be less available than usual. I believe our dev community is more mature now so my absence won't be noticed :-) but I still asked JB to formally take my place with anything concerning the Spark runner and he kindly agreed.

Re: Travis-CI Build failures

2017-04-05 Thread Amit Sela
Unfortunately Travis is not stable enough right now. I don't think this means that there is an issue with your work on the PR, you should notice Jenkins after PRing to see if all tests pass and the reviewing committer will followup with you further if necessary. Thanks! On Wed, Apr 5, 2017 at

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-30 Thread Amit Sela
ts; but at some point will need performance tests > > >>>>> too. > > >>>>> > > >>>>> Next steps: > > >>>>> - Amit will look at the standard SplittableParDo expansion and > > > >>>>> imp

Re: Beam spark 2.x runner status

2017-03-29 Thread Amit Sela
// spark 2.x > + case iterator: Iterator[R] => iterator > +} > ) > > FYI > > On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > Just tried to replace dependencies and see what happens: > > > > Most req

Re: Beam spark 2.x runner status

2017-03-29 Thread Amit Sela
ame code in 2.X without any need for a > branch? > > 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>: > > > If StreamingContext is valid and we don't have to use SparkSession, and > > Accumulators are valid as well and we don't need AccumulatorsV2, I don't >

Re: Beam spark 2.x runner status

2017-03-23 Thread Amit Sela
es ? > > > > Ismaël > > > > > > > > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> hbase-spark module doesn't use SparkSession. So situation there is > simpler > >> :-) > >> > >> On Wed, Mar 2

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-22 Thread Amit Sela
s tolerated, compared to the performance > cost introduced by 'state'/'checkpoint'. > > On Tue, Mar 21, 2017 at 1:36 PM, Amit Sela <amitsel...@gmail.com> wrote: > > > On Tue, Mar 21, 2017 at 7:26 PM Mingmin Xu <mingm...@gmail.com> wrote: > > > > > Move discuss

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Amit Sela
On Tue, Mar 21, 2017 at 7:26 PM Mingmin Xu wrote: > Move discuss to dev-list > > Savepoint in Flink, also checkpoint in Spark, should be good enough to > handle this case. > > When people don't enable these features, for example only need at-most-once > The Spark runner

Fwd: Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Spark #1282

2017-03-19 Thread Amit Sela
I think its a Jenkins issue. Jenkins is shutting down. I'll follow and relate if this keeps happening. -- Forwarded message - From: Apache Jenkins Server Date: Sun, Mar 19, 2017 at 8:21 PM Subject: Build failed in Jenkins:

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
adapt those, and anyone who wants to could see how I did it on my branch: https://github.com/amitsela/beam/commit/8a1cf889d14d2b47e9e35bae742d78a290cbbdc9 > > Thanks, > > Abbass, > > > On 15/03/2017 17:57, Amit Sela wrote: > > So you're suggesting we copy-paste the current r

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Amit Sela
ide minute notes on the mailing list. > > > > Unfortunately, next Friday, I will still be in China, so not possible to > > join > > (even if I would have like to participate :(). > > > > Regards > > JB > > > > On 03/15/2017 07:45 PM, Amit Sela wrote:

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
ased on the Structured Streaming API but the question is > how much time this will take to be in shape and the impact on final > users who are already requesting this. This is the reason why I think > the more conservative approach (keeping around the RDD translator) and > moving incrementally

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
sing it (like > this we don’t have to keep backporting stuff). Do you see any other > particular issue? > > Ismaël > > On Wed, Mar 15, 2017 at 3:39 PM, Amit Sela <amitsel...@gmail.com> wrote: > > So you propose to have the Spark 2 branch a clone of the current one with &g

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
eintroduce the Spark 2 branch, what about > > "extending" the version in the current Spark runner ? Still using > > RDD/DStream, I think we can support Spark 2.x even if we don't yet > leverage > > the new provided features. > > > > Thoughts ? > > >

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
Hi Cody, I will re-introduce this branch soon as part of the work on BEAM-913 . For now, and from previous experience with the mentioned branch, batch implementation should be straight-forward. Only issue is with streaming support - in the current

Re: Beam deploy uploads some artifacts twice

2017-03-14 Thread Amit Sela
Moving to: BEAM-1717 <https://issues.apache.org/jira/browse/BEAM-1717> to be handled as a ticket now. On Tue, Mar 14, 2017 at 4:11 PM Amit Sela <amitsel...@gmail.com> wrote: > It seems like there is more than one duplicate issue here. > > First one is a double javadoc,

Re: Beam deploy uploads some artifacts twice

2017-03-14 Thread Amit Sela
uced it. Also, "mvn help:effective-pom" often helps. > > On Sun, Mar 12, 2017 at 7:57 PM, Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > > > Hi Amit, > > > > I just arrived in Hong Kong. As I have all setup on my machine to > > reproduce,

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-13 Thread Amit Sela
to work: > > https://github.com/apache/beam/pull/2235 > > > > There are still two failing tests, as described in the PR. > > > > On Mon, Mar 13, 2017, at 20:10, Amit Sela wrote: > > > +1 for a video call. I think it should be pretty straight fo

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-13 Thread Amit Sela
; > > >@NewTracker > > public OffsetRangeTracker newTracker(OffsetRange range) { return new > >OffsetRangeTracker(range); } > > } > > > >== What I'm asking == > >So, I'd like to ask for help integrating SDF into Spark, Flink and Apex > >

Beam deploy uploads some artifacts twice

2017-03-12 Thread Amit Sela
I've been trying to release an internal fork and found out that trying to release with maven release plugin uploads twice some artifacts - for me it was "beam-sdks-java-core", for JB (who helped me investigate this) it was a different artifact. At first I blamed release plugin, but even a simple

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-11 Thread Amit Sela
Building the RC2 tag failed for me with: "mvn clean install -Prelease" on a missing artifact "beam-sdks-java-harness" when trying to build "beam-sdks-java-javadoc". I want to make sure It's not something local that happens in my env. so if anyone else could validate this it would be great. Amit

Re: [PROPOSAL] Add 2.0.0 version in Jira

2017-03-09 Thread Amit Sela
Well, for now at least. We have to use something for fixed issues.. On Thu, Mar 9, 2017 at 6:32 PM Jean-Baptiste Onofré wrote: > By the way, waiting the end of this discussion, we can use "First stable > release" as version in Jira. > > Regards > JB > > On 03/09/2017 07:21

Re: First stable release: version designation?

2017-03-08 Thread Amit Sela
If we were to go with a 2.0 release, we would have to be very clear on maturity of different modules; for example python SDK is not as mature as Java SDK, some runners support streaming better than others, some run on YARN better than others, etc. My only reservation here is that the Apache

Re: Apache Beam (virtual) contributor meeting @ Tue Mar 7, 2017

2017-03-06 Thread Amit Sela
...@apache.org> wrote: > > > I'd prefer not to record the video; just to keep things informal. We'll, > > however, keep the notes and share anything that may be relevant. > > > > On Thu, Mar 2, 2017 at 2:24 PM, Amit Sela <amitsel...@gmail.com> wrote: > > > &

Re: First stable release: version designation?

2017-03-01 Thread Amit Sela
I think 1.0.0 for a couple of reasons: * It makes sense coming after 0.X (+1 Jesse). * It is the FIRST stable release as a project, regardless of its roots. * while the SDK is definitely a 2.0.0, Beam is not made only of the SDK, and I hope we'll have more milage with users running all sorts of

Re: tf.Transform library for using TensorFlow with Beam

2017-02-24 Thread Amit Sela
That's great! many people have asked me about that and I'm glad to see this happening. Anyone know if there's something at work for the Java SDK (assuming I don't want to wait for Fn API support) ? On Fri, Feb 24, 2017 at 8:44 AM Jean-Baptiste Onofré wrote: > Fantastic ! > >

Re: Interest in a (virtual) contributor meeting?

2017-02-21 Thread Amit Sela
+1 On Wed, Feb 22, 2017 at 9:09 AM Stas Levin wrote: > +1 > > On Wed, Feb 22, 2017, 08:25 Jean-Baptiste Onofré wrote: > > > +1 > > > > Regards > > JB > > > > On 02/22/2017 04:18 AM, Davor Bonaci wrote: > > > In the early days of the project, we have

Re: Setting synchronized processing time triggers

2017-02-20 Thread Amit Sela
> be good for every group by key to reintroduce the same delay. Instead it > >> uses synchronized processing time to wait for all upstream firings of > the > >> same target time to fire. This ensures that all of the aligned triggers > >> have been processed but

Re: Metrics for Beam IOs.

2017-02-18 Thread Amit Sela
what I named a "scheduled collector". So, yes, the > adapter will periodically harvest metric to push. > > Regards > JB > > On 02/18/2017 05:30 PM, Amit Sela wrote: > > First issue with "push" metrics plugin - what if the runner's underlying >

Re: Metrics for Beam IOs.

2017-02-18 Thread Amit Sela
execution > > engine. That's the current approach and it works fine. And it's up to me > > to leverage (for intance Accumulators) it with my own system. > > > > My thought is more to provide a generic way. It's only a discussion for > > now ;) > > > > Regards >

Re: Metrics for Beam IOs.

2017-02-17 Thread Amit Sela
. > > WDYT ? > > Regards > JB > > On 02/15/2017 09:22 AM, Stas Levin wrote: > > +1 to making the IO metrics (e.g. producers, consumers) available as part > > of the Beam pipeline metrics tree for debugging and visibility. > > > > As it has already been mentio

Re: Pipeline Surgery and an interception-free future

2017-02-16 Thread Amit Sela
Awesome! First thing I'm gonna do: 1. traverse the pipeline to determine if streaming. 2. If streaming, replace Read.Bounded with an adapted Read.Unbounded. Current implementation forces translating bounded reads by the unbounded translator and it feels awkward, this makes it right again.

Re: We've hit 2000 PRs!

2017-02-16 Thread Amit Sela
It's not just 1000 more PRs, it's also new contributors. We're growing and that's awesome! Congrats to everyone. On Thu, Feb 16, 2017 at 9:43 PM Thomas Groh wrote: > Impressive work everyone. Very cool. > > On Thu, Feb 16, 2017 at 8:05 AM, Dan Halperin

Re: Stream and Batch Use Case

2017-02-15 Thread Amit Sela
transforms programming guide is still under construction, should be available here once ready : https://beam.apache.org/documentation/programming-guide/#transforms-composite On Wed, Feb 15, 2017 at 10:28 AM Amit Sela <amitsel...@gmail.com> wrote: > You can write one pipeline and si

Re: Stream and Batch Use Case

2017-02-15 Thread Amit Sela
You can write one pipeline and simply replace the IO, for example: To read from (text) files you can use: *PCollection lines = p.apply(TextIO.Read.from("file://some/inputData.txt"));* and from Kafka (I'm adding a generic key here because Kafka messages are keyed): *PCollection>

Re: Metrics for Beam IOs.

2017-02-14 Thread Amit Sela
I think this is a great discussion and I'd like to relate to some of the points raised here, and raise some of my own. First of all I think we should be careful here not to cross boundaries. IOs naturally have many metrics, and Beam should avoid "taking over" those. IO metrics should focus on

Testing streaming pipeline on Dataflow

2017-02-03 Thread Amit Sela
Hi Davor, My team wants to try and test streaming pipelines on Dataflow (Beam pipelines of course) and I was wondering how this works in terms of UnboundedSources - Kafka/Pubsub ? We currently use Kafka, and I was wondering if I could "record" a chunk of (public) data we use and import it ? can I

Re: How to implement Timer in runner

2017-01-27 Thread Amit Sela
I think the biggest challenge here in terms of using external store is to be able to (periodically) scan through keys even if their corresponding values are not updated and check if they should be fired/GCed. The implementation I'm working on in Spark runner does that by using Spark's

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Amit Sela
Welcome and congratulations to all! On Fri, Jan 27, 2017, 10:12 Ahmet Altay wrote: > Thank you all! And congratulations to other new committers. > > Ahmet > > On Thu, Jan 26, 2017 at 9:45 PM, Kobi Salant > wrote: > > > Congrats! Well deserved

Re: Conceptually, what are bundles?

2017-01-25 Thread Amit Sela
On Wed, Jan 25, 2017 at 8:23 PM Thomas Groh wrote: > I have a couple of points in addition to what Robert said > > Runners are permitted to determine bundle sizes as appropriate to their > implementation, so long as bundles are atomically committed. The contents > of a

Re: Committed vs. attempted metrics results

2017-01-19 Thread Amit Sela
the user know that ? I'm afraid this could be confusing as a user-facing query API, and I think most users would simply name metrics differently. > > On Thu, Jan 19, 2017 at 1:57 PM Amit Sela <amitsel...@gmail.com> wrote: > > > I think Luke's example is interesting, but I wonder

Re: Committed vs. attempted metrics results

2017-01-19 Thread Amit Sela
I think Luke's example is interesting, but I wonder how common it is/would be ? I'd expect failures to happen but not in a rate that would be so dramatic that it'd be interesting to follow applicatively (I'd expect the runner/cluster to properly monitor up time of processes/nodes separately). And

Re: Graduation!

2017-01-10 Thread Amit Sela
Congratulations and thanks to everyone involved! On Tue, Jan 10, 2017, 18:16 Aljoscha Krettek wrote: > Sweet! Congratulations to everyone. :-) > > On Tue, 10 Jan 2017 at 17:01 Mukul Jain wrote: > > > Congrats to everyone on this good news !! > > > > Sent