Re: [component] tag in JIRA tickets

2016-12-16 Thread Amit Sela
> > > On Thu, Dec 15, 2016 at 6:15 AM, Jean-Baptiste Onofré <j...@nanthrax.net> > > wrote: > > > >> Yes, agree. We had kind of similar discussion while ago: > >> "java-sdk-extension" vs "io" afair ;)

Re: [component] tag in JIRA tickets

2016-12-15 Thread Amit Sela
component is also generic (for instance, > java-sdk-extension is for both extensions and IOs). > > If possible, I would more work on the component (and customize the > Release Notes output to include it). > > Regards > JB > > On 12/15/2016 02:30 PM, Amit Sela wrote: >

[component] tag in JIRA tickets

2016-12-15 Thread Amit Sela
I just took a look at the release notes for 0.4.0-incubating and I felt like they could have been "tagged" in a way that helps people focus on what's interesting to them. Currently, all resolved issues simply appear as they are in JIRA, but we don't have any way to tag them. What if we were to prefix

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Amit Sela
I see three problems in the release notes (related to the Spark runner): Improvement: [BEAM-757] - The SparkRunner should utilize the SDK's DoFnRunner instead of writing it's own. [BEAM-807] - [SparkRunner] Replace OldDoFn with DoFn. [BEAM-855] - Remove the need for --streaming option

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #409

2016-12-14 Thread Amit Sela
This relates to my question on the dev list. On Wed, Dec 14, 2016 at 9:07 PM Kenneth Knowles wrote: > This is still https://issues.apache.org/jira/browse/BEAM-1149. We recently > added a test for it. The actual behavior has been broken for everyone for a > while. It is

New testSideInputsWithMultipleWindows and should DoFnRunner explode if DoFn contains a side input ?

2016-12-14 Thread Amit Sela
Hi all, Yesterday a new test was added to the ParDoTest suite: "testSideInputsWithMultipleWindows". To the best of my understanding, it's meant to test side inputs for elements in multiple windows (unexploded). The Spark runner uses the DoFnRunner (Simple) to process DoFns, and it will explode

Re: Beam Tuple

2016-12-13 Thread Amit Sela
e than in the data format extension). > > Regards > JB > > On 12/13/2016 11:06 AM, Amit Sela wrote: > > Hi all, > > > > I was wondering why Beam doesn't have tuples as part of the SDK ? > > To the best of my knowledge all currently supported (OSS) runners:

Beam Tuple

2016-12-13 Thread Amit Sela
Hi all, I was wondering why Beam doesn't have tuples as part of the SDK? To the best of my knowledge, all currently supported (OSS) runners (Spark, Flink and Apex) provide a Tuple abstraction, and I was wondering if Beam should too? Consider KV for example; it is a special ("*keyed*" by the first
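To make the distinction concrete, here is a minimal plain-Java sketch of a general-purpose pair. `Tuple2` is a hypothetical name and is not part of the Beam SDK; unlike Beam's KV, nothing in it marks the first field as a grouping key, so an operation such as swapping the fields is symmetric.

```java
import java.util.Objects;

// Hypothetical general-purpose pair; NOT part of the Beam SDK.
// Unlike KV, neither field is treated as a grouping key.
final class Tuple2<A, B> {
    private final A first;
    private final B second;

    Tuple2(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A first() { return first; }

    B second() { return second; }

    // Swapping fields is a natural operation on a symmetric tuple.
    Tuple2<B, A> swap() { return new Tuple2<>(second, first); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tuple2)) return false;
        Tuple2<?, ?> other = (Tuple2<?, ?>) o;
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() { return Objects.hash(first, second); }

    @Override
    public String toString() { return "(" + first + ", " + second + ")"; }
}
```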

Re: PCollection to PCollection Conversion

2016-12-13 Thread Amit Sela
, 2016 at 2:59 AM Kenneth Knowles <k...@google.com.invalid> wrote: > On this point from Amit and Ismaël, I agree: we could benefit from a

Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Amit Sela
it's usage in direct runner - this is exactly what > you're looking for, a new DoFn that with per-runner support is able to emit > multi-windowed values. > On Sun, Dec 11, 2016 at 4:28 AM Amit Sela <amitsel...@gmail.com> wrote: > > > Hi all, > > > > I've been w

Re: Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1776

2016-12-09 Thread Amit Sela
I see: "502. That’s an error. The server encountered a temporary error and could not complete your request. Please try again in 30 seconds. That’s all we know." So I assume this is temporary... Anyone with better Dataflow insight could probably provide more input. On Fri, Dec 9, 2016 at 7:32

Re: Performance Benchmarking Beam

2016-12-09 Thread Amit Sela
This is great, Jason! Let me know if/how I can help with Spark, or in general. Thanks, Amit On Thu, Dec 8, 2016 at 9:01 PM Jason Kuster wrote: > Hey all, > > So as I mentioned on Stephen's IO Testing thread a few days ago I've been > doing a bunch of

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Amit Sela
+1 On Thu, Dec 8, 2016 at 1:27 PM Manu Zhang wrote: > +1 > > Manu > > On Thu, Dec 8, 2016 at 2:40 PM Tyler Akidau > wrote: > > > +1 > > > > On Thu, Dec 8, 2016 at 1:10 PM Jean-Baptiste Onofré > > wrote: > > > > > +1 > > >

Re: Increase stream parallelism after reading from UnboundedSource

2016-12-06 Thread Amit Sela
event-at-a-time" processors and everything in between, such as bundles that may be of size 1 but might contain more elements. On Tue, Dec 6, 2016 at 10:18 PM Raghu Angadi <rang...@google.com.invalid> wrote: > On Sun, Dec 4, 2016 at 11:48 PM, Amit Sela <amitsel...@gmail.com

Increase stream parallelism after reading from UnboundedSource

2016-12-04 Thread Amit Sela
Hi all, I have a general question about how stream-processing frameworks/engines usually behave in the following scenario: Say I have a Pipeline that consumes from 1 Kafka partition, so that my initial (optimal) parallelism is 1 as well. For any downstream computation, is it common for stream
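One common way engines raise parallelism after a single-partition source is a redistribution (shuffle) step between the reader and the downstream work. Below is a minimal plain-Java sketch of round-robin redistribution; `RoundRobin.redistribute` is a hypothetical helper, not a Beam, Spark or Kafka API, and a real engine would move elements across workers rather than into in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: spreads elements read by a single reader
// (parallelism 1) across `parallelism` downstream buckets.
final class RoundRobin {
    static <T> List<List<T>> redistribute(List<T> elements, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new ArrayList<>());
        }
        // Element i goes to bucket (i mod parallelism).
        for (int i = 0; i < elements.size(); i++) {
            buckets.get(i % parallelism).add(elements.get(i));
        }
        return buckets;
    }
}
```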

Re: Jenkins build became unstable: beam_PostCommit_MavenVerify #1906

2016-11-26 Thread Amit Sela
The following build, #1907, succeeded, so this was probably just a flake. I'll follow up. On Sat, Nov 26, 2016, 13:46 Amit Sela <amitsel...@gmail.com> wrote: > Seems to fail on DataflowRunner "WordCountIT.testE2EWordCount". > *Error*: > *Expected: Expected checksum is (508517575eba8d8

Fwd: Jenkins build became unstable: beam_PostCommit_MavenVerify #1906

2016-11-26 Thread Amit Sela
It seems to fail on DataflowRunner, in "WordCountIT.testE2EWordCount". *Error*: *Expected: Expected checksum is (508517575eba8d8d5a54f7f0080a00951cfe84ca)* *but: was (cfdcdcec05fc8424abc168bf5b0c0ed66e376547)* Could anyone with access to (and knowledge of) the Dataflow runner take a look? Thanks!

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Amit Sela
+1, super exciting! Thanks to JB, Davor and the whole team for creating this community. I think we've achieved a lot in a short time. Amit. On Tue, Nov 22, 2016, 20:36 Tyler Akidau wrote: > +1, thanks to everyone who's invested time getting us to this point. :-) >

Re: Introduction + contributing to docs

2016-11-11 Thread Amit Sela
Welcome Melissa! On Fri, Nov 11, 2016, 22:31 Jean-Baptiste Onofré wrote: > Hi Melissa, > > welcome aboard !! > > Regards > JB > > On 11/11/2016 08:11 PM, Melissa Pashniak wrote: > > Hello! > > > > > > My name is Melissa. I’ve previously been involved with Dataflow > >

Re: [PROPOSAL] Change to KafkaIO splits

2016-11-11 Thread Amit Sela
+1. I think this makes more sense than the existing form of a split that is made of several Kafka partitions since, as mentioned, Kafka partitions are in fact its unit of parallelism. As for supporting a change in the number of partitions (mainly, added partitions), I'll suggest something I brought up
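A plain-Java sketch contrasting the two split layouts under discussion: one split per Kafka partition versus several partitions grouped into each split. `PartitionSplits` and the string partition names are hypothetical and do not mirror KafkaIO's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of split layouts; not KafkaIO code.
final class PartitionSplits {
    // Proposed layout: one split per partition, so split count == partition count.
    static List<List<String>> onePerPartition(List<String> partitions) {
        List<List<String>> splits = new ArrayList<>();
        for (String p : partitions) {
            splits.add(Collections.singletonList(p));
        }
        return splits;
    }

    // Grouped layout: partitions spread over at most `desired` splits,
    // so a single split may read several partitions.
    static List<List<String>> grouped(List<String> partitions, int desired) {
        int n = Math.max(1, Math.min(desired, partitions.size()));
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            splits.add(new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            splits.get(i % n).add(partitions.get(i));
        }
        return splits;
    }
}
```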

Re: SBT/ivy dependency issues

2016-11-10 Thread Amit Sela
> > m: > classifier="tests"/> > ext= > "*" conf="" matcher="exact"/> > > "0.3.0-incubating" force="true" conf="test->runtime(*),master(compile)"> > conf="" ma

Re: SBT/ivy dependency issues

2016-11-10 Thread Amit Sela
@Abbass/Manu, does SBT state that it's looking for *beam-sdks-java-core* and *beam-runners-core-java* but fails to find them? On Thu, Nov 10, 2016 at 3:43 AM Manu Zhang wrote: > Hi all, > > I tried and reproduced the issue. "sbt-dependency-graph" doesn't show >

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2016-11-10 Thread Amit Sela
How about @ValidatesRunner? It seems to complement @NeedsRunner as well. On Thu, Nov 10, 2016 at 9:47 AM Aljoscha Krettek wrote: > +1 > > What I would really like to see is automatic derivation of the capability > matrix from an extended Runner Test Suite. (As outlined in

Re: PCollection to PCollection Conversion

2016-11-09 Thread Amit Sela
I think Jesse has a very good point on one hand, while on the other, Luke's and Kenneth's worries about committing users to specific implementations are well founded. The Spark community has a third-party repository for useful libraries that for various reasons are not a part of the Apache Spark project:

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-08 Thread Amit Sela
+1 awesome. Congrats Thomas! On Tue, Nov 8, 2016 at 3:54 PM Thomas Weise wrote: > Hi, > > As per previous discussion [1], I would like to propose to merge the > apex-runner branch into master. The runner satisfies the criteria outlined > in [2] and merging it to master will

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-03 Thread Amit Sela
global window and triggers "never" this actually means that its > internal GBK emits all of its output exactly once, when the window is > expiring. So producing more than one output - even if spread across > microbatches - is actually the trouble. > > On Wed, Nov 2, 201

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Amit Sela
wrote: > Agree, this element should be removed. > > Regards > JB > > On 11/02/2016 10:53 AM, Amit Sela wrote: > > Hi all, > > > > I've been looking at PAssert and I've notice that PAssert.GroupedGlobally > > points > > < > https://github.com/apache

Re: GitHub mirroring issue

2016-10-26 Thread Amit Sela
Thanks! On Wed, Oct 26, 2016, 23:32 Suneel Marthi <smar...@apache.org> wrote: > We have been seeing Github mirroring issues today on other projects too, > filed an Infra jira - INFRA-12830 > > On Wed, Oct 26, 2016 at 4:21 PM, Amit Sela <amitsel...@gmail.com> wrote:

GitHub mirroring issue

2016-10-26 Thread Amit Sela
Hi all, I merged a PR about 2 hours ago, and while the Apache remote seems up to date, GitHub has not updated, and neither have the PR or JIRA. The last commit hash is 6db9424 (merge commit 9f30b21). Hopefully this will update after the next commit, but FYI. Thanks, Amit

Re: [DISCUSS] Merging master -> feature branch

2016-10-26 Thread Amit Sela
I generally agree with Kenneth. While working on the SparkRunnerV2 branch, it was a pain: I avoided frequent merges to avoid trivial PRs, but it cost me very large and non-trivial merges later. I think that frequent merges into feature branches should, most of the time, be trivial (no

Re: build failed with dependency problems

2016-10-26 Thread Amit Sela
I just fetched and pulled the latest master and the build succeeded; maybe try again? On Wed, Oct 26, 2016 at 9:19 AM Manu Zhang wrote: > Hi All, > > I tried to build latest master but failed with the following dependency > problems. > > [INFO] ---

Re: The Availability of PipelineOptions

2016-10-25 Thread Amit Sela
+1 On Tue, Oct 25, 2016 at 8:43 PM Robert Bradshaw wrote: > +1 > > On Tue, Oct 25, 2016 at 7:26 AM, Thomas Weise wrote: > > +1 > > > > > > On Tue, Oct 25, 2016 at 3:03 AM, Jean-Baptiste Onofré > > wrote: > > > >> +1 > >> > >>

Re: [DISCUSS] Current ongoing work on runners

2016-10-25 Thread Amit Sela
SparkRunner status: V1 (Spark 1.6.x, DStream/RDD API): *Batch*: full model support for batch; a continuous ROS (RunnableOnService) testing setup is in progress so that CI will validate it constantly. *Streaming*: support for UnboundedSource is in review, starting to

Re: [DISCUSS] Deferring (pre) combine for merging windows.

2016-10-22 Thread Amit Sela
nt map (like to GlobalWindow)) until the main input window is known. On Fri, Oct 21, 2016 at 3:50 PM, Amit Sela <amitsel...@gmail.com> wrote: > Please excuse my typos and apply "s/differ/defer/g" ;-). > Amit. > > On Fri, Oct 21, 2016 at 2:59 PM Amit Sela <amitsel.

Re: [DISCUSS] Deferring (pre) combine for merging windows.

2016-10-21 Thread Amit Sela
Please excuse my typos and apply "s/differ/defer/g" ;-). Amit. On Fri, Oct 21, 2016 at 2:59 PM Amit Sela <amitsel...@gmail.com> wrote: > I'd like to raise an issue that was discussed in BEAM-696 > <https://issues.apache.org/jira/browse/BEAM-696>. > I wo

[DISCUSS] Deferring (pre) combine for merging windows.

2016-10-21 Thread Amit Sela
I'd like to raise an issue that was discussed in BEAM-696. I won't recap here because it would be extensive (and probably exhaustive), and I'd also like to restart the discussion here rather than summarize it. *The problem* In the case of (main)

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Amit Sela
avor > > [1] > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12338051 > > On Thu, Oct 20, 2016 at 9:32 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > +1 > > > > I would like to have my standing PRs merged please - they s

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Amit Sela
+1. I would like to have my pending PRs merged, please; they should provide UnboundedSource support for the SparkRunner. If they aren't ready to merge by the beginning of next week, don't hold the release for me. Thanks, Amit On Thu, Oct 20, 2016 at 7:27 PM Jean-Baptiste Onofré

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Amit Sela
and it comes as part of one of > kafka libs. > > On Wed, Oct 19, 2016 at 10:49 PM, Amit Sela <amitsel...@gmail.com> wrote: > > > The SparkRunner actually has an embedded Kafka for its unit tests. > > > > On Wed, Oct 19, 2016, 20:16 Thomas Weise <t...@apache.org

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Amit Sela
rs of us have been thinking hard about the best > ways > > to build these tests and necessary test infrastructure. (See the > > performance thread Jason started. IMO the most important issue to solve > > first is infrastructure). Please help! > > > > Dan

Re: Introduction

2016-10-17 Thread Amit Sela
Done. Feel free to take a peek at the Spark runner, since you have Spark experience, and that's great! Most open issues are usually automatically assigned to me, but ping me (dev list/Slack) if you want to work on something and aren't sure of its status. Thanks, Amit On Mon, Oct 17, 2016

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Amit Sela
Congrats and thanks to everyone who was involved in this effort! On Mon, Oct 17, 2016 at 8:07 PM Neelesh Salian wrote: > Awesome. Great work. > > On Mon, Oct 17, 2016 at 10:03 AM, Aljoscha Krettek > wrote: > > > Congrats! :-) > > > > On Mon, 17 Oct

Re: Simplifying User-Defined Metrics in Beam

2016-10-13 Thread Amit Sela
On Thu, Oct 13, 2016 at 12:27 PM Aljoscha Krettek wrote: > I finally found the time to have a look. :-) > > The API looks very good! (It's very similar to an API we recently added to > Flink, which is inspired by the same Codahale/Dropwizard metrics). > > About the

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-10-13 Thread Amit Sela
s opened, I am wondering could any people help > > on them? > > > > https://issues.apache.org/jira/browse/BEAM-596 > > https://issues.apache.org/jira/browse/BEAM-595 > > https://issues.apache.org/jira/browse/BEAM-593 > > > > Thanks > > -- > > Pei

Re: Introducing a Redistribute transform

2016-10-10 Thread Amit Sela
On Mon, Oct 10, 2016 at 9:21 PM Robert Bradshaw <rober...@google.com.invalid> wrote: > On Sat, Oct 8, 2016 at 7:31 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > Hi Eugene, > > > > > > This is very interesting. > > > Let me see if I get this

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-10 Thread Amit Sela
this would work on https://s.apache.org/splittable-do-fn? > > On Mon, Oct 10, 2016 at 6:35 AM, Amit Sela <amitsel...@gmail.com> wrote: > > Thanks Max! > > > > I'll try to explain Spark's stateful operators and how/why I used them > with > > UnboundedSo

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-10 Thread Amit Sela
> thanks for the explanation. > > > > > > For 4, you are right, it's slightly different from DataXchange (related > to > > > the elements in the PCollection). I think storing the "starting point" > for a > > > reader makes sense. > > > > > > Regard

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-10 Thread Amit Sela
a reader, even if no records were read (yet), so that the next time the reader attempts to read, it will pick up from there. This has more to do with how the CheckpointMark handles this. I have to say that I'm not familiar with your DataXchange proposal; I will take a look though. > > > >

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-08 Thread Amit Sela
Some answers inline. @Raghu I'll review the PR tomorrow. Thanks, Amit On Sat, Oct 8, 2016 at 3:47 AM Raghu Angadi <rang...@google.com.invalid> wrote: > On Fri, Oct 7, 2016 at 4:55 PM, Amit Sela <amitsel...@gmail.com> wrote: > > >3. Support reading of Kafka pa

Re: Should UnboundedSource provide a split identifier ?

2016-10-07 Thread Amit Sela
RN container as well. On Tue, Oct 4, 2016 at 10:39 PM Raghu Angadi <rang...@google.com.invalid> wrote: > On Wed, Sep 14, 2016 at 1:43 PM, Amit Sela <amitsel...@gmail.com> wrote: > > > > > > > For generateInitialSplits, the UnboundedSource API doesn't require

Re: Preferred locations (or data locality) for batch pipelines.

2016-09-26 Thread Amit Sela
Kafka cluster, let's try to be near > it". For example. > > Does that answer the question / did I miss something? > > Thanks, > Dan > > On Thu, Sep 22, 2016 at 8:29 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > Generally this makes sense, tho

Re: Preferred locations (or data locality) for batch pipelines.

2016-09-22 Thread Amit Sela
p 22, 2016 at 5:45 PM Jesse Anderson <je...@smokinghand.com> wrote: > I've only ever seen that being used to figure out which file the > runner/mapper/operation is working on. Otherwise, I haven't seen those > operations care where in the file they're working. > > On Thu, Se

Preferred locations (or data locality) for batch pipelines.

2016-09-22 Thread Amit Sela
It's not news that batch pipelines can optimize for data locality; my question is about where this responsibility lies in Beam. If runners should implement generic Read.Bounded support, should they also implement locating the input blocks? Or should it be a part of the IOChannelFactory implementations? Or

Re: IntervalWindow toString()

2016-09-19 Thread Amit Sela
Looks like included (start) and excluded (end) bounds, i.e. a half-open interval. On Mon, Sep 19, 2016, 18:57 Jesse Anderson wrote: > The toString() to IntervalWindow starts with a square bracket and ends with > a parenthesis. Is this a type of notation or a bug? Code: > > @Override > public String toString() {
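A minimal sketch of the half-open `[start, end)` reading, assuming plain `long` millisecond timestamps instead of Joda `Instant`s; `HalfOpenInterval` is hypothetical and is not Beam's `IntervalWindow`. The excluded end is what lets adjacent windows such as `[0, 10)` and `[10, 20)` share a boundary without both containing it.

```java
// Hypothetical half-open interval [start, end); times are epoch millis.
final class HalfOpenInterval {
    final long start; // inclusive
    final long end;   // exclusive

    HalfOpenInterval(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end < start");
        }
        this.start = start;
        this.end = end;
    }

    // True when start <= t < end.
    boolean contains(long t) {
        return start <= t && t < end;
    }

    // '[' marks an included bound, ')' an excluded one.
    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}
```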

Re: FYI: All Runners Tested In Precommit

2016-09-15 Thread Amit Sela
That's great news. Thanks Jason! On Thu, Sep 15, 2016 at 10:16 PM Jason Kuster wrote: > Hi all, > > Just a quick update -- as of yesterday all new PRs now run the WordCount > end-to-end test against every runner in master (Flink, Spark, Dataflow, and > Direct).

Re: Hi

2016-09-15 Thread Amit Sela
Welcome Etienne! On Thu, Sep 15, 2016, 14:34 Jean-Baptiste Onofré wrote: > Hi Etienne, > > welcome aboard (again ;)) ! > > Regards > JB > > On 09/15/2016 10:19 AM, Etienne Chauchot wrote: > > Hi guys, > > > > I'll be working with JB on Beam. I'm just starting, for now I'm

Re: Should UnboundedSource provide a split identifier ?

2016-09-14 Thread Amit Sela
added partitions, no ? > > On Tue, Sep 13, 2016 at 1:49 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > If I understand correctly this will break > > https://github.com/apache/incubator-beam/blob/master/sdks/ > > > java/io/kafka/src/main/java/org/apache/be

Re: Should UnboundedSource provide a split identifier ?

2016-09-13 Thread Amit Sela
If I understand correctly, this will break https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L857 in KafkaIO. So it's a KafkaIO limitation (for now?)? On Tue, Sep 13, 2016 at 11:31 AM Amit Sela <amitsel...@gmail.

Re: Should UnboundedSource provide a split identifier ?

2016-09-12 Thread Amit Sela
checkpoint. On Mon, Sep 12, 2016 at 9:15 PM Raghu Angadi <rang...@google.com.invalid> wrote: > On Wed, Sep 7, 2016 at 7:13 AM, Amit Sela <amitsel...@gmail.com> wrote: > > > @Raghu, hashing <topic, partition> is exactly what I mean, but I'm asking > > if it
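A sketch of what a stable split identifier derived from `<topic, partition>` pairs might look like, so that a checkpoint could be matched back to the same split across restarts. `SplitId` is hypothetical and not part of the UnboundedSource API; it sorts the pairs so the id does not depend on assignment order, and it relies on `String.hashCode()`, whose algorithm is specified by the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical split-identifier helper; not part of any Beam API.
final class SplitId {
    // Canonical form: sorted "topic-partition" entries, so the id does not
    // depend on the order in which partitions were assigned to the split.
    static String canonical(List<Map.Entry<String, Integer>> topicPartitions) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Integer> tp : topicPartitions) {
            parts.add(tp.getKey() + "-" + tp.getValue());
        }
        Collections.sort(parts);
        return String.join(",", parts);
    }

    // Deterministic across JVMs, but hashes can collide; compare
    // canonical() strings if collisions matter.
    static int hash(List<Map.Entry<String, Integer>> topicPartitions) {
        return canonical(topicPartitions).hashCode();
    }
}
```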

Re: Should UnboundedSource provide a split identifier ?

2016-09-07 Thread Amit Sela
> like to know the answer too, in this case. > > +Daniel Mills <mil...@google.com> can you comment ? > > > > On Tue, Sep 6, 2016 at 3:32 PM Amit Sela <amitsel...@gmail.com> wrote: > > > > > That is correct, as long as none of the Kafka topics "grow

Re: Remove legacy import-order?

2016-08-24 Thread Amit Sela
+1 on import order as well. Kenneth has a good point about history if we reformat. On Wed, Aug 24, 2016, 18:59 Kenneth Knowles wrote: > +1 to import order > > I don't care about actually enforcing formatting, but would add it to IDE > tips and just make it an "OK topic

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-12 Thread Amit Sela
+1 as in I'll join ;-) On Fri, Aug 12, 2016, 19:14 Eugene Kirpichov wrote: > Sounds good, thanks! > Then Friday Aug 19th it is, 8am-9am PST, > https://staging.talkgadget.google.com/hangouts/_/google.com/splittabledofn > > On Thu, Aug 11, 2016 at 11:12 PM

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-08 Thread Amit Sela
e runner would have an efficient override for this transform too; worst > case, we'd have two overloads - one for a fixed filepattern and one for a > PCollection of filepatterns, only one of these efficiently overridden by > the Spark runner. Does this make sense? > > Thanks. > > O

Re: Proposal: Dynamic PIpelineOptions

2016-08-07 Thread Amit Sela
+1, sounds like a good idea. Spark's driver actually takes all dynamic parameters starting with "spark." and propagates them into SparkConf, which is propagated to the executors and is available via the environment's SparkEnv. I'm wondering, does this mean that PipelineOption will be available
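Spark's `SparkConf`, when created with defaults loaded, picks up JVM system properties whose names start with `spark.`. Below is a plain-Java sketch of that prefix filtering, which a dynamic-options mechanism could mirror; `SparkPrefixedProps` is hypothetical and is not Spark's implementation.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper mimicking the "spark." prefix convention.
final class SparkPrefixedProps {
    // Collects every property whose key starts with "spark.", sorted by key.
    static Map<String, String> collect(Properties props) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("spark.")) {
                out.put(key, props.getProperty(key));
            }
        }
        return out;
    }
}
```

Passing `-Dspark.ui.enabled=false` to the JVM, for instance, makes that key visible through `System.getProperties()`, so `SparkPrefixedProps.collect(System.getProperties())` would include it.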

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-05 Thread Amit Sela
w (at > > > least while we support spark 1) ? > > > - We must probably make clear for users the advantages/disadvantages of > > > both versions of the runner, and make clear that the spark 1 runner > will > > be > > > almost on maintenance mode (with no

Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Amit Sela
k it is valuable to test > the > > support for YARN. The question is, should the tests be run on > 'Standalone' > > OR YARN' or maybe we can have tests for 'Standalone AND YARN' ? > > > > Ismael. > > > > > > > > > > On Thu, Jul 28, 2016 at 12:24

Re: [VOTE] Release version 0.2.0-incubating

2016-07-28 Thread Amit Sela
+1 (binding) On Thu, Jul 28, 2016 at 3:57 PM Jean-Baptiste Onofré wrote: > +1 (binding) > > I checked: > - artefact names contain incubating > - signatures and hashes good > - DISCLAIMER exists > - LICENSE file looks good > - NOTICE file looks good > - Source files have ASF

Re: [VOTE] Release version 0.2.0-incubating

2016-07-28 Thread Amit Sela
I built and ran the WordCount example (Spark runner) locally and on YARN. I'm not sure if the unresolved issue in the release notes is a blocker, but besides that, +1. On Thu, Jul 28, 2016 at 1:32 PM Amit Sela <amitsel...@gmail.com> wrote: > In the release notes: > >- [

Re: [VOTE] Release version 0.2.0-incubating

2016-07-28 Thread Amit Sela
In the release notes: - [BEAM-349] - Spark runner should provide a default BoM. This is still marked as unresolved (and for good reason, AFAIK), so I guess it should be removed? On Thu, Jul 28, 2016 at 12:52 PM Dan Halperin

[DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Amit Sela
Following a discussion I had with Kenneth and Dan here, I want to raise the issue of which resource manager we should use for ongoing tests that will run on actual clusters (in addition to local/in-memory tests). If we plan to test all runners on all

Re: Podling Report Reminder - August 2016

2016-07-27 Thread Amit Sela
+1 On Wed, Jul 27, 2016, 22:12 Dan Halperin wrote: > +1 on all the above. > > On Wed, Jul 27, 2016 at 12:07 PM, Jean-Baptiste Onofré > wrote: > > > Hi James, > > > > Sure, please go ahead. > > > > I propose to send the draft on the mailing list

Re: [Discuss] Beam SDK (Java) providing a shaded jar as a dependency

2016-07-25 Thread Amit Sela
> > really see a problem. > > > > I tend to favor private-by-default, to keep a tight grip on what our API > > surface really is, but I also have no problem just giving up when a > library > > is not "shadeable". I could also be convinced that th

Fwd: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_SparkLocal #12

2016-07-23 Thread Amit Sela
I'm not sure what the setup is here, but there seem to be issues with the UI ports. Generally we don't need the UI for tests, so you could add -Dspark.ui.enabled=false to the executing command. Thanks, Amit -- Forwarded message - From: Apache Jenkins Server

Re: [KUDOS] Contributed runner: Gearpump!

2016-07-21 Thread Amit Sela
Congrats Manu! On Thu, Jul 21, 2016, 06:35 Frances Perry wrote: > Awesome! > > On Wed, Jul 20, 2016 at 6:42 PM, Manu Zhang > wrote: > > > Thanks Kenn and others for the review and help along the way. Feel free > to > > ping me on slack if you

Re: Adding DoFn Setup and Teardown methods

2016-07-20 Thread Amit Sela
+1. I think this will prove even more important with per-key workflows, where DoFns might communicate constantly with external services (DB, Elasticsearch, etc.). On Mon, Jul 18, 2016 at 5:44 PM Maximilian Michels wrote: > Hoping it becomes usual as soon as we have this
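A plain-Java sketch of the setup/teardown ordering discussed here: the expensive client is opened once per function instance rather than once per element. `FakeClient` and `LifecycleFn` are hypothetical stand-ins and are not Beam's DoFn API.

```java
import java.util.Locale;

// Hypothetical stand-in for an external service client (DB, Elasticsearch, ...).
final class FakeClient {
    private boolean closed = false;

    String lookup(String key) {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        return key.toUpperCase(Locale.ROOT);
    }

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

// Hypothetical function object with setup / per-element / teardown phases.
final class LifecycleFn {
    private FakeClient client;
    private int clientsOpened = 0;

    // Called once per instance, before any element is processed.
    void setup() {
        client = new FakeClient();
        clientsOpened++;
    }

    // Called once per element; reuses the client opened in setup().
    String processElement(String element) {
        return client.lookup(element);
    }

    // Called once per instance, after processing is done.
    void teardown() {
        if (client != null) {
            client.close();
        }
    }

    int clientsOpened() { return clientsOpened; }

    FakeClient client() { return client; }
}
```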

Re: [DISCUSS] Spark runner packaging

2016-07-08 Thread Amit Sela
anks Amit, that does clear things up! > > On Thu, Jul 7, 2016 at 3:30 PM, Amit Sela <amitsel...@gmail.com> wrote: > > > I don't think that the Spark runner is special, it's just the way it was > > until now and that's why I brought up the subject here. > > > > T

Re: Beam Interview

2016-07-08 Thread Amit Sela
That's great Jesse! Added my comments. Thanks, Amit On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar wrote: > Hi, > I am a User and learner. I just added my view points. > > Thanks > SV > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández > wrote:

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
tiste Onofré <j...@nanthrax.net> > > wrote: > > > > > No problem and good idea to discuss in the Jira. > > > > > > Actually, I started to experiment a bit beam distributions on a branch > > > (that I can share with people interested). >

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
/jira/browse/BEAM-320 > > As described in the Jira, I'm planning to provide (in dedicated Maven > modules) is a Beam distribution including: > - an uber jar to wrap the dependencies > - the underlying runtime backends > - etc > > Regards > JB > > On 07/07/2016 07:49 PM, Amit Sela w

[DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
Hi everyone, Lately I've encountered a number of issues stemming from the fact that the Spark runner does not package Spark along with it, forcing people to do this on their own. In addition, this seems to get in the way of having beam-examples executed against the Spark runner, again because it

Re: Improvements to issue/version tracking

2016-06-29 Thread Amit Sela
+1 On Wed, Jun 29, 2016 at 12:04 AM Lukasz Cwik wrote: > +1 > > On Tue, Jun 28, 2016 at 12:15 PM, Kenneth Knowles > wrote: > > > +1 > > > > On Tue, Jun 28, 2016 at 12:06 AM, Jean-Baptiste Onofré > > wrote: > > > > > +1 > >

Re: Scala DSL

2016-06-25 Thread Amit Sela
I just looked at some Scio examples and saw Spark Scala code ;-) To me, this made some sense: Spark is written in Scala (let's call it the Scala SDK?) but it also provides a Java API. The new version has a unified API (Java-Scala interop). So I see Scio in a similar way: it's a Scala API because it's

Re: [DISCUSS] PTransform.named vs. named apply

2016-06-22 Thread Amit Sela
+1 On Thu, Jun 23, 2016 at 7:27 AM Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > On 06/23/2016 12:17 AM, Ben Chambers wrote: > > Based on a recent PR (https://github.com/apache/incubator-beam/pull/468) > I > > was reminded of the confusion around the use of > >

Re: [DISCUSS] Beam data plane serialization tech

2016-06-17 Thread Amit Sela
+1 on Aljoscha's comment; I'm not sure where the benefit is in having a "schematic" serialization. I know that Spark, and I think Flink as well, use Kryo for serialization (to be accurate, Spark uses Chill) and I found it

Re: [dev] Announcing 0.1.0-incubating release

2016-06-15 Thread Amit Sela
And thanks to you Davor for leading the release! On Wed, Jun 15, 2016, 21:03 Davor Bonaci wrote: > Hi everyone, > I’m happy to announce that we have completed our first release – version > 0.1.0-incubating is now available [1]. > > I'm thrilled about this -- it is an

Re: Talking About Beam

2016-06-15 Thread Amit Sela
Great writing, Jesse! From my experience in the last year, working on a stream processing (and generally data processing) platform at PayPal, Beam could also offer a great approach for large projects. Up until now (and in my case as well), the process was: 1. Research and paper analysis of

Re: [VOTE] Release version 0.1.0-incubating

2016-06-09 Thread Amit Sela
+1 (binding). I checked the cluster mode example for the Spark runner (README: https://github.com/apache/incubator-beam/tree/master/runners/spark) using the published jar, and it looks like there is an issue with running on a cluster (if the input is on HDFS and not the local FS). Since the goal of this

Re: 0.1.0-incubating release

2016-06-08 Thread Amit Sela
To Davor, JB and anyone else helping with the release: thanks! This looks great. On Wed, Jun 8, 2016 at 9:11 PM Amit Sela <amitsel...@gmail.com> wrote: > Regarding Dan's questions: > 1. I'm not sure - it is built with spark-*_2.10 but I honestly don't know > if this matters for th

Re: 0.1.0-incubating release

2016-06-08 Thread Amit Sela
Regarding Dan's questions: 1. I'm not sure; it is built with spark-*_2.10, but I honestly don't know if this matters for the runner itself. It could be nice to have in order to be more informative. In addition, AFAIK this will change to Scala 2.11 with Spark 2.0. 2. This is to allow running

Re: Apache Beam for Python

2016-06-03 Thread Amit Sela
Welcome Python people ;) I know a few people who've been waiting for this one! On Fri, Jun 3, 2016, 19:53 Davor Bonaci wrote: > Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara! > > On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré

Re: [VOTE] groupId/artifactId naming & layout

2016-06-03 Thread Amit Sela
+1 for Option2 On Fri, Jun 3, 2016 at 2:09 PM Jean-Baptiste Onofré wrote: > As said in my previous e-mail, just proposed PR #416. > > Let's start a vote for groupId and artifactId naming. > > [ ] Option1: use the current layout (multiple groupId, artifactId > relative to

Re: Serialization for org.apache.beam.sdk.util.WindowedValue$*

2016-06-02 Thread Amit Sela
r uses Coders like Thomas Groh described. And I agree that we should consider making PipelineOptions Serializable or providing a generic solution for runners. Hope this helps, Amit On Thu, Jun 2, 2016 at 10:35 PM Amit Sela <amitsel...@gmail.com> wrote: > Thomas is right, though in my case

Re: Serialization for org.apache.beam.sdk.util.WindowedValue$*

2016-06-02 Thread Amit Sela
gt; > > com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:547) > > at > > > > > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:523) > > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:76

Re: Serialization for org.apache.beam.sdk.util.WindowedValue$*

2016-06-01 Thread Amit Sela
Hi Thomas, Spark and the Spark runner use Kryo for serialization and it seems to work just fine. What is your exact problem? Stack trace/message? I've hit an issue with Guava's ImmutableList/Map etc. and used https://github.com/magro/kryo-serializers for that. For PipelineOptions you can

Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Amit Sela
+1 for Slack On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré wrote: > Hi all, > > What do you think about creating a #apache-beam IRC channel on freenode > ? Or if it's more convenient a channel on Slack ? > > Regards > JB > -- > Jean-Baptiste Onofré >

Re: A question about windowed values

2016-04-13 Thread Amit Sela
>> WindowedValue.valueInEmptyWindows(T) are generally an implementation > >> detail of a transform; for example, in the InProcessPipelineRunner, the > KV<K, > >> Iterable<WindowedValue>> elements output by a GroupByKeyOnly are in > >> empty windows - but by the t

Re: TextIO.Read.Bound vs Create

2016-04-13 Thread Amit Sela
ster/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472 > > On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela <amits...@apache.org> wrote: > > > Why input values from *TextIO.Read.Bound *belong to an empty window while > > v

A question about windowed values

2016-04-13 Thread Amit Sela
My instinct tells me that if a value does not belong to a specific window (in time), it's part of the global window. But if so, what's the role of the "empty window"? When should an element be a "value in an empty window"?

TextIO.Read.Bound vs Create

2016-04-12 Thread Amit Sela
Why do input values from *TextIO.Read.Bound* belong to an empty window, while values from *Create* belong to the global window? Thanks, Amit

Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-12 Thread Amit Sela
Anytime works for me. On Tue, Apr 12, 2016, 21:24 Jean-Baptiste Onofré wrote: > Hi James, > > 5/4 works for me ! > > Thanks, > Regards > JB > > On 04/12/2016 05:05 PM, James Malone wrote: > > Hey JB, > > > > Sorry for the late reply! That is a good point; apologies I missed >
