Re: Merge branch DSL_SQL to master

2017-09-07 Thread Ismaël Mejía
; > components >> > > and create additional tests as appropriate >> > > >> > > * Besides of integration tests in package >> > org.apache.beam.sdk.extensions.sql, >> > > there's another example in org.apache.beam.sdk.extensions.sql.ex

Re: Beam 2.2.0 release

2017-08-30 Thread Ismaël Mejía
The current master has accumulated a good amount of nice features since 2.1.0, so a new release is welcome. I have two JIRAs/PRs that I think are important to check/solve before the cut: BEAM-2516 (this is a regression in the performance of the Direct runner on Java). We had never really defined if a

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-14 Thread Ismaël Mejía
+1 (non-binding) - Validated signatures OK - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle JDK 8 with the docker development images (WIP), both OK - Run WordCount on local Flink and Spark runners OK Everything looks nice, only one minor thing (not blocking at all). The proto

Re: Proposal : An extension for sketch-based statistics

2017-08-14 Thread Ismaël Mejía
Kenneth’s idea of using sketches for state with the State API is really interesting, it really opens some interesting use cases, I haven’t really thought about it but I believe it is really an appealing use case for the sketches. Note that the origin of this work was in the line of statistics, in

Re: [ANNOUNCEMENT] New committers, August 2017 edition!

2017-08-11 Thread Ismaël Mejía
Congrats everyone, well deserved, excellent work guys ! On Fri, Aug 11, 2017 at 7:53 PM, Jesse Anderson wrote: > Welcome! > > On Fri, Aug 11, 2017, 10:48 AM Jason Kuster > wrote: > >> Congrats to all, many thanks for the great

Re: [CANCEL][VOTE] Release 2.1.0, release candidate #2

2017-07-24 Thread Ismaël Mejía
Not a blocker but maybe it is worth considering the fix for https://issues.apache.org/jira/browse/BEAM-2587 too. I also was bitten by this issue and I could only get it to work by doing a 'pip install --user grpcio-tools' (not sure if this is a proper solution but it works for me), however when I

Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-11 Thread Ismaël Mejía
ey basis. >> >> Interestingly, the "watch mutations" command would allow one to build a >> streaming memcache IO which shows all changes occurring underneath. >> >> memcached protocol: >> https://github.com/memcached/memcached/blob/master/doc/pr

Re: [DISCUSS] Bridge beam metrics to underlying runners to support metrics reporters?

2017-06-23 Thread Ismaël Mejía
Cody, not sure if I follow, but isn't Distribution on Beam similar to codahale/dropwizard's Histogram (without the quantiles)? Meters are also in the plan but not implemented yet, see the Metrics design doc: https://s.apache.org/beam-metrics-api If I understand what you want is to have some sort

Re: Beam Proposal: Pipeline Drain

2017-06-13 Thread Ismaël Mejía
Hello Reuven, I finally took the time to read the Drain proposal, thanks a lot for bringing this, it looks like a nice fit with the current APIs and it would be great if this could be implemented as much as possible in a Runner independent way. I am eager now to see the snapshot and update

Re: [DISCUSS] HadoopInputFormat based IOs

2017-06-01 Thread Ismaël Mejía
O test code, I'd > propose that we add comments in there directing people to the correct > native source. > > S > [1] writeThenRead style IO IT - > https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E > > On Tue, May 3

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-30 Thread Ismaël Mejía
The whole goal of this discussion is to define what we should do when someone wants to add a new IO that uses HIFIO. The consensus so far following the PR comments + this thread is that it should be discouraged and those contributions be included as documentation in the website, and that we

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-30 Thread Ismaël Mejía
adable" IOs (better name suggestions appreciated :) - > that could include a list of data stores that jdbc/jms/hifio support and > link to HIFIO's info on how to use them. (That might also be a good place > to document the performance tradeoffs of using HIFIO) > > S > >

Re: [New Proposal] Hive connector using native api

2017-05-24 Thread Ismaël Mejía
Hello, I created a new JIRA for this native implementation of the IO so feel free to PR the 'native' implementation using this ticket. https://issues.apache.org/jira/browse/BEAM-2357 We will discuss all the small details in the PR. The old JIRA (BEAM-1158) will still be there just to add the

[DISCUSS] HadoopInputFormat based IOs

2017-05-23 Thread Ismaël Mejía
better to add just the tests/docs of how to use them as proposed in the PR (option 2). Feel free to comment/vote or maybe add an eventual third option if you think there is one better option. Regards, Ismaël Mejía [1] https://issues.apache.org/jira/browse/BEAM-1158

Re: First stable release completed!

2017-05-17 Thread Ismaël Mejía
Amazing milestone, congrats everyone! On Wed, May 17, 2017 at 7:54 PM, Reuven Lax wrote: > Sweet! > > On Wed, May 17, 2017 at 4:28 AM, Davor Bonaci wrote: > >> The first stable release is now complete! >> >> Release artifacts are available through

Re: First stable release: version designation?

2017-05-04 Thread Ismaël Mejía
My vote, like Davor: Slight preference toward 2.0.0, but fine with 1.0.0 On Thu, May 4, 2017 at 9:32 PM, Thomas Weise wrote: > I'm in the relaxed 1.0.0 camp. > > -- > sent from mobile > On May 4, 2017 12:29 PM, "Mingmin Xu" wrote: > >> I slightly prefer 1.0.0

Re: Congratulations Davor!

2017-05-04 Thread Ismaël Mejía
Congratulations Davor! Your membership is really deserved, You really got the Apache spirit ! On Thu, May 4, 2017 at 5:02 PM, Thomas Groh wrote: > Congratulations! > > On Thu, May 4, 2017 at 7:56 AM, Thomas Weise wrote: > >> Congrats! >> >> >> On Thu,

Re: [PROPOSAL] HiveIO - updated link to document

2017-04-25 Thread Ismaël Mejía
Hello, I created the HiveIO JIRA and followed the initial discussions about the best approach for HiveIO so I want first to suggest you to read the previous thread(s) on the mailing list. https://www.mail-archive.com/dev@beam.incubator.apache.org/msg02313.html The main idea I concluded from

Re: [DISCUSSION] Encouraging more contributions

2017-04-24 Thread Ismaël Mejía
+1 Great idea Aviem, thanks for bringing this subject to the mailing list. I agree in particular with the freeing JIRA part, I think we shouldn’t keep assigned JIRAs that are things that we don’t expect to solve in the next weeks. (note the exception for this are the long features). I would add

Re: [DISCUSSION] PAssert success/failure count validation for all runners

2017-04-10 Thread Ismaël Mejía
I have the impression this conversation went into a different sub-discussion, ignoring the core subject, which is whether it makes sense to implement PAssert as we are doing it right now (1), or in a runner-agnostic way (2). Big +1 for (2). And I think also this is critical enough to be

Re: Update of Pei in Alibaba

2017-04-07 Thread Ismaël Mejía
> For the basic “at most once” job, JStorm runner can be reused on Storm. But > for “window”, “state” and “exactly once” job, unfortunately, JStorm runner > can’t be reused. Anyway, we will figure out if the propagation is possible > for Storm in the future. > > > >

Re: Update of Pei in Alibaba

2017-04-03 Thread Ismaël Mejía
Thanks Jingsong for answering, and the Streamscope ref, I am going to check the paper, the concept of non-global-checkpointing sounds super interesting. It is nice that you guys are also trying to promote the move to a unified model. Regards, Ismaël On Sun, Apr 2, 2017 at 3:40 PM, JingsongLee

Re: [PROPOSAL] ORC support

2017-04-01 Thread Ismaël Mejía
+1 From my previous work experience, ORC in certain cases performs better than Parquet and really deserves to be supported. On Sat, Apr 1, 2017 at 5:58 PM, Ted Yu wrote: > +1 > >> On Apr 1, 2017, at 8:31 AM, Tibor Kiss wrote: >> >> Hello, >> >>

Re: Update of Pei in Alibaba

2017-04-01 Thread Ismaël Mejía
Excellent news, Pei, it would be great to have a new runner. I am curious about how different the implementations of Storm are from each other, considering that there are already three 'versions': Storm, JStorm and Heron. I wonder if one runner could translate to an API that would cover all of them (of

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-20 Thread Ismaël Mejía
Thanks everyone, Feels great to be part of the team. Congratulations to the other new committers ! -Ismaël On Mon, Mar 20, 2017 at 2:50 PM, Tyler Akidau wrote: > Welcome! > > On Mon, Mar 20, 2017, 02:25 Jean-Baptiste Onofré wrote: > >> Welcome

Re: splitIntoBundles vs. generateInitialSplits

2017-03-20 Thread Ismaël Mejía
This is a forgotten one, Stas did you create a JIRA about this one? I think this change should be also tagged as First version release, because this is an API change and can break stuff if we do it later on. On Wed, Jan 11, 2017 at 4:30 PM, Jean-Baptiste Onofré wrote: > Hi

Re: Performance Testing Next Steps

2017-03-16 Thread Ismaël Mejía
e a hand if needed. On Thu, Mar 16, 2017 at 9:17 AM, Jason Kuster <jasonkus...@google.com.invalid> wrote: > Thanks Ismael for the comments! Replied inline. > > On Wed, Mar 15, 2017 at 8:18 AM, Ismaël Mejía <ieme...@gmail.com> wrote: > >> Excellent proposal, sorry to jump

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
r to supporting streaming with Spark 1 > runner, and having Structured Streaming advance in Spark 2, we could start > work on Spark 2 runner in a separate branch. > > However, I do feel that we should use the Dataset API, starting with batch > support first. WDYT ? > > On Wed,

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
not heavily > investing there. > > We could think of starting to migrate the Spark 1 runner to Spark 2 and > follow with Dataset API support feature-by-feature as it advances, but I > think most Spark installations today still run 1.X, or am I wrong ? > > On Wed, Mar 15, 2017

Re: Performance Testing Next Steps

2017-03-15 Thread Ismaël Mejía
Excellent proposal, sorry to jump into this discussion so late, this was in my to-read list for almost two weeks, and I finally got the time to read the document and I have two minor comments: I have the impression that the strict separation of Providers (the data-processing systems) and Resources

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
BIG +1 JB, If we can just jump the version number with minor changes staying as close as possible to the current implementation for spark 1 we can go faster and offer in principle the exact same support but for version 2. I know that the advanced streaming stuff based on the DataSet API won't be

Re: Docker image dependencies

2017-03-15 Thread Ismaël Mejía
Hi, Thanks for bringing this subject to the mailing list. +1 We definitely need a consensus on this, and I agree with your proposal and JB’s comments modulo certain clarifications: I think we shall go in this priority order if the version of the image we want is available: 1. Image provided by

Re: Style: how much testing for transform builder classes?

2017-03-15 Thread Ismaël Mejía
ose which validation to >> unit-test and which to skip as trivial, so documentation on this topic >> should be in the form of guidelines, high-quality example code (i.e. clean >> up the unit tests of IOs bundled with Beam SDK), and informal knowledge in >> the heads of readers of th

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ismaël Mejía
4 of which are binding: > > * Aljoscha Krettek > > * Davor Bonaci > > * Ismaël Mejía > > * Jean-Baptiste Onofré > > * Robert Bradshaw > > * Ted Yu > > * Tibor Kiss > > > > There are no disapproving votes. > > > > Thanks everyone! > > > > Ahmet > > >

Re: Style: how much testing for transform builder classes?

2017-03-14 Thread Ismaël Mejía
+0.5 I used to think that some of those tests were not worth it, for example testBuildRead and testBuildReadAlt. However, the reality is that these tests allowed me to find bugs both during the development of HBaseIO and just yesterday when I tried to test the write support for the emulator with

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-13 Thread Ismaël Mejía
+1 (non-binding) - verified signatures + checksums - ran mvn clean install -Prelease, all artifacts build and the tests run smoothly (modulo some local issues I had with the installation of tox for the python sdk, I created a PR to fix those in case other people have the same trouble). Some

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-12 Thread Ismaël Mejía
I found an issue too with the .md5 and sha1 files of the python release, they refer to a different default file (a forgotten part of the renaming): curl https://dist.apache.org/repos/dist/dev/beam/0.6.0/apache-beam-0.6.0-python.zip.md5 7d4170e381ce0e1aa8d11bee2e63d151 apache-beam-0.6.0.zip This
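The kind of mismatch described in this message (a checksum file that names a different artifact than the one it sits next to) can be caught with a small check. The following is only a minimal sketch in Python, assuming the checksum file holds a single `<hex digest> <file name>` line as in the snippet above; the helper names `parse_checksum_line` and `check_checksum_file` are hypothetical and are not part of any Beam release tooling.

```python
import hashlib


def parse_checksum_line(line):
    """Split a '<hex digest> <file name>' line into (digest, file name)."""
    digest, _, name = line.strip().partition(" ")
    # md5sum-style files may use two spaces or a '*' marker before the name.
    return digest.lower(), name.strip().lstrip("*")


def check_checksum_file(artifact_name, artifact_bytes, checksum_line, algorithm="md5"):
    """Return a list of problems found; an empty list means the file is consistent."""
    digest, recorded_name = parse_checksum_line(checksum_line)
    problems = []
    # The recorded file name should match the artifact being published.
    if recorded_name != artifact_name:
        problems.append(
            f"checksum file names {recorded_name!r}, expected {artifact_name!r}"
        )
    # The recorded digest should match the digest of the artifact contents.
    actual = hashlib.new(algorithm, artifact_bytes).hexdigest()
    if actual != digest:
        problems.append(f"{algorithm} mismatch: recorded {digest}, computed {actual}")
    return problems
```

Called with the artifact name `apache-beam-0.6.0-python.zip` and a checksum line that records `apache-beam-0.6.0.zip`, the function reports the name mismatch even when the digest itself is correct.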

Re: Interest in a (virtual) contributor meeting?

2017-02-23 Thread Ismaël Mejía
+1 to do it periodically about different subjects. It is a good idea to have a sort of mini agenda, in the sense that the two previous meetings had really different focus, the first one was about contributors meeting each other and discussion of ongoing work just after the project started on

Re: Metrics for Beam IOs.

2017-02-22 Thread Ismaël Mejía
Hello, Thanks everyone for giving your points of view. I was waiting to see how the conversation evolved to summarize it and continue on the open points. Points where mostly everybody agrees (please correct me if somebody still disagrees): - Default metrics should not affect performance, for

Re: Better developer instructions for using Maven?

2017-02-15 Thread Ismaël Mejía
This question got lost in the discussion, but there is a small improvement that we can do: > Just to check, are we doing parallel builds? We are on jenkins, not in travis, there is an ongoing PR to fix this. What we can improve is to check if we can run some of the test suites in parallel to

Metrics for Beam IOs.

2017-02-14 Thread Ismaël Mejía
Hello, The new metrics API allows us to integrate some basic metrics into the Beam IOs. I have been following some discussions about this on JIRAs/PRs, and I think it is important to discuss the subject here so we can have more awareness and obtain ideas from the community. First I want to

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Ismaël Mejía
Congratulations, well deserved guys ! On Fri, Jan 27, 2017 at 9:28 AM, Amit Sela wrote: > Welcome and congratulations to all! > > On Fri, Jan 27, 2017, 10:12 Ahmet Altay wrote: > > > Thank you all! And congratulations to other new committers. >

Re: Request for becoming a contributor

2017-01-24 Thread Ismaël Mejía
Similar to yesterday's discussion about opening access to the slack channel, I wonder if it makes sense to let people assign themselves as contributors and pick JIRAs without asking for this, Is this possible with Apache's JIRA? And do you think this is a good idea? On Tue, Jan 24, 2017 at 7:15

Re: Hosting data stores for IO Transform testing

2017-01-18 Thread Ismaël Mejía
t; pre-built packages for multi-node clusters of data stores. If there's a >>> good repository of them that we trust, that would definitely save us >>> time. >>> Can you point me at the mesos repository? >>> >>> S >>> >>> >>> >>>
