Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Eugene Kirpichov
...Final note: performance when executing queries "limit A, B" and "limit C, D" in sequence may be completely different from when executing them in parallel. In particular, if they are run in parallel, most likely a lot less caching will happen. Make sure your benchmarks account for this
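
A minimal sketch of how the sequential-versus-parallel comparison described in this note could be measured. It uses a local SQLite file purely as a stand-in, since no Aurora or MySQL endpoint is given in the thread; the table name `items` and the helpers `build_db`, `run_page`, `time_sequential`, `time_parallel` and `benchmark` are hypothetical names chosen for illustration. Absolute timings from SQLite say nothing about Aurora; only the shape of the harness is meant to carry over.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def build_db(path, num_rows):
    """Create a small demo table (hypothetical schema, not from the thread)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (id, payload) VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(num_rows)),
    )
    conn.commit()
    conn.close()


def run_page(path, offset, count):
    """Run one paged query on its own connection and return the row count."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT id, payload FROM items ORDER BY id LIMIT ? OFFSET ?",
            (count, offset),
        ).fetchall()
        return len(rows)
    finally:
        conn.close()


def time_sequential(path, pages):
    """Time the paged queries when they run one after another."""
    start = time.perf_counter()
    counts = [run_page(path, offset, count) for offset, count in pages]
    return time.perf_counter() - start, counts


def time_parallel(path, pages):
    """Time the same paged queries when they run concurrently in threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(pages)) as pool:
        counts = list(pool.map(lambda page: run_page(path, *page), pages))
    return time.perf_counter() - start, counts


def benchmark(num_rows=20000, page_size=10000):
    """Measure both modes against the same data and report them side by side."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        build_db(path, num_rows)
        pages = [(0, page_size), (page_size, page_size)]
        seq_time, seq_counts = time_sequential(path, pages)
        par_time, par_counts = time_parallel(path, pages)
    return {
        "sequential_s": seq_time,
        "parallel_s": par_time,
        "sequential_rows": seq_counts,
        "parallel_rows": par_counts,
    }


if __name__ == "__main__":
    print(benchmark())
```

Reporting both timings side by side, rather than only the sequential one, is what lets a benchmark show whether caching effects differ between the two modes.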

Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Eugene Kirpichov
Most likely the identical performance you observed for "limit" clause is because you are not sorting the rows. Without sorting, a "limit" query is meaningless: the database is technically allowed to return exactly the same result for "limit 0, 10" and "limit 10, 20", because both of these queries are
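
To illustrate the point about sorting: a small sketch, assuming an in-memory SQLite table as a stand-in for the real database. The table `events` and the helpers `build_demo_db` and `fetch_page` are hypothetical. Ordering by a unique key is what makes each page well defined, so that consecutive pages are guaranteed not to overlap.

```python
import sqlite3


def build_demo_db(num_rows=25):
    """Create an in-memory table with ids 0..num_rows-1 (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(num_rows)],
    )
    conn.commit()
    return conn


def fetch_page(conn, offset, count):
    """Return the ids of one page, ordered by the unique key `id`."""
    cur = conn.execute(
        "SELECT id FROM events ORDER BY id LIMIT ? OFFSET ?",
        (count, offset),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_demo_db()
first_page = fetch_page(conn, 0, 10)
second_page = fetch_page(conn, 10, 10)
# With ORDER BY on a unique key the two pages cannot share rows.
print(set(first_page).isdisjoint(second_page))
```

Dropping the `ORDER BY` from the query would leave the row order unspecified, which is the situation the message warns about.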

Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Eugene Kirpichov
Thanks Madhusudan. Please note that in your case, likely, the time was dominated by shipping the rows over the network, rather than executing the query. Please make sure to include benchmarks where the query itself is expensive to evaluate (e.g. "select count(*) from query" takes time comparable

Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Madhusudan Borkar
Hi, Appreciate your questions. One thing I believe: AWS Aurora, even though it is based on MySQL, is not MySQL. The reason being, AWS has developed this database service RDS from the ground up and has improved or completely changed its implementation. That being said, some of the things that one may have

Re: DataWorks Summit San Jose 2017

2017-06-13 Thread Davor Bonaci
A quick reminder that Beam talks are tomorrow. And, of course, stickers will make an appearance! If you'd like to chat about all-things-Beam, please stay in the room after any of these sessions, or stop me if you see me around. I hope to see many of you there! On Mon, Jun 5, 2017 at 4:22 PM,

Re: [PROPOSAL] for AWS Aurora relational database connector

2017-06-13 Thread Sourabh Bajaj
+1 for S3 being more of a FS. @Madhusudan, can you point to some documentation on how to do row-range queries in Aurora? From a quick scan, it follows the MySQL 5.6 syntax, so you will still need an ORDER BY for the IO to do exactly-once reads. So wanted to learn more about how the questions raised

Re: BeamSQL status and merge to master

2017-06-13 Thread Lukasz Cwik
Never mind, I merged it into #2 about usability. On Tue, Jun 13, 2017 at 8:50 AM, Lukasz Cwik wrote: > I added a section about maven module structure/packaging (#6). > > On Tue, Jun 13, 2017 at 8:30 AM, Tyler Akidau > wrote: > >> Thanks Mingmin.

Re: BeamSQL status and merge to master

2017-06-13 Thread Lukasz Cwik
I added a section about maven module structure/packaging (#6). On Tue, Jun 13, 2017 at 8:30 AM, Tyler Akidau wrote: > Thanks Mingmin. I've copied your list into a doc[1] to make it easier to > collaborate on comments and edits. > > [1]

Re: BeamSQL status and merge to master

2017-06-13 Thread Tyler Akidau
Thanks Mingmin. I've copied your list into a doc[1] to make it easier to collaborate on comments and edits. [1] https://s.apache.org/beam-dsl-sql-burndown -Tyler On Mon, Jun 12, 2017 at 10:09 PM Jean-Baptiste Onofré wrote: > Hi Mingmin > > Sorry, the meeting was in the

Report to the Board, June 2017 edition

2017-06-13 Thread Davor Bonaci
We are expected to submit a project report to the ASF Board of Directors ahead of its next meeting. The report is due on Wednesday, 6/14. If interested, please take a look at the draft [1], and comment or contribute content, as appropriate. I'll submit the report sometime on Wednesday. Thanks!

Re: Beam Proposal: Pipeline Drain

2017-06-13 Thread Reuven Lax
Thanks Ismaël, I think the SDK portions of the Drain proposal are completely runner independent. Some parts of Drain (e.g. advancing watermarks) will have to be done by the runners of course. I'm working on the snapshot and update proposal. I hope to have time to send it out soon! Reuven On

Re: Beam Proposal: Pipeline Drain

2017-06-13 Thread Ismaël Mejía
Hello Reuven, I finally took the time to read the Drain proposal. Thanks a lot for bringing this up; it looks like a nice fit with the current APIs, and it would be great if this could be implemented as much as possible in a runner-independent way. I am eager now to see the snapshot and update

[DISCUSS] Bundle in Flink Runner

2017-06-13 Thread JingsongLee
Hi everyone, I would like to start a discussion about implementing real bundles in the Flink Runner. https://docs.google.com/document/d/1UzELM4nFu8SIeu-QJkbs0sv7Uzd1Ux4aXXM3cw4s7po/edit?usp=sharing Feel free to comment/edit it. Best, JingsongLee