Re: [DISCUSS] Multiple-triggering SQL Join with retractions support

2019-08-21 Thread Mingmin Xu
am not sure if I understand the question: >>> >>> 1. multiple GBK with retraction is solved by [1]. >>> 2. In terms of SQL and its view, the output is defined by the last GBK. >>> >>> [1]: >>> https://docs.google.com/document/d/14WRfxwk_iLUHGPt

Re: [DISCUSS] Multiple-triggering SQL Join with retractions support

2019-08-19 Thread Mingmin Xu
+1 to support EMIT on the Beam side first if we cannot include it in Calcite in a short time (see #1, #2). I'm open to using any format, the one above or something as below. The tricky question is, what's the expected behavior for a complex query with more than one GBK operator? EMIT |

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-12 Thread Mingmin Xu
+1 On Mon, Aug 12, 2019 at 8:53 PM Ryan McDowell wrote: > +1 > > On Mon, Aug 12, 2019 at 8:30 PM Reza Rokni wrote: > >> +1 >> >> On Tue, 13 Aug 2019 at 09:28, Ahmet Altay wrote: >> >>> +1 >>> >>> On Mon, Aug 12, 2019 at 6:27 PM Kenneth Knowles wrote: >>> +1 On Mon, Aug 12,

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

2019-08-07 Thread Mingmin Xu
practice, see their blog post > <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite> > ). > > For the longer term, it would be interesting to see how we can add > BigQuery syntax (plus its data types and sql functions) to Calcit

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

2019-08-06 Thread Mingmin Xu
Just took a look at https://issues.apache.org/jira/browse/CALCITE-2280 which introduced the Babel parser in Calcite to support varied dialects; this may be an easier way to support BigQuery syntax. @Rui do you notice any big difference between the Calcite engine and ZetaSQL, like parsing, optimization? If

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

2019-08-04 Thread Mingmin Xu
Interesting feature, thanks Rui for bringing up the new option. Please keep me in the loop, I’ll take a look when I'm back home tomorrow. It looks like a chance to support other dialects; we see lots of concerns about translating from dialects like SparkSQL. Mingmin Sent from my iPhone > On Aug 4, 2019, at 2:43 PM, Rui

Re: [PROPOSAL] Revised streaming extensions for Beam SQL

2019-07-24 Thread Mingmin Xu
+1 to remove those magic words in Calcite streaming SQL, just because they're not SQL standard. The idea to replace HOP/TUMBLE with table-view-functions makes it concise; my only question is, is it (or will it be) part of the SQL standard? --I'm a big fan of aligning with standards :lol Ps, although the

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-06 Thread Mingmin Xu
Good point to reject DISTINCT operations currently, as they're not handled now. There could be more similar cases that need to be revised and documented well. Regarding how to support DISTINCT, I was confused by stateful CombineFn at first. To make it simple, we can extend step by step, like reject

Re: kafka 0.9 support

2019-04-02 Thread Mingmin Xu
We're still using Kafka 0.10 a lot, similar to 0.9 IMO. Supporting multiple versions in KafkaIO is quite complex now, and it confuses users about which is supported and which is not. I would prefer to support Kafka 2.0+ only in the latest version. For old versions, there're some options: 1). document

Re: Migrating Beam SQL to Calcite's code generation

2018-11-15 Thread Mingmin Xu
e main use case for BEAM-5204. Is it your use case? > > Kenn > > On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu wrote: > >> Raise this thread. >> Seems there're more changes in the backend on how a FUNCTION is executed >> in the backend, as noticed in #6996 >>

Re: Migrating Beam SQL to Calcite's code generation

2018-11-15 Thread Mingmin Xu
L. Both work with a more extensive set of >> arguments after this change. There are now 4 outstanding calcite PRs that >> get all the tests passing. >> >> Unfortunately there is no easy way to mix our current implementation and >> using Calcite's code generator. >>

Re: Migrating Beam SQL to Calcite's code generation

2018-09-17 Thread Mingmin Xu
Awesome work, we should call Calcite operator functions if available. I haven't had time to read the PR yet; for those impacted, we would keep the existing implementation. One example is, I noticed recently that FLOOR/CEIL only supports months/years, which is quite a surprise to me. Mingmin On Mon, Sep 17,

Re: [SQL] Create External Schema

2018-08-13 Thread Mingmin Xu
Awesome proposal to integrate with existing external schemas; added some comments in the doc. On Mon, Aug 13, 2018 at 4:13 PM, Reuven Lax wrote: > Is it possible to extend Beam's SchemaRegistry to do this? > > On Mon, Aug 13, 2018 at 4:06 PM Anton Kedin wrote: > >> Hi, >> >> I am planning to work on

Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-02 Thread Mingmin Xu
+1 Verified with SQL component. On Thu, Aug 2, 2018 at 10:05 AM, Thomas Weise wrote: > It does include *some* of the portable Flink runner (you will be able to > run wordcount as documented on https://beam.apache.org/ > contribute/portability/#status). > > I would recommend to continue using

POM beam-sdks-java-extensions-parent is not available in repository.apache.org

2018-07-06 Thread Mingmin Xu
Hello, It seems some versions are missing from the repository; can someone help deploy them? [Releases]: 2.5.0 [Snapshots]: 2.6.0-SNAPSHOT Thanks! Mingmin

Re: Building and visualizing the Beam SQL graph

2018-06-14 Thread Mingmin Xu
PTransform> buildPTransform(); >>> >>> default PCollection toPCollection(Pipeline pipeline) { >>> return buildPInput(pipeline).apply(getStageName(), >>> buildPTransform()); >>> } >>> >>> Andrew >>> >>> On Mon, Ju

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Mingmin Xu
EXPLAIN shows the execution plan from the SQL perspective only. After converting to a Beam composite PTransform, there're more steps underneath, and each Runner reorganizes the Beam PTransforms again, which makes the final pipeline hard to read. In the SQL module itself, I don't see any difference between `toPTransform`

Re: Merge options in Github UI are confusing

2018-04-17 Thread Mingmin Xu
Not strongly against `*Create a merge commit*`, but I use `squash and merge` by default. I understand the potential impact mentioned by Andrew, it's still a better option IMO: 1. if a PR contains several parts, it can be documented in commit message instead of several commits; --If it's a big

Re: SQL in Python SDK

2018-04-13 Thread Mingmin Xu
With the current implementation we're not able to extend it for Python as Calcite has a Java API only. Another separate Python-based SQL should be the solution. Based on our practice, we write lots of UDF/UDAF and customized TABLE to fit our own data source/storage. For the former it could be possible

daily build is not consistent

2018-03-27 Thread Mingmin Xu
Hi all, I found that the daily snapshot build can be partially successful, which causes failures w/ SNAPSHOT dependencies. Is it possible to have a consistent 'deploy' action? Here's one example: https://github.com/apache/beam/pull/4918 changes both `beam-runners-flink_2.11` and `beam-sdks-java-core`.

Re: [SQL] Windowing and triggering changes proposal

2018-01-16 Thread Mingmin Xu
Thanks @Anton for the proposal. Window (w/ trigger) support in SQL is limited now, you're very welcome to join the improvement. There's a balance between injected DSL mode and CLI mode when we were implementing BeamSQL overall, not only windowing. Many default behaviors are introduced to make it

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-01-10 Thread Mingmin Xu
@Sushil, I have several jobs running on KafkaIO+FlinkRunner, hope my experience can help you a bit. In short, `ENABLE_AUTO_COMMIT_CONFIG` doesn't meet your requirement, you need to leverage exactly-once checkpoint/savepoint in Flink. The reason is, with `ENABLE_AUTO_COMMIT_CONFIG` KafkaIO

Re: Is upgrading to Kafka Client 0.10.2.0+ in the roadmap?

2017-10-30 Thread Mingmin Xu
/io/kafka/pom.xml#L33 > > I have just verified that adding Kafka-Client 0.11 in the application > pom.xml works fine for me. I can now avoid the JAAS configuration file by > using the "java.security.auth.login.config" property. > > Best, > Shen > > On Mon, Oct

Re: Is upgrading to Kafka Client 0.10.2.0+ in the roadmap?

2017-10-30 Thread Mingmin Xu
Hi Shen, Can you share which Beam version you are using? I just checked the master code, the default version for Kafka is `0.11.0.1`. I cannot recall the usage for old versions, my application (2.2.0-SNAPSHOT) works with a customized kafka version based on 0.10.00-SASL. What you need to do is 1). exclude

Re: Support for window analytic functions in SQL DSL

2017-10-05 Thread Mingmin Xu
@Kobi, Currently we don't support window analytic functions, feel free to create a new-feature JIRA ticket. On Thu, Oct 5, 2017 at 12:07 PM, Tyler Akidau <taki...@google.com> wrote: > I'm not aware of analytic window support. +Mingmin Xu <mingm...@gmail.com> > or +James <

Re: Beam 2.2.0 release

2017-09-11 Thread Mingmin Xu
er 18 as a target date for cutting the first RC. That should > >> hopefully give plenty of time to get SQL and the remaining PRs merged > into > >> master. > >> > >> Reuven > >> > >> On Thu, Aug 31, 2017 at 3:04 PM, Mingmin Xu <mingm...@gmail.com> wrote

Re: Merge branch DSL_SQL to master

2017-09-11 Thread Mingmin Xu
xperience! > > > > >> > > > > >> Regards, > > > > >> Tarush > > > > >> > > > > >> On Thu, 7 Sep 2017 at 1:05 PM, Jean-Baptiste Onofré < > > j...@nanthrax.net> > > > > >> wrote: >

Re: Beam 2.2.0 release

2017-09-07 Thread Mingmin Xu
18 as a target date for cutting the first RC. That should >> hopefully give plenty of time to get SQL and the remaining PRs merged into >> master. >> >> Reuven >> >> On Thu, Aug 31, 2017 at 3:04 PM, Mingmin Xu <mingm...@gmail.com> wrote: >>

Re: Beam 2.2.0 release

2017-09-06 Thread Mingmin Xu
> On Thu, Aug 31, 2017 at 3:04 PM, Mingmin Xu <mingm...@gmail.com> wrote: > > > Add https://issues.apache.org/jira/browse/BEAM-2833 which is a blocker > to > > merge DSL_SQL. There may be something wrong in the back-end(maybe > > RunnerApi) to handle parametered CustomCoder

Re: Beam 2.2.0 release

2017-08-31 Thread Mingmin Xu
>>>>>>> critical to be a blocker. I executed WordCount with the > >> > kinglear.txt > >> > >>>>>>> (170KB) file in version 2.1.0 vs the current 2.2.0-SNAPSHOT > >and I > >> > >>>>>>> found that the execution

Re: Beam 2.2.0 release

2017-08-30 Thread Mingmin Xu
Glad to see that 2.2.0 is coming. Can we include the SQL feature in the next release? We're in the final stage and expect to merge back to master this week. On Wed, Aug 30, 2017 at 11:27 AM, Reuven Lax wrote: > Now that Beam 2.1.0 has finally completed, I think we should cut

Re: [PROPOSAL] External Join with KV Stores

2017-08-30 Thread Mingmin Xu
> > To my knowledge, nothing requires an SDK/runner to hold the entire side > input in memory. Lists, maps, iterables, ... can all be broken up into > smaller segments which can be loaded, cached and discarded separately. > > On Thu, Aug 24, 2017 at 5:10 PM, Mingmin Xu <ming

Re: [PROPOSAL] External Join with KV Stores

2017-08-24 Thread Mingmin Xu
Want to bring up this thread as we're looking for a similar feature in SQL. --Please point me to it if something is there, I didn't find any JIRA task. Now the streaming+batch/batch+batch join is implemented with sideInput. It's not a one-size-fits-all rule as Jingsong mentioned, the batch data may be too large,

Re: [DISCUSS] Capability Matrix revamp

2017-08-23 Thread Mingmin Xu
I would like to have API compatibility testing. AFAIK there's still a gap to achieve our goal (one job for any runner), which means developers should be aware of the limitations when writing the job. For example PCollectionView is not well supported in FlinkRunner(not quite sure the current status as my

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Mingmin Xu
Congratulations to Ahmet and Aviem! On Fri, Aug 11, 2017 at 11:30 AM, Thomas Groh wrote: > Congratulations to both of you! Looking forwards to both of your continued > contributions. > > On Fri, Aug 11, 2017 at 10:40 AM, Davor Bonaci wrote: > > >

Re: DSL_SQL branch API review

2017-08-03 Thread Mingmin Xu
Thank you @Tyler for gathering the APIs introduced in SQL DSL; added some comments in the doc. On Thu, Aug 3, 2017 at 4:21 PM, Tyler Akidau wrote: > Hello Beam dev listers! > > TL;DR - DSL_SQL API review happening at > https://s.apache.org/beam-sql-dsl-api-review > > As

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Mingmin Xu
Second that 'it's the responsibility of the transform'. For the case when a PTransform doesn't have enough information (the PTransform developer should have the knowledge), I would prefer a strict way so users won't forget to call withSomethingCoder(), like - a Coder is required to construct the PTransform; - or

Re: [NEED HELP] how to revert a PR in branch DSL_SQL

2017-07-20 Thread Mingmin Xu
Thanks @Kenn, awesome explanation! Following option (1) now. On Thu, Jul 20, 2017 at 10:14 AM, Kenneth Knowles <k...@google.com.invalid> wrote: > On Wed, Jul 19, 2017 at 9:35 PM, Mingmin Xu <mingm...@gmail.com> wrote: > > > Merge with conflict is not a good choice to me

[NEED HELP] how to revert a PR in branch DSL_SQL

2017-07-19 Thread Mingmin Xu
Hi there, It seems branch DSL_SQL is broken after #3553, as I cannot create a PR to master branch with error message '*Can’t automatically merge.*'. I googled and found two solutions: 1. submit a revert PR with Git

Re: [VOTE] Release 2.1.0, release candidate #2

2017-07-18 Thread Mingmin Xu
s all on a feature branch and the release notes when it goes to > master will include "Add SQL DSL" I did not associate the little bits with > a release. > > On Tue, Jul 18, 2017 at 2:51 PM, Mingmin Xu <mingm...@gmail.com> wrote: > > > The tasks of SQL should not be labe

Re: [VOTE] Release 2.1.0, release candidate #2

2017-07-18 Thread Mingmin Xu
The tasks of SQL should not be labeled as 2.1.0. I've updated some to 2.2.0, but failed to change the 'closed' ones. Can anyone with the permission update these tasks

Re: BeamSQL status and merge to master

2017-07-05 Thread Mingmin Xu
in the middle of the night for me and I wasn't > >> able > >> > to > >> > make it. > >> > > >> > The timing and checklist look good to me. > >> > > >> > We plan to do a Beam release end of June, so, merging in July me

BeamSQL status and merge to master

2017-06-12 Thread Mingmin Xu
Hi all, Thanks for joining the meeting. As discussed, we're planning to merge the DSL_SQL branch back to master, targeted for the middle of July. A tag 'dsl_sql_merge'[1] is created to track all todo tasks. *What's added in Beam SQL?* BeamSQL provides the capability to execute SQL queries with Beam Java

Re: First stable release completed!

2017-05-17 Thread Mingmin Xu
Congratulations to everyone! On Wed, May 17, 2017 at 8:36 AM, Dan Halperin wrote: > Great job, folks. What an amazing amount of work, and I'd like to > especially thank the community for participating in hackathons and > extensive release validation over the last

Re: [VOTE] First stable release: release candidate #4

2017-05-13 Thread Mingmin Xu
+1 Tested beam-examples with FlinkRunner, and several cases of KafkaIO/JdbcIO. Thanks! Mingmin On Sat, May 13, 2017 at 7:38 PM, Ahmet Altay wrote: > +1 > > - Tested Python wordcount with DirectRunner & DataflowRunner on > Windows/Mac/Linux, and python mobile gaming

[PROPOSAL] design of DSL SQL interface

2017-05-12 Thread Mingmin Xu
Hi all, As you may know, we're working on BeamSQL to execute SQL queries as a Beam pipeline. This is a valuable feature, not only shipped as a packaged CLI, but also as part of the SDK to assemble a pipeline. I prepared a document[1] to list the high level APIs, to show how SQL queries can be

Re: Pull request - power function

2017-05-12 Thread Mingmin Xu
Thanks @Tarush, will also take a look. On Fri, May 12, 2017 at 7:19 AM, Jean-Baptiste Onofré wrote: > Thanks, > > we gonna take a look. > > Regards > JB > > > On 05/12/2017 04:12 PM, tarush grover wrote: > >> Hi Team, >> >> I have opened a pull request Beam-2171 power

Re: Congratulations Davor!

2017-05-04 Thread Mingmin Xu
Congratulations @Davor! > On May 4, 2017, at 7:08 AM, Amit Sela wrote: > > Congratulations Davor! > >> On Thu, May 4, 2017, 10:02 JingsongLee wrote: >> >> Congratulations! >> -- >>

Re: [DISCUSSION] Encouraging more contributions

2017-04-24 Thread Mingmin Xu
Many design documents are scattered across the mailing list and JIRA comments; it would be a big help to put them in a centralized list. Also I would expect more wiki/blogs to provide in-depth analysis, like the translation from pipeline to runner specified topology, window/trigger implementation. Without these

Re: [DISCUSSION] Encouraging more contributions

2017-04-22 Thread Mingmin Xu
Good point, could also disable the auto assignment when creating JIRA ticket. Now it goes to component leader directly. Sent from my iPhone > On Apr 22, 2017, at 7:34 AM, Ted Yu wrote: > > +1 > >> On Sat, Apr 22, 2017 at 7:31 AM, Aviem Zur wrote: >>

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-14 Thread Mingmin Xu
stand what you're saying. > > -Tyler > > On Wed, Apr 12, 2017 at 9:38 PM Mingmin Xu <mingm...@gmail.com> wrote: > > > Expose streaming snapshot via STATE is attractive in Beam model, but > doubt > > it's the right way in SQL. IMO,there's 'INSERT INTO' to persistent &

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-12 Thread Mingmin Xu
> > > > > -Tyler > > > > > > > > > > > > On Sun, Apr 9, 2017 at 12:39 PM Mingmin Xu <mingm...@gmail.com> > wrote: > > > > > > > > > Thanks @JB, will come out the initial PR soon. > > > > >

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-11 Thread Mingmin Xu
valent is for SQL? Or are > you > > > asking an operational question about where the state for a given SQL > > > pipeline will live? > > > > > > -Tyler > > > > > > > > > On Sun, Apr 9, 2017 at 12:39 PM Mingmin Xu <ming

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-09 Thread Mingmin Xu
> > On 04/09/2017 08:02 PM, Mingmin Xu wrote: > >> State is not touched yet, welcome to add it. >> >> On Sun, Apr 9, 2017 at 2:40 AM, 陈竞 <cj.mag...@gmail.com> wrote: >> >> how will this sql support state both in streaming and batch mode >>> >>

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-09 Thread Mingmin Xu
State is not touched yet, welcome to add it. On Sun, Apr 9, 2017 at 2:40 AM, 陈竞 <cj.mag...@gmail.com> wrote: > how will this sql support state both in streaming and batch mode > > 2017-04-07 4:54 GMT+08:00 Mingmin Xu <mingm...@gmail.com>: > > > @Tyler, there's

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-06 Thread Mingmin Xu
gt; > > Mingmin and I prepared a new branch to have the SQL DSL in dsls/sql > > > location. > > > > > > Any help is welcome ! > > > > > > Thanks, > > > Regards > > > JB > > > > > > > > > On 04/06/2017 06:36 PM,

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Mingmin Xu
ink runner support needed, nor is the > State API involved. > > Dan > > On Tue, Mar 21, 2017 at 9:01 AM, Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > >> Would not it be Flink runner specific ? >> >> Maybe the State API could do the same in a run

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-17 Thread Mingmin Xu
Congratulations to all! On Fri, Mar 17, 2017 at 2:29 PM, Jason Kuster < jasonkus...@google.com.invalid> wrote: > Congratulations to the new committers! > > On Fri, Mar 17, 2017 at 2:16 PM, Kenneth Knowles > wrote: > > > Congrats all! > > > > On Fri, Mar 17, 2017 at 2:13

Re: [BEAM-301] Add a Beam SQL DSL

2017-02-28 Thread Mingmin Xu
actual both: > > > > > > 1. an interactive SQL prompt where we can express pipeline directly > > > using SQL. > > > 2. a SQL DSL to describe a pipeline in SQL and create the corresponding > > > Java code under the hood. > > > > > > I pro

[BEAM-301] Add a Beam SQL DSL

2017-02-27 Thread Mingmin Xu
a CLI interactive way, not SQL DSL. Doc link: https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing -- Mingmin Xu