[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality
[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126168#comment-17126168 ]

John Mora commented on BEAM-9198:
---------------------------------

Hi [~kenn],

I am still working on this issue. I am on stage 2, "SQL core to implement physical relational operator", of my GSoC proposal. I have been reporting my progress to my mentor, [~amaliujia]. Additionally, a few days ago I sent a PR with an experiment in order to receive feedback.

Regards,
John

> BeamSQL aggregation analytics functionality
> -------------------------------------------
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Rui Wang
> Assignee: John Mora
> Priority: P2
> Labels: gsoc, gsoc2020, mentor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
>
> Project Information
> -------------------
> BeamSQL has a long list of aggregation/aggregation analytics
> functionalities to support.
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytics functions, a good starting point is to
> support rank() first.
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed
> manner.
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great
> explanation doc:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check:
> https://beam.apache.org/documentation/programming-guide/#overview

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
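As an illustration of what rank() is expected to compute under the OVER syntax quoted in the issue, here is a minimal single-machine sketch in plain Python. It is not Beam code and says nothing about how BeamSQL implements the operator. The function name `rank_over`, its parameters, and the sample rows are all hypothetical. It assumes standard SQL RANK semantics: rows are grouped by the PARTITION BY key, sorted by the ORDER BY key, and tied rows share a rank, with a gap after the tie.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def rank_over(
    rows: Iterable[Row],
    partition_by: Callable[[Row], Any],
    order_by: Callable[[Row], Any],
    descending: bool = False,
) -> List[Row]:
    """Hypothetical helper: add a 'rank' column with SQL RANK semantics."""
    # PARTITION BY: group rows by the partition key.
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_by(row)].append(row)

    result: List[Row] = []
    for part in partitions.values():
        # ORDER BY: sort each partition independently.
        ordered = sorted(part, key=order_by, reverse=descending)
        rank = 0
        previous_key = None
        for position, row in enumerate(ordered, start=1):
            key = order_by(row)
            # A new sort key starts a new rank equal to the row position,
            # so ties share a rank and the following rank is skipped.
            if position == 1 or key != previous_key:
                rank = position
                previous_key = key
            result.append({**row, "rank": rank})
    return result


# Made-up sample data, equivalent to:
#   RANK() OVER (PARTITION BY dept ORDER BY score DESC)
rows = [
    {"dept": "a", "name": "x", "score": 10},
    {"dept": "a", "name": "y", "score": 10},
    {"dept": "a", "name": "z", "score": 7},
    {"dept": "b", "name": "w", "score": 3},
]
ranked = rank_over(rows, lambda r: r["dept"], lambda r: r["score"], descending=True)
# x and y tie at rank 1, z gets rank 3 (rank 2 is skipped),
# and w is rank 1 in its own partition.
for r in ranked:
    print(r["dept"], r["name"], r["rank"])
```

A distributed implementation (item 3 in the issue) would still need to bring all rows of one partition together before ordering them. This sketch only covers the per-partition logic.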
[jira] [Created] (BEAM-9864) Support ANY_VALUE in OVER/window clauses.
John Mora created BEAM-9864:
-------------------------------

Summary: Support ANY_VALUE in OVER/window clauses.
Key: BEAM-9864
URL: https://issues.apache.org/jira/browse/BEAM-9864
Project: Beam
Issue Type: New Feature
Components: dsl-sql, dsl-sql-zetasql
Reporter: John Mora

Add support for the ANY_VALUE aggregation function in OVER/window clauses.

Spec: [https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value]
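For illustration, here is a rough single-machine sketch in plain Python of what ANY_VALUE in an OVER clause could return. It is not Beam code; the helper name `any_value_over`, the column names, and the sample rows are hypothetical. It assumes the simplest case, `ANY_VALUE(col) OVER (PARTITION BY key)` with no ORDER BY or frame clause, so the window for each row is its whole partition. Any value from that partition would be an acceptable result; this sketch simply picks the first one it sees, which is an arbitrary choice and not required behaviour.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def any_value_over(rows: Iterable[Row], partition_key: str, value_column: str) -> List[Row]:
    """Hypothetical helper: add an 'any_value' column per partition."""
    materialized = list(rows)
    chosen: Dict[Any, Any] = {}
    for row in materialized:
        # Keep the first value seen for each partition (arbitrary choice).
        chosen.setdefault(row[partition_key], row[value_column])
    # Every row receives the value chosen for its partition.
    return [{**row, "any_value": chosen[row[partition_key]]} for row in materialized]


# Made-up sample data.
fruit = [
    {"kind": "citrus", "name": "lemon"},
    {"kind": "citrus", "name": "orange"},
    {"kind": "berry", "name": "cranberry"},
]
with_any = any_value_over(fruit, "kind", "name")
for r in with_any:
    print(r["kind"], r["name"], r["any_value"])
```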
[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functions
[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038095#comment-17038095 ]

John Mora commented on BEAM-9198:
---------------------------------

Hi,

I am John Mora, a student at UTPL, and I am interested in participating in the GSoC program. Currently, I am a committer/PMC member of the Apache Gora project, and I have some experience with distributed storage for data analytics (i.e., Apache Kudu), Java programming, and SQL, so this issue caught my attention.

I was wondering if you could give more information. I noticed that the SQL extensions of Beam are only implemented for the Java SDK, so this project only involves working in that SDK, right? According to the documentation, Beam supports two SQL dialects (Calcite and ZetaSQL); will these new aggregation functions be implemented in both dialects? Finally, are there other implementations of aggregation functions (or similar) in other SDKs that I could check out? I would really appreciate it if you could share some resources or examples that I could analyze.

Best regards,
John.

> BeamSQL aggregation analytics functions
> ---------------------------------------
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
> Issue Type: Task
> Components: dsl-sql
> Reporter: Rui Wang
> Priority: Major
> Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation/aggregation analytics
> functionalities to support.
> To begin with, you will need to support this syntax:
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed
> manner.
> 4. Build benchmarks to measure performance of your implementation.