[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-06-04 Thread John Mora (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126168#comment-17126168
 ] 

John Mora commented on BEAM-9198:
-

Hi [~kenn]

I am still working on this issue. I am on stage 2, "SQL core to implement 
physical relational operator", of my GSoC proposal. I have been reporting my 
progress to my mentor [~amaliujia]. Additionally, a few days ago I sent a PR 
with an experiment in order to receive feedback.

Regards,
John

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: P2
>  Labels: gsoc, gsoc2020, mentor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
> Project Information
> -
> BeamSQL has a long list of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytic functions, a good starting point is to 
> support rank() first.
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview
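
For readers unfamiliar with the OVER syntax quoted above, here is a minimal sketch of what rank() with PARTITION BY and ORDER BY computes. It uses Python's bundled sqlite3 module rather than BeamSQL, and assumes the underlying SQLite is version 3.25 or newer (the first release with window functions). The table name, column names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (player, team, score).
rows = [
    ("ana", "red", 30),
    ("bob", "red", 20),
    ("cy", "red", 30),
    ("dee", "blue", 10),
    ("eli", "blue", 15),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, team TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# RANK() restarts for every team and leaves a gap after ties.
query = """
SELECT player, team, score,
       RANK() OVER (PARTITION BY team ORDER BY score DESC) AS score_rank
FROM scores
ORDER BY team, score_rank, player
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

In the "red" team, the two rows with score 30 share rank 1 and the next row gets rank 3, which is the gap behaviour that distinguishes RANK from DENSE_RANK.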



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9864) Support ANY_VALUE in OVER/window clauses.

2020-04-30 Thread John Mora (Jira)
John Mora created BEAM-9864:
---

 Summary: Support ANY_VALUE in OVER/window clauses.
 Key: BEAM-9864
 URL: https://issues.apache.org/jira/browse/BEAM-9864
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql, dsl-sql-zetasql
Reporter: John Mora


Add support for the ANY_VALUE aggregation function in OVER/window clauses.

Spec: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value]
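
As a rough illustration of the requested behaviour, the sketch below pairs every row with one value taken from its partition, which is what ANY_VALUE over a window with only PARTITION BY would be expected to return. It is plain Python, not Beam code; the helper name `any_value_over` and the sample rows are hypothetical, and picking the first value seen is just one valid choice since any row in the partition may be selected.

```python
def any_value_over(rows, partition_by, value_of):
    """Pair each row with one value chosen from its partition.

    Any row's value is an acceptable choice; this sketch simply keeps
    the first one seen per partition.
    """
    chosen = {}
    for row in rows:
        # setdefault stores the value only for the first row of a partition.
        chosen.setdefault(partition_by(row), value_of(row))
    return [(row, chosen[partition_by(row)]) for row in rows]


# Hypothetical sample data: (team, player).
rows = [("red", "ana"), ("red", "bob"), ("blue", "dee")]
result = any_value_over(rows, partition_by=lambda r: r[0], value_of=lambda r: r[1])
for row, value in result:
    print(row, value)
```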





[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functions

2020-02-16 Thread John Mora (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038095#comment-17038095
 ] 

John Mora commented on BEAM-9198:
-

Hi.

I am John Mora, a student at UTPL, and I am interested in participating in the 
GSoC program. I am currently a committer and PMC member of the Apache Gora 
project, and I have some experience with distributed storage for data 
analytics (e.g. Apache Kudu), Java programming and SQL, so this issue caught 
my attention. I was wondering if you could give more information.

I noticed that the SQL extensions of Beam are only implemented for the Java 
SDK, so this project only involves working in that SDK, right? According to 
the documentation, Beam supports two SQL dialects (Calcite and ZetaSQL). Will 
these new aggregation functions be implemented in both dialects?

Finally, are there other implementations of aggregation functions (or 
similar) that I could check out in other SDKs? I would really appreciate it if 
you could share some resources or examples that I could analyze.


Best regards, 
John.

> BeamSQL aggregation analytics functions 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Build benchmarks to measure performance of your implementation.
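
To make step 3 of the quoted description more concrete, here is a minimal single-process sketch of one common way to evaluate a ranking function per partition: group rows by the PARTITION BY key (standing in for a shuffle such as Beam's GroupByKey), then sort and rank each group independently. This is an assumption about the general approach, not a description of how BeamSQL implements it; the function name and sample data are hypothetical.

```python
from collections import defaultdict

_UNSET = object()  # sentinel meaning "no previous ordering value yet"


def rank_within_partitions(rows, partition_by, order_by, descending=False):
    """Return (row, rank) pairs, with rank computed per partition."""
    # Step 1: group rows by partition key (a stand-in for a shuffle).
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_by(row)].append(row)

    # Step 2: each partition is sorted and ranked on its own, so in a
    # distributed setting different workers could handle different keys.
    result = []
    for part in partitions.values():
        part.sort(key=order_by, reverse=descending)
        rank, prev = 0, _UNSET
        for position, row in enumerate(part, start=1):
            value = order_by(row)
            if prev is _UNSET or value != prev:
                rank = position  # ties keep the earlier rank, leaving gaps
                prev = value
            result.append((row, rank))
    return result


# Hypothetical sample data: (team, score).
rows = [("red", 30), ("red", 20), ("red", 30), ("blue", 10)]
ranked = rank_within_partitions(
    rows, partition_by=lambda r: r[0], order_by=lambda r: r[1], descending=True
)
for row, rank in ranked:
    print(row, rank)
```

One limitation of this shape is that a whole partition must fit on one worker for the sort, so heavily skewed keys would need a different strategy.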


