Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread pzwpzw
Li From: pzwpzw Reply-To: "dev@hudi.apache.org" Date: Wednesday, January 20, 2021 at 11:52 PM To: "dev@hudi.apache.org" Cc: "dev@hudi.apache.org" Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite Hi, we have implemented the spark sql exte

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread vino yang
hudi.apache.org" > Date: Wednesday, January 20, 2021 at 11:52 PM > To: "dev@hudi.apache.org" > Cc: "dev@hudi.apache.org" > Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite > > Hi, we have implemented the spark sql extension

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread pzwpzw
? We can continue the discussion from there. Thanks, Best Regards, Gary Li From: pzwpzw Reply-To: "dev@hudi.apache.org" Date: Wednesday, January 20, 2021 at 11:52 PM To: "dev@hudi.apache.org" Cc: "dev@hudi.apache.org" Subject: Re: Reply:Re: [DISCUSS] SQL Suppor

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread Gary Li
t 11:52 PM To: "dev@hudi.apache.org" Cc: "dev@hudi.apache.org" Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite Hi, we have implemented the Spark SQL extension for Hudi in our internal version. Here is the main implementation, including the extension sql syn

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-20 Thread pzwpzw
Hi, we have implemented the Spark SQL extension for Hudi in our internal version. Here is the main implementation, including the extended SQL syntax and the implementation scheme on Spark. I look forward to your feedback. Any comments are welcome~

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-28 Thread wei li
First, I think it is necessary to improve Spark SQL support, because Hudi's main use cases are data lakes and warehouses, and Spark has a strong ecosystem in this field. Second, in the long run Hudi needs a more general SQL layer, so embracing Calcite is very necessary. Then based

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread Danny Chan
That's great, I can help with the Apache Calcite integration. On Wed, Dec 23, 2020 at 12:29 AM, Vinoth Chandar wrote: > Sounds great. There will be an RFC/DISCUSS thread once 0.7.0 is out, I think. > Love to have you involved. > > On Tue, Dec 22, 2020 at 3:20 AM pzwpzw > wrote: > > > Yes, it looks good. > >

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread Vinoth Chandar
Sounds great. There will be an RFC/DISCUSS thread once 0.7.0 is out, I think. Love to have you involved. On Tue, Dec 22, 2020 at 3:20 AM pzwpzw wrote: > Yes, it looks good. > We are building the Spark SQL extensions to support Hudi in > our internal version. > I am interested in

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread pzwpzw
Yes, it looks good. We are building the Spark SQL extensions to support Hudi in our internal version. I am interested in participating in extending Spark SQL for Hudi. On Dec 22, 2020 at 4:30 PM, Vinoth Chandar wrote: Hi, I think what we are finally landing on is: - Keep pushing for SparkSQL

Reply:Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread 受春柏
Yes, I think it should be OK. On 2020-12-22 16:30:37, "Vinoth Chandar" wrote: >Hi, > >I think what we are finally landing on is: > >- Keep pushing for SparkSQL support using the Spark extensions route >- The Calcite effort will be a separate/orthogonal approach, down the line > >Please feel free to

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-22 Thread Vinoth Chandar
Hi, I think what we are finally landing on is:
- Keep pushing for SparkSQL support using the Spark extensions route
- The Calcite effort will be a separate/orthogonal approach, down the line
Please feel free to correct me if I got this wrong. On Mon, Dec 21, 2020 at 3:30 AM pzwpzw wrote: > Hi 受春柏

Reply:Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-21 Thread 受春柏
Hi pzwpzw, I see what you mean. It is very necessary to implement a common layer for Hudi, and we are also planning to implement Spark SQL write capabilities for SQL-based ETL processing. The common layer and Spark SQL writes can combine to form Hudi's SQL capabilities. At 2020-12-21

Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-21 Thread pzwpzw
Hi 受春柏, here is my point. We can use Calcite to build a common SQL layer that processes engine-independent SQL, for example most of the DDL and Hoodie CLI commands, and that also provides a parser for the common SQL extensions (e.g. MERGE INTO). The engine-related syntax can be taught to the respective engines
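A minimal sketch of the split described in this message, assuming a simple leading-keyword rule: engine-independent statements (DDL and MERGE INTO) go to a shared layer, everything else to the executing engine. `SqlRouter`, `Target` and the keyword set are hypothetical names chosen for illustration; Hoodie CLI commands are left out, and a real Calcite-based layer would parse the full statement instead of inspecting its first word.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decide whether a SQL statement is handled by a shared,
// engine-independent layer (e.g. one built on Calcite) or by the engine that
// executes it (Spark, Flink, ...). Not an actual Hudi API.
public class SqlRouter {

    enum Target { COMMON_LAYER, ENGINE }

    // Assumed split: DDL and MERGE INTO are at least parsed by the common layer;
    // all other statements are passed to the engine unchanged.
    private static final Set<String> COMMON_KEYWORDS =
            Set.of("CREATE", "DROP", "ALTER", "MERGE");

    static Target route(String sql) {
        String trimmed = sql.strip();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty statement");
        }
        // Only the leading keyword is inspected here; a real layer would parse the statement.
        String first = trimmed.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        return COMMON_KEYWORDS.contains(first) ? Target.COMMON_LAYER : Target.ENGINE;
    }

    public static void main(String[] args) {
        String[] statements = {
            "CREATE TABLE h0 (id INT, name STRING)",
            "MERGE INTO h0 USING s0 ON h0.id = s0.id WHEN MATCHED THEN UPDATE SET *",
            "SELECT id, name FROM h0 WHERE id > 10"
        };
        for (String s : statements) {
            System.out.println(route(s) + " <- " + s);
        }
    }
}
```

With this rule, the engine-specific syntax conflicts raised in the thread stay out of the shared layer, since anything it does not claim is passed through untouched.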

Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-21 Thread 受春柏
Hi all, that's very good. Hudi SQL syntax could support Flink, Hive and other analysis components at the same time, but there are some questions about Spark SQL. Spark SQL syntax conflicts with Calcite syntax. Is our strategy user migration or syntax compatibility? In addition, will it also

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-18 Thread Nishith
That’s awesome. Looks like we have a consensus on Calcite. Look forward to the RFC as well! -Nishith > On Dec 18, 2020, at 9:03 AM, Vinoth Chandar wrote: > > Sounds good. Look forward to a RFC/DISCUSS thread. > > Thanks > Vinoth > >> On Thu, Dec 17, 2020 at 6:04 PM Danny Chan wrote: >>

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-18 Thread Vinoth Chandar
Sounds good. Look forward to an RFC/DISCUSS thread. Thanks Vinoth On Thu, Dec 17, 2020 at 6:04 PM Danny Chan wrote: > Yes, Apache Flink basically reuses the DQL syntax of Apache Calcite. I would > add support for SQL connectors of Hoodie Flink soon ~ > Currently, I'm preparing a refactoring of

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-17 Thread Danny Chan
Yes, Apache Flink basically reuses the DQL syntax of Apache Calcite. I would add support for SQL connectors of Hoodie Flink soon ~ Currently, I'm preparing a refactoring of the current Flink writer code. On Fri, Dec 18, 2020 at 6:39 AM, Vinoth Chandar wrote: > Thanks Kabeer for the note on Gmail. Did not

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-17 Thread Vinoth Chandar
Thanks Kabeer for the note on Gmail. Did not realize that. :) >> My desired use case is that users use the Hoodie CLI to execute these SQLs. They can choose which engine to use via a CLI config option. Yes, that is another attractive aspect of this route. We can build out a common SQL layer and

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-17 Thread Vinoth Chandar
I think Dongwook is investigating along the same lines, and it does seem better to pursue this first, before trying other approaches. On Tue, Dec 15, 2020 at 1:38 AM pzwpzw wrote: >Yeah, I agree with Nishith that one option is to look at ways to > plug in custom logical and physical

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-17 Thread vino yang
+1 for Calcite Best, Vino On Thu, Dec 17, 2020 at 2:15 PM, David Sheard wrote: > I agree with Calcite. > > On Thu, 17 Dec 2020 at 5:04 pm, Danny Chan wrote: > > > Apache Calcite is a good candidate for parsing and executing the SQL; > > Apache Flink has an extension for the SQL based on the Calcite parser >

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-16 Thread David Sheard
I agree with Calcite. On Thu, 17 Dec 2020 at 5:04 pm, Danny Chan wrote: > Apache Calcite is a good candidate for parsing and executing the SQL; > Apache Flink has an extension for the SQL based on the Calcite parser [1]. > > > users will write: hudiSparkSession.sql("UPDATE ") > > Should

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-16 Thread Danny Chan
Apache Calcite is a good candidate for parsing and executing the SQL; Apache Flink has an extension for the SQL based on the Calcite parser [1]. > users will write: hudiSparkSession.sql("UPDATE ") Should users still need to instantiate the hudiSparkSession first? My desired use case is user

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-15 Thread Kabeer Ahmed
Vinoth and all, users on Gmail might be missing out on these emails, as Gmail is down and emails sent to the gmail.com domain are bouncing back. As of 11pm UK time, below is the Google update: https://www.google.com/appsstatus#hl=en=issue=1=a8b67908fadee664c68c240ff9f529ab Best to bump this thread again

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-15 Thread pzwpzw
Yeah, I agree with Nishith that one option is to look at ways to plug custom logical and physical plans into Spark. It can simplify the implementation and reuse the Spark SQL syntax. Also, users familiar with Spark SQL will be able to use Hudi's SQL features more quickly. In
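Spark's `SparkSessionExtensions` offers hooks such as `injectParser`, `injectResolutionRule` and `injectPlannerStrategy`; with `injectParser`, the builder receives the session's existing parser, so a custom parser can wrap it and delegate anything it does not recognize. The Java sketch that follows models only that parsing step, without Spark. `SqlParser`, `EngineParser`, `HudiExtensionParser` and the string "plans" are hypothetical stand-ins, and the prefix check is an assumption standing in for a real grammar; the logical and physical plan hooks are not modeled.

```java
import java.util.Locale;

// Hypothetical, Spark-free sketch of the "extension parser wraps the engine's
// parser" pattern. Both parsers are toy stand-ins returning a plan description.
public class DelegatingParserDemo {

    interface SqlParser {
        String parsePlan(String sql);
    }

    // Stand-in for the engine's built-in parser.
    static class EngineParser implements SqlParser {
        @Override
        public String parsePlan(String sql) {
            return "EnginePlan(" + sql.strip() + ")";
        }
    }

    // Stand-in for a Hudi extension parser: it claims only the statements it
    // adds (here UPDATE, DELETE and MERGE INTO) and delegates everything else.
    static class HudiExtensionParser implements SqlParser {
        private final SqlParser delegate;

        HudiExtensionParser(SqlParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public String parsePlan(String sql) {
            String upper = sql.strip().toUpperCase(Locale.ROOT);
            if (upper.startsWith("MERGE INTO")
                    || upper.startsWith("UPDATE ")
                    || upper.startsWith("DELETE FROM")) {
                return "HudiPlan(" + sql.strip() + ")";
            }
            // Unrecognized syntax falls through to the engine's own parser.
            return delegate.parsePlan(sql);
        }
    }

    public static void main(String[] args) {
        SqlParser parser = new HudiExtensionParser(new EngineParser());
        System.out.println(parser.parsePlan("UPDATE h0 SET name = 'a' WHERE id = 1"));
        System.out.println(parser.parsePlan("SELECT * FROM h0"));
    }
}
```

Wrapping rather than replacing the engine parser is what lets users keep the full Spark SQL syntax while Hudi adds only the statements it needs, which is the reuse argument made in the message.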

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-14 Thread Nishith
Thanks for starting this thread, Vinoth. In general, I definitely see the need for SQL-style semantics on Hudi tables. Apache Calcite is a great option to consider, given that DataSource V2 has the limitations you described. Additionally, even if Spark DataSource V2 allowed for the flexibility,

Re: [DISCUSS] SQL Support using Apache Calcite

2020-12-14 Thread Vinoth Chandar
Hello all, just bumping this thread again. Thanks, Vinoth On Thu, Dec 10, 2020 at 11:58 PM Vinoth Chandar wrote: > Hello all, > > One feature that keeps coming up is the ability to use UPDATE and MERGE SQL > syntax to write into Hudi tables. We have also looked into the Spark 3 > DataSource

[DISCUSS] SQL Support using Apache Calcite

2020-12-10 Thread Vinoth Chandar
Hello all, one feature that keeps coming up is the ability to use UPDATE and MERGE SQL syntax to write into Hudi tables. We have also looked into the Spark 3 DataSource V2 APIs and found several issues that hinder us from implementing this via the Spark APIs: - As of this writing, the