From: pzwpzw
Reply-To: "dev@hudi.apache.org"
Date: Wednesday, January 20, 2021 at 11:52 PM
To: "dev@hudi.apache.org"
Cc: "dev@hudi.apache.org"
Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

> Hi, we have implemented the spark sql extension
? We can continue the discussion from there.
Thanks,
Best Regards,
Gary Li
Hi, we have implemented the Spark SQL extension for Hudi in our internal
version. Here is the main implementation, including the extended SQL syntax
and the implementation scheme on Spark. I am waiting for your feedback. Any
comments are welcome~
First, I think it is necessary to improve Spark SQL support, because the main
scenario for Hudi is the data lake or warehouse, and Spark has a strong
ecosystem in this field.
Second, in the long run, Hudi needs a more general SQL layer, and it is
very necessary to embrace Calcite. Then based
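As a concrete illustration of the kind of extended statement this discussion centers on, a sketch only: the table and column names below are invented, and the syntax the extension finally adopts may differ.

```sql
-- Sketch of an extended Spark SQL statement against a Hudi table.
-- Table/column names are invented for illustration; not final syntax.
MERGE INTO hudi_orders AS t
USING incoming_orders AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.ts = s.ts
WHEN NOT MATCHED THEN INSERT (order_id, amount, ts)
  VALUES (s.order_id, s.amount, s.ts);
```

Plain Spark SQL does not accept this statement for Hudi tables, which is why a parser extension (or a Calcite-based layer) is needed.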
That's great, I can help with the Apache Calcite integration.
Vinoth Chandar wrote on Wed, Dec 23, 2020 at 12:29 AM:
> Sounds great. There will be an RFC/DISCUSS thread once 0.7.0 is out, I think.
> Love to have you involved.
>
> On Tue, Dec 22, 2020 at 3:20 AM pzwpzw wrote:
>
> > Yes, it looks good.
> >
Sounds great. There will be an RFC/DISCUSS thread once 0.7.0 is out, I think.
Love to have you involved.
On Tue, Dec 22, 2020 at 3:20 AM pzwpzw wrote:
> Yes, it looks good.
> We are building the Spark SQL extensions to support Hudi in
> our internal version.
> I am interested in
Yes, it looks good.
We are building the Spark SQL extensions to support Hudi in our internal
version.
I am interested in participating in the extension of Spark SQL on Hudi.
On Dec 22, 2020 at 4:30 PM, Vinoth Chandar wrote:
> Hi,
> I think what we are landing on finally is:
> - Keep pushing for SparkSQL
Yes, I think it should be OK.
At 2020-12-22 16:30:37, "Vinoth Chandar" wrote:
>Hi,
>
>I think what we are landing on finally is:
>
>- Keep pushing for SparkSQL support using the Spark extensions route
>- The Calcite effort will be a separate/orthogonal approach, down the line
>
>Please feel free to
Hi,
I think what we are landing on finally is:
- Keep pushing for SparkSQL support using the Spark extensions route
- The Calcite effort will be a separate/orthogonal approach, down the line
Please feel free to correct me, if I got this wrong.
On Mon, Dec 21, 2020 at 3:30 AM pzwpzw wrote:
> Hi 受春柏
Hi, pzwpzw
I see what you mean. It is very necessary to implement a common layer for Hudi,
and we are also planning to implement SparkSQL write capabilities for SQL-based
ETL processing. The common layer and SparkSQL write can combine to form Hudi's
SQL capabilities.
At 2020-12-21
Hi 受春柏, here is my point. We can use Calcite to build a common SQL layer to
process engine-independent SQL, for example most of the DDL and the Hoodie CLI
commands, and also provide a parser for the common SQL extensions (e.g. MERGE
INTO). The engine-specific syntax can be left to the respective engines.
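To make the idea concrete, a common Calcite-based layer might accept engine-independent statements along these lines. This is a hypothetical sketch: the table name, columns, and especially the CLI-style statement are invented for illustration, not an agreed syntax.

```sql
-- Hypothetical engine-independent DDL the common layer could parse
-- and then hand off to the chosen engine; names are illustrative.
CREATE TABLE hudi_orders (
  order_id BIGINT,
  amount   DOUBLE,
  ts       TIMESTAMP
) USING hudi;

-- A Hoodie CLI operation could likewise be exposed as an
-- engine-neutral statement (purely illustrative syntax).
SHOW COMMITS ON TABLE hudi_orders;
```

The point is that such statements need no engine-specific semantics, so one shared parser can serve Spark, Flink, and the CLI alike.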
Hi, all
That's very good. Hudi SQL syntax can support Flink, Hive and other analysis
components at the same time.
But there are some questions about SparkSQL: SparkSQL syntax conflicts with
Calcite syntax. Is our strategy user migration or syntax compatibility?
In addition, will it also
That’s awesome. Looks like we have a consensus on Calcite. Look forward to the
RFC as well!
-Nishith
> On Dec 18, 2020, at 9:03 AM, Vinoth Chandar wrote:
>
> Sounds good. Look forward to an RFC/DISCUSS thread.
>
> Thanks
> Vinoth
>
>> On Thu, Dec 17, 2020 at 6:04 PM Danny Chan wrote:
>>
Sounds good. Look forward to an RFC/DISCUSS thread.
Thanks
Vinoth
On Thu, Dec 17, 2020 at 6:04 PM Danny Chan wrote:
> Yes, Apache Flink basically reuse the DQL syntax of Apache Calcite, i would
> add support for SQL connectors of Hoodie Flink soon ~
> Currently, i'm preparing a refactoring to
Yes, Apache Flink basically reuses the DQL syntax of Apache Calcite; I would
add support for SQL connectors of Hoodie Flink soon ~
Currently, I'm preparing a refactoring of the current Flink writer code.
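For reference, Flink's Calcite-based SQL already declares connectors through DDL, so a Hudi Flink connector could plausibly be declared along these lines. A sketch only: the option names and values here are assumptions, not the final connector contract.

```sql
-- Sketch of a Flink SQL DDL declaring a Hudi-backed table; the
-- 'connector' and 'path' options are illustrative assumptions.
CREATE TABLE hudi_sink (
  uuid VARCHAR(20),
  name VARCHAR(10),
  ts   TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_sink'
);
```

Because Flink already parses this with Calcite, a Hudi connector mostly needs to supply the table factory behind the options rather than new syntax.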
Vinoth Chandar wrote on Fri, Dec 18, 2020 at 6:39 AM:
> Thanks Kabeer for the note on gmail. Did not
Thanks Kabeer for the note on gmail. Did not realize that. :)
>> My desired use case is user use the Hoodie CLI to execute these SQLs.
They can choose what engine to use by a CLI config option.
Yes, that is also another attractive aspect of this route. We can build out
a common SQL layer and
I think Dongwook is investigating along the same lines, and it does seem
better to pursue this first, before trying other approaches.
On Tue, Dec 15, 2020 at 1:38 AM pzwpzw
wrote:
>Yeah I agree with Nishith that an option way is to look at the ways to
> plug in custom logical and physical
+1 for Calcite
Best,
Vino
David Sheard wrote on Thu, Dec 17, 2020 at 2:15 PM:
> I agree with Calcite
>
> On Thu, 17 Dec 2020 at 5:04 pm, Danny Chan wrote:
>
> > Apache Calcite is a good candidate for parsing and executing the SQL,
> > Apache Flink has an extension for the SQL based on the Calcite parser
>
I agree with Calcite
On Thu, 17 Dec 2020 at 5:04 pm, Danny Chan wrote:
> Apache Calcite is a good candidate for parsing and executing the SQL,
> Apache Flink has an extension for the SQL based on the Calcite parser [1],
>
> > users will write : hudiSparkSession.sql("UPDATE ")
>
> Should
Apache Calcite is a good candidate for parsing and executing the SQL,
Apache Flink has an extension for the SQL based on the Calcite parser [1],
> users will write : hudiSparkSession.sql("UPDATE ")
Should the user still need to instantiate the hudiSparkSession first? My
desired use case is user
Vinoth and All,
Users on gmail might be missing out on these emails as Gmail is down and emails
sent to gmail.com domain are bouncing back.
At 11pm UK time below is the google update:
https://www.google.com/appsstatus#hl=en=issue=1=a8b67908fadee664c68c240ff9f529ab
Best to bump this thread again
Yeah, I agree with Nishith that one option is to look at ways to plug
custom logical and physical plans into Spark. It can simplify the
implementation and reuse the Spark SQL syntax. Also, users familiar with
Spark SQL will be able to use Hudi's SQL features more quickly.
In
Thanks for starting this thread Vinoth.
In general, definitely see the need for SQL style semantics on Hudi tables.
Apache Calcite is a great option to consider given DatasourceV2 has the
limitations that you described.
Additionally, even if Spark DatasourceV2 allowed for the flexibility,
Hello all,
Just bumping this thread again
thanks
vinoth
On Thu, Dec 10, 2020 at 11:58 PM Vinoth Chandar wrote:
> Hello all,
>
> One feature that keeps coming up is the ability to use UPDATE, MERGE sql
> syntax to support writing into Hudi tables. We have looked into the Spark 3
> DataSource
Hello all,
One feature that keeps coming up is the ability to use UPDATE, MERGE sql
syntax to support writing into Hudi tables. We have looked into the Spark 3
DataSource V2 APIs as well and found several issues that hinder us in
implementing this via the Spark APIs:
- As of this writing, the