Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-09 Thread huaxin gao
I am in PST. Beijing time is 15 hours ahead of my time. Next Monday night works for me. Let's talk about the details offline. Moving spark dev and others to bcc. On Thu, Apr 8, 2021 at 11:48 PM Chang Chen wrote: > hi huaxin > > I looked into your PR, there should be a way to consolidate the file so

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-08 Thread Chang Chen
Hi huaxin, I looked into your PR; there should be a way to consolidate the file source and the SQL source. What's the time difference between Beijing and your time zone? I prefer next Monday night or Tuesday morning. I can share a Zoom link. On Thu, Apr 8, 2021 at 7:10 AM huaxin gao wrote: > Hi Chang, > > Thanks for workin

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-07 Thread huaxin gao
Hi Chang, Thanks for working on this. Could you please explain how your proposal can be extended to file-based data sources? Since at least half of the Spark community uses file-based data sources, I think any design should consider them as well. I work on both sq

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-07 Thread Chang Chen
Hi huaxin, please review https://github.com/apache/spark/pull/32061. As for adding a *trait PrunedFilteredAggregateScan* for V1 JDBC, I deleted the trait, since V1 DataSource doesn't need to support aggregation push down. On Mon, Apr 5, 2021 at 10:02 PM Chang Chen wrote: > Hi huaxin > > What I am concerned about is abstraction

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-05 Thread Chang Chen
Hi huaxin, What I am concerned about is abstraction: 1. How to extend sources.Aggregation. Because Catalyst Expression is recursive, defining a new hierarchy is a bad idea; I think ScanBuilder must convert pushed expressions to its own format. 2. The optimization rule is also an extended

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-04 Thread huaxin gao
Hello Chang, Thanks for proposing the SPIP and initiating the discussion. However, I think the problem with your proposal is that you haven’t taken into consideration file-based data sources such as Parquet, ORC, etc. As far as I know, most Spark users have file-based data sources. As a ma

[Discuss][SPIP] DataSource V2 SQL push down

2021-04-02 Thread Chang Chen
Hi All, We would like to post a SPIP for DataSource V2 SQL push down in Spark. Here is the document link: https://olapio.atlassian.net/wiki/spaces/TeamCX/pages/2667315361/Discuss+SQL+Data+Source+V2+SQL+Push+Down?atlOrigin=eyJpIjoiOTI5NGYzYWMzMWYwNDliOWIwM2ZkODllODk4Njk2NzEiLCJwIjoiYyJ9 This SPIP aims