Re: [DISCUSS] Change data feed for spark sql

2022-02-14 Thread Vinoth Chandar
Hi all,

I would love to not introduce new constructs like "timestamp" or "snapshots".
Hudi already has a clear notion of commit times that can unlock this.
Can we just use this as an opportunity to standardize the incremental
query's schema?
In fact, don't we already have a change feed with our incremental query? We
would only need to emit delete records and the old images of records.
Those are the only gaps I see.
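
To make the two gaps concrete, here is a minimal, self-contained sketch of what a commit-time-based change feed could emit. It is not Hudi code: the `Change` record, the `change_feed` function, and the in-memory commit list are hypothetical names used only for illustration. The range is assumed to be exclusive of the begin commit time and inclusive of the end commit time, mirroring how incremental queries are usually bounded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Change:
    commit_time: str         # commit instant that produced the change
    op: str                  # "insert", "update", or "delete"
    key: str                 # record key
    before: Optional[dict]   # old image (None for inserts)
    after: Optional[dict]    # new image (None for deletes)

# Hypothetical in-memory timeline: (commit_time, {key: row}), where a row of
# None marks a delete. Commit times are fixed-width digit strings, so plain
# string comparison orders them correctly.
Commit = Tuple[str, Dict[str, Optional[dict]]]

def change_feed(commits: List[Commit], begin: str, end: str) -> List[Change]:
    """Return changes with begin < commit_time <= end, including deletes
    and the old image of every updated or deleted record."""
    state: Dict[str, dict] = {}
    out: List[Change] = []
    for commit_time, writes in sorted(commits, key=lambda c: c[0]):
        for key, row in writes.items():
            before = state.get(key)
            if row is None:
                if before is None:
                    continue          # delete of an unknown key: nothing to emit
                op = "delete"
                del state[key]
            else:
                op = "update" if before is not None else "insert"
                state[key] = row
            if begin < commit_time <= end:
                out.append(Change(commit_time, op, key, before, row))
    return out

commits = [
    ("20220211100000", {"a": {"v": 1}, "b": {"v": 2}}),
    ("20220212100000", {"a": {"v": 10}}),
    ("20220213100000", {"b": None}),
]
feed = change_feed(commits, "20220211100000", "20220213100000")
for change in feed:
    print(change)
```

A plain incremental query over the same range would surface the new value of `a` but neither its old image nor the delete of `b`; those are the two additions discussed above.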

+1 for an RFC. I would be happy to jam on the design!

Thanks
Vinoth


Re: [DISCUSS] Change data feed for spark sql

2022-02-14 Thread Sivabalan
+1 for the feature. I see a lot of benefits, e.g. for clustering, index
building, etc.

-- 
Regards,
-Sivabalan


Re: [DISCUSS] Change data feed for spark sql

2022-02-13 Thread leesf
+1 for the feature.


Re: [DISCUSS] Change data feed for spark sql

2022-02-12 Thread vino yang
+1 for this feature. Looking forward to more details or a design doc.

Best,
Vino


Re: [DISCUSS] Change data feed for spark sql

2022-02-12 Thread Xianghu Wang
This is definitely a great feature.
+1


[DISCUSS] Change data feed for spark sql

2022-02-11 Thread Forward Xu
Hi All,

I want to support a change data feed for Spark SQL. This feature can be
achieved in two ways.

1. Call Procedure Command
SQL syntax:
CALL system.table_changes('tableName', start_timestamp, end_timestamp)
Example:
CALL system.table_changes('tableName', TIMESTAMP '2021-01-23 04:30:45',
    TIMESTAMP '2021-02-23 06:00:00')
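
As a rough illustration of how such a call could map onto commit times, the sketch below converts the two timestamps into instant strings and selects the commits in between. Everything here is an assumption made for illustration: the second-granularity `yyyyMMddHHmmss` instant format, the helper names `to_instant` and `commits_between`, and the choice of an exclusive start and inclusive end bound, which the proposal does not specify.

```python
from datetime import datetime
from typing import List

# Assumption: commit instants are second-granularity digit strings.
INSTANT_FORMAT = "%Y%m%d%H%M%S"

def to_instant(ts: str) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' into an instant string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime(INSTANT_FORMAT)

def commits_between(timeline: List[str], start_ts: str, end_ts: str) -> List[str]:
    """Return the commits with start < commit <= end (assumed bounds)."""
    start, end = to_instant(start_ts), to_instant(end_ts)
    return [c for c in sorted(timeline) if start < c <= end]

# Hypothetical commit timeline of a table.
timeline = [
    "20210101000000",
    "20210123043045",
    "20210201120000",
    "20210223060000",
    "20210301000000",
]
selected = commits_between(timeline, "2021-01-23 04:30:45", "2021-02-23 06:00:00")
print(selected)
```

With these bounds, a commit made exactly at the start timestamp is excluded, so consecutive calls that reuse the previous end timestamp as the next start do not return the same commit twice.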

2. Support querying a MOR (CDC) table as of a savepoint
SELECT * FROM A.B TIMESTAMP AS OF 1643119574;
SELECT * FROM A.B TIMESTAMP AS OF '2019-01-29 00:37:58';

SELECT * FROM A.B TIMESTAMP AS OF '2019-01-29 00:37:58' AND '2021-02-23 06:00:00';
SELECT * FROM A.B VERSION AS OF 'Snapshot123456789';
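
One possible reading of the "AS OF" forms is that the given point in time resolves to the latest commit at or before it. The sketch below shows that resolution step only. The helper names `instant_from_epoch` and `resolve_as_of` are hypothetical, and treating the epoch value as seconds in UTC and instants as `yyyyMMddHHmmss` strings are assumptions, not something the proposal states.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Optional

def instant_from_epoch(epoch_seconds: int) -> str:
    """Render epoch seconds as an instant string (assumed to be UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")

def resolve_as_of(timeline: List[str], instant: str) -> Optional[str]:
    """Return the latest commit <= instant, or None if there is none.
    The timeline must be sorted in ascending order."""
    index = bisect_right(timeline, instant)
    return timeline[index - 1] if index else None

# Hypothetical sorted commit timeline of a table.
timeline = ["20220120000000", "20220125000000", "20220126000000"]
as_of = instant_from_epoch(1643119574)
resolved = resolve_as_of(timeline, as_of)
print(as_of, resolved)
```

A query whose timestamp is earlier than the first commit resolves to `None` here; whether that case should return an empty result or an error would need to be decided in the design.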

Any feedback is welcome!

Thank you.

Regards,
Forward Xu

Related Links:
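[1] Call Procedure Command <https://issues.apache.org/jira/browse/HUDI-3161>
[2] Support querying a table as of a savepoint
[3] Change data feed <https://docs.databricks.com/delta/delta-change-data-feed.html#language-sql>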