Re: [DISCUSS] New RFC to support 'Snapshot view management'
Hi Sagar,

Hive Metastore (HMS) shouldn't be the core part; the external table location will depend on which metastore the user is using. I'm still working on it and will add more detail in this RFC PR:
https://github.com/apache/hudi/pull/6576

On Fri, 16 Sept 2022 at 11:28, sagar sumit wrote:

> I read the description in
> https://issues.apache.org/jira/browse/HUDI-4677
> May I ask the rationale for choosing
> Hive Metastore to manage the snapshots?
Re: [DISCUSS] New RFC to support 'Snapshot view management'
Automatic lifecycle management based on a few configurations would be very useful for the community.

I read the description in https://issues.apache.org/jira/browse/HUDI-4677. May I ask the rationale for choosing Hive Metastore to manage the snapshots?

Perhaps the RFC will have more details. Looking forward to it!

Regards,
Sagar

On Wed, Sep 14, 2022 at 8:13 AM 冯健 wrote:

> I plan to merge the COW part of this feature to 0.13.0 at least. will
> consider your suggestion if time is tight
Re: [DISCUSS] New RFC to support 'Snapshot view management'
Hi Ethan,

Yes, as things stand, we still need to do a lot of extra work to provide the snapshot view feature for users (or users have to do it themselves). I plan to merge at least the copy-on-write (COW) part of this feature into 0.13.0, and will consider your suggestion if time is tight.

Thanks

On Wed, 14 Sept 2022 at 03:02, Y Ethan Guo wrote:

> Is the snapshot view management more like
> managing commits / savepoints in the Hudi timeline and hiding Hudi
> internals from the users?
>
> Do you plan to merge the implementation of snapshot view and lifecycle
> management for the next major release (0.13.0)? Timeline-wise, if time is
> tight, you may also consider scoping out a subset of features to target
> 0.13.0.
Re: [DISCUSS] New RFC to support 'Snapshot view management'
Hi Feng Jian,

Looking forward to the RFC! Is the snapshot view management more like managing commits / savepoints in the Hudi timeline and hiding Hudi internals from the users?

Do you plan to merge the implementation of snapshot view and lifecycle management for the next major release (0.13.0)? Timeline-wise, if time is tight, you may also consider scoping out a subset of features to target 0.13.0.

Best,
- Ethan

On Mon, Sep 12, 2022 at 10:43 PM Sivabalan wrote:

> Sounds like a nice feature to have. Eagerly looking forward for the RFC.
Re: [DISCUSS] New RFC to support 'Snapshot view management'
Sounds like a nice feature to have. Eagerly looking forward to the RFC.

On Sat, 27 Aug 2022 at 20:51, 冯健 wrote:

> I attached the image in this Jira Epic
> https://issues.apache.org/jira/browse/HUDI-4677, and the RFC is WIP, will
> create a pr in the next few days

--
Regards,
-Sivabalan
Re: [DISCUSS] New RFC to support 'Snapshot view management'
I attached the image to this Jira epic: https://issues.apache.org/jira/browse/HUDI-4677. The RFC is WIP, and I will create a PR in the next few days.

Yes, the basic idea is to implement lifecycle management on top of the savepoint and time travel features, providing new ways for users to operate and coordinate. It won't propose any new concepts.

On Sun, 28 Aug 2022 at 02:06, Shiyan Xu wrote:

> Are you planning to extend the savepoint feature to have lifecycle mgmt
> capabilities? We should consolidate overlapping features properly.
Re: [DISCUSS] New RFC to support 'Snapshot view management'
The dev email list does not support showing images, unfortunately. You may want to put it behind a link.

As for the idea itself,

> What I plan to do is to let Hudi support release a snapshot view and
> lifecycle management out-of-box.

Are you planning to extend the savepoint feature to have lifecycle mgmt capabilities? We should consolidate overlapping features properly.

On Sun, Aug 21, 2022 at 12:59 PM 冯健 wrote:

> Hi team,
> [image: image.png]

--
Best,
Shiyan
[DISCUSS] New RFC to support 'Snapshot view management'
Hi team,

[image: image.png]

For the snapshot view scenario, Hudi already provides two key features to support it:

- Time travel: the user provides a timestamp to query a specific snapshot view of a Hudi table.
- Savepoint/restore: a "savepoint" saves the table as of the commit time so that you can restore the table to this savepoint at a later point in time if need be. In this scenario, though, users typically use a savepoint to prevent the cleaner from removing the snapshot view at a specific timestamp, so that only unused files are cleaned.

However, using these features directly is inconvenient for users:

- Users usually prefer a meaningful name over querying a Hudi table with a timestamp; using a timestamp in SQL may lead to the wrong snapshot view being used. For example, we can announce that a new tag of a Hudi table named table_nameMMDD was released, and users can then query it by this new table name.
- Savepoint was not originally designed for this "snapshot view" scenario; it was designed for disaster recovery. Say a new snapshot view is created every day with 7 days of retention: we should support lifecycle management on top of it.

What I plan to do is to let Hudi support releasing snapshot views and managing their lifecycle out of the box. We have already done some work on this while supporting customers' snapshot view requirements at my company, and hope to land this feature in the community too.

Please feel free to let me know if you have any ideas about this.

Thanks,

Jian Feng
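To make the two pain points above concrete, here is a minimal sketch of what "named snapshot views with retention" could look like. It is an illustration only, not Hudi's API and not the RFC's design: `SnapshotRegistry`, `SnapshotView`, `release`, `resolve`, and `expire` are hypothetical names, the `<table>_<yyyyMMdd>` naming is just one possible convention, and the second-granularity instant format is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical names throughout: none of this is Hudi's API.
INSTANT_FORMAT = "%Y%m%d%H%M%S"  # assumed second-granularity commit instant


@dataclass(frozen=True)
class SnapshotView:
    name: str            # user-facing tag, e.g. "orders_20220821"
    instant: str         # commit time the tag pins, e.g. "20220821000000"
    retention_days: int  # how long the tag should stay queryable


class SnapshotRegistry:
    """Maps meaningful names to commit instants and expires old ones."""

    def __init__(self):
        self._views = {}

    def release(self, table, instant, retention_days=7):
        # Name the view after the table and the commit date.
        name = f"{table}_{instant[:8]}"
        self._views[name] = SnapshotView(name, instant, retention_days)
        return name

    def resolve(self, name):
        # Readers query by name; the timestamp stays an internal detail.
        return self._views[name].instant

    def expire(self, now):
        # Drop every view whose retention window has passed and return the
        # dropped names, so a caller could release the matching savepoints.
        expired = []
        for name, view in list(self._views.items()):
            created = datetime.strptime(view.instant, INSTANT_FORMAT)
            if now - created > timedelta(days=view.retention_days):
                expired.append(name)
                del self._views[name]
        return sorted(expired)
```

In this sketch a daily job would call `release` after each day's commit and then `expire(datetime.now())`; a reader would call `resolve` to turn a tag into the instant for a time travel query. Where the name-to-instant mapping is actually stored (a metastore, the timeline, or elsewhere) is left open here.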
