Re: [DISCUSS] New RFC to support 'Snapshot view management'

2022-09-13 Thread 冯健
Hi Ethan,

Yes, as things stand, we still need to do a lot of extra work to provide
the snapshot view feature to users (otherwise users have to do this by
themselves). I plan to merge at least the COW part of this feature into
0.13.0, and will consider your suggestion if time is tight.

Thanks



On Wed, 14 Sept 2022 at 03:02, Y Ethan Guo  wrote:

> Hi Feng Jian,
>
> Looking forward to the RFC!  Is the snapshot view management more like
> managing commits / savepoints in the Hudi timeline and hiding Hudi
> internals from the users?
>
> Do you plan to merge the implementation of snapshot view and lifecycle
> management for the next major release (0.13.0)?  Timeline-wise, if time is
> tight, you may also consider scoping out a subset of features to target
> 0.13.0.
>
> Best,
> - Ethan
>
> On Mon, Sep 12, 2022 at 10:43 PM Sivabalan  wrote:
>
> > Sounds like a nice feature to have. Eagerly looking forward to the RFC.
> >
> > On Sat, 27 Aug 2022 at 20:51, 冯健  wrote:
> >
> > > I attached the image to this Jira Epic:
> > > https://issues.apache.org/jira/browse/HUDI-4677
> > > The RFC is WIP; I will create a PR in the next few days.
> > > Yeah, the basic idea is to implement lifecycle management based on the
> > > savepoint and time travel features, providing new ways for the user to
> > > operate and coordinate. It won't propose any new concepts.
> > >
> > > On Sun, 28 Aug 2022 at 02:06, Shiyan Xu wrote:
> > >
> > > > Unfortunately, the dev email list does not support showing images.
> > > > You may want to put it behind a link.
> > > >
> > > > As for the idea itself,
> > > >
> > > > > What I plan to do is to let Hudi support releasing a snapshot view
> > > > > and lifecycle management out of the box.
> > > >
> > > > Are you planning to extend the savepoint feature to have lifecycle
> > > > management capabilities? We should consolidate overlapping features
> > > > properly.
> > > >
> > > > On Sun, Aug 21, 2022 at 12:59 PM 冯健  wrote:
> > > >
> > > > > Hi team,
> > > > > [image: image.png]
> > > > > For the snapshot view scenario, Hudi already provides two key
> > > > > features to support it:
> > > > >
> > > > >    - Time travel: the user provides a timestamp to query a specific
> > > > >      snapshot view of a Hudi table.
> > > > >    - Savepoint/restore: a "savepoint" saves the table as of the
> > > > >      commit time so that you can restore the table to this
> > > > >      savepoint at a later point in time if need be. In this case,
> > > > >      though, users usually use it to prevent the snapshot view at a
> > > > >      specific timestamp from being cleaned, so that only unused
> > > > >      files are cleaned.
> > > > >
> > > > > However, using these features directly is inconvenient for users:
> > > > >
> > > > >    - Users usually prefer a meaningful name over querying a Hudi
> > > > >      table by timestamp, and using a timestamp in SQL may lead to
> > > > >      the wrong snapshot view being used. For example, we can
> > > > >      announce that a new tag of the Hudi table named table_nameMMDD
> > > > >      was released, and users can then query it by this new table
> > > > >      name.
> > > > >    - Savepoint was not originally designed for this "snapshot view"
> > > > >      scenario; it was designed for disaster recovery. Say a new
> > > > >      snapshot view is created every day with 7 days of retention:
> > > > >      we should support lifecycle management on top of it.
> > > > >
> > > > > What I plan to do is to let Hudi support releasing a snapshot view
> > > > > and lifecycle management out of the box. We have already done some
> > > > > work supporting customers' snapshot view requirements at my
> > > > > company, and hope to land this feature in the community too.
> > > > >
> > > > > Please feel free to let me know if you have any ideas about this.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jian Feng
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Shiyan
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: [DISCUSS] New RFC to support 'Snapshot view management'

2022-09-13 Thread Y Ethan Guo
Hi Feng Jian,

Looking forward to the RFC!  Is the snapshot view management more like
managing commits / savepoints in the Hudi timeline and hiding Hudi
internals from the users?

Do you plan to merge the implementation of snapshot view and lifecycle
management for the next major release (0.13.0)?  Timeline-wise, if time is
tight, you may also consider scoping out a subset of features to target
0.13.0.

Best,
- Ethan

On Mon, Sep 12, 2022 at 10:43 PM Sivabalan  wrote:

> Sounds like a nice feature to have. Eagerly looking forward to the RFC.
>
> On Sat, 27 Aug 2022 at 20:51, 冯健  wrote:
>
> > I attached the image to this Jira Epic:
> > https://issues.apache.org/jira/browse/HUDI-4677
> > The RFC is WIP; I will create a PR in the next few days.
> > Yeah, the basic idea is to implement lifecycle management based on the
> > savepoint and time travel features, providing new ways for the user to
> > operate and coordinate. It won't propose any new concepts.
> >
> > On Sun, 28 Aug 2022 at 02:06, Shiyan Xu wrote:
> >
> > > Unfortunately, the dev email list does not support showing images.
> > > You may want to put it behind a link.
> > >
> > > As for the idea itself,
> > >
> > > > What I plan to do is to let Hudi support releasing a snapshot view
> > > > and lifecycle management out of the box.
> > >
> > > Are you planning to extend the savepoint feature to have lifecycle
> > > management capabilities? We should consolidate overlapping features
> > > properly.
> > >
> > > On Sun, Aug 21, 2022 at 12:59 PM 冯健  wrote:
> > >
> > > > Hi team,
> > > > [image: image.png]
> > > > For the snapshot view scenario, Hudi already provides two key
> > > > features to support it:
> > > >
> > > >    - Time travel: the user provides a timestamp to query a specific
> > > >      snapshot view of a Hudi table.
> > > >    - Savepoint/restore: a "savepoint" saves the table as of the
> > > >      commit time so that you can restore the table to this savepoint
> > > >      at a later point in time if need be. In this case, though,
> > > >      users usually use it to prevent the snapshot view at a specific
> > > >      timestamp from being cleaned, so that only unused files are
> > > >      cleaned.
> > > >
> > > > However, using these features directly is inconvenient for users:
> > > >
> > > >    - Users usually prefer a meaningful name over querying a Hudi
> > > >      table by timestamp, and using a timestamp in SQL may lead to
> > > >      the wrong snapshot view being used. For example, we can
> > > >      announce that a new tag of the Hudi table named table_nameMMDD
> > > >      was released, and users can then query it by this new table
> > > >      name.
> > > >    - Savepoint was not originally designed for this "snapshot view"
> > > >      scenario; it was designed for disaster recovery. Say a new
> > > >      snapshot view is created every day with 7 days of retention:
> > > >      we should support lifecycle management on top of it.
> > > >
> > > > What I plan to do is to let Hudi support releasing a snapshot view
> > > > and lifecycle management out of the box. We have already done some
> > > > work supporting customers' snapshot view requirements at my company,
> > > > and hope to land this feature in the community too.
> > > >
> > > > Please feel free to let me know if you have any ideas about this.
> > > >
> > > > Thanks,
> > > >
> > > > Jian Feng
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Shiyan
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>
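
To make the idea discussed in this thread concrete, here is a minimal
sketch of a named snapshot view with retention-based lifecycle management.
It is not Hudi code and not the design from the RFC: the class names
(SnapshotView, SnapshotViewRegistry), the tag format "daily_YYYYMMDD", and
the in-memory dictionary are all assumptions made for illustration. The
only ideas taken from the thread are that a meaningful name should stand in
for a commit timestamp, and that views created daily should expire after a
retention period such as 7 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class SnapshotView:
    """Hypothetical record tying a user-facing tag to a commit instant."""
    name: str             # meaningful tag that users query by
    instant: str          # commit time the tag points at (yyyyMMddHHmmss)
    created_at: datetime  # when the view was released


class SnapshotViewRegistry:
    """Illustrative in-memory registry; a real system would persist this."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._views: Dict[str, SnapshotView] = {}

    def release(self, name: str, instant: str, now: datetime) -> SnapshotView:
        """Publish a new named view for the given commit instant."""
        if name in self._views:
            raise ValueError("snapshot view already exists: " + name)
        view = SnapshotView(name, instant, now)
        self._views[name] = view
        return view

    def resolve(self, name: str) -> str:
        """Map a tag to the commit instant a time travel query would use."""
        return self._views[name].instant

    def expire(self, now: datetime) -> List[str]:
        """Drop views whose age has reached the retention period."""
        expired = sorted(
            name for name, view in self._views.items()
            if now - view.created_at >= self.retention
        )
        for name in expired:
            del self._views[name]
        return expired
```

In this sketch, resolve() returns the instant that a time travel query
would be given, and expire() is where a real implementation would
presumably also release the underlying savepoint so that the cleaner can
reclaim the files; that step is left out because the thread does not
specify how it would work.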