Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12, Shiyan Xu
I came up with a first draft. Thank you.
https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+%28WIP%29+Hudi+Dataset+Snapshotter

On Tue, Nov 12, 2019 at 12:44 PM Shiyan Xu wrote:
> Thank you all for the +1s! I'll go ahead and add an RFC page then.
>
> On Tue, Nov 12, 2019 at 8:41 AM nishith

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12, Shiyan Xu
Thank you all for the +1s! I'll go ahead and add an RFC page then.

On Tue, Nov 12, 2019 at 8:41 AM nishith agarwal wrote:
> +1 on the exporter tool idea.
>
> -Nishith
>
> On Tue, Nov 12, 2019 at 5:06 AM leesf wrote:
> > +1, and we can discuss it further when design docs are available.
> >

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12, nishith agarwal
+1 on the exporter tool idea.

-Nishith

On Tue, Nov 12, 2019 at 5:06 AM leesf wrote:
> +1, and we can discuss it further when design docs are available.
>
> Best,
> Leesf
>
> On Tue, Nov 12, 2019 at 4:17 PM Balaji Varadarajan wrote:
> > +1 on the exporter tool idea.
> >
> > On Mon, Nov 11, 2019 at 10:36

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12, leesf
+1, and we can discuss it further when design docs are available.

Best,
Leesf

On Tue, Nov 12, 2019 at 4:17 PM Balaji Varadarajan wrote:
> +1 on the exporter tool idea.
>
> On Mon, Nov 11, 2019 at 10:36 PM vino yang wrote:
> > Hi Shiyan,
> >
> > +1 for this proposal. Also, it looks like an exporter

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12, Balaji Varadarajan
+1 on the exporter tool idea.

On Mon, Nov 11, 2019 at 10:36 PM vino yang wrote:
> Hi Shiyan,
>
> +1 for this proposal. Also, it looks like an exporter tool.
>
> @Vinoth Chandar Any thoughts about where to place it?
>
> Best,
> Vino
>
> On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar wrote:
> > We can

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11, vino yang
Hi Shiyan,

+1 for this proposal. Also, it looks like an exporter tool.

@Vinoth Chandar Any thoughts about where to place it?

Best,
Vino

On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar wrote:
> We can wait for others to chime in as well. :)
>
> On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu wrote:
> >

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11, Vinoth Chandar
We can wait for others to chime in as well. :)

On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu wrote:
> Yes, Vinoth, you're right that it is more of an exporter, which exports a
> snapshot of a Hudi dataset.
>
> It should support MOR too; it will just leverage the existing
> SnapshotCopier logic to

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11, Shiyan Xu
Yes, Vinoth, you're right that it is more of an exporter, which exports a snapshot of a Hudi dataset.

It should support MOR too; it will just leverage the existing SnapshotCopier logic to find the latest file slices.

So is it OK to create an RFC for further discussion?

On Mon, Nov 11, 2019 at
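To make "find the latest file slices" concrete, here is a minimal Python sketch of the idea, not Hudi's actual code (Hudi implements this in Java in its file-system view). It assumes the usual Hudi model: a file group holds a sequence of file slices, and each slice is an optional base (parquet) file plus any MOR (Merge-On-Read) log files sharing the same base instant. All names here (`DataFile`, `FileSlice`, `latest_file_slices`, the `write_instant` field) are hypothetical, instants are assumed to be fixed-width timestamp strings that sort lexically, and only files from completed commits are assumed to be listed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical model of one file in a Hudi partition (not Hudi's API)."""
    file_group_id: str   # file group the file belongs to
    base_instant: str    # instant of the slice's base file, e.g. "20191111000000"
    write_instant: str   # instant that wrote this file (== base_instant for base files)
    path: str
    is_log: bool = False  # True for MOR delta log files


@dataclass
class FileSlice:
    file_group_id: str
    base_instant: str
    base_file: str = ""  # stays empty if the slice has only log files
    log_files: List[str] = field(default_factory=list)


def latest_file_slices(files: List[DataFile], as_of: str) -> Dict[str, FileSlice]:
    """Return, per file group, the newest slice visible at instant `as_of`.

    Assumes instants are fixed-width strings, so string comparison
    matches chronological order.
    """
    slices: Dict[Tuple[str, str], FileSlice] = {}
    for f in files:
        if f.write_instant > as_of:
            continue  # written after the snapshot point, so not part of it
        key = (f.file_group_id, f.base_instant)
        s = slices.setdefault(key, FileSlice(f.file_group_id, f.base_instant))
        if f.is_log:
            s.log_files.append(f.path)
        else:
            s.base_file = f.path

    # Keep only the slice with the greatest base instant in each file group.
    latest: Dict[str, FileSlice] = {}
    for s in slices.values():
        current = latest.get(s.file_group_id)
        if current is None or s.base_instant > current.base_instant:
            latest[s.file_group_id] = s
    for s in latest.values():
        s.log_files.sort()
    return latest
```

For a COW table every slice is just a base file, so an exporter could copy those files directly; for MOR, the base file and log files in each returned slice would still need to be merged before being written out as plain parquet.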

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11, Vinoth Chandar
What you suggest sounds more like an `Exporter` tool? I imagine you will support MOR as well?

+1 on the idea itself. It could be useful to generate a plain parquet snapshot as a backup.

On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu wrote:
> Hi All,
>
> The existing SnapshotCopier under Hudi

[DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11, Shiyan Xu
Hi All,

The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy, primarily for backup purposes. I would like to start an RFC for a more generic Hudi snapshotter, which:

- Supports existing SnapshotCopier features
- Adds an option to export a Hudi dataset to plain parquet files