Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-03-11 Thread kaka chen
Nishith, Thanks, will try it. Thanks, Frank On Tue, Mar 12, 2019 at 11:21 AM nishith agarwal wrote: > Frank, > > You can play with a couple of configs to keep X number of older file > versions. Take a look at these configs : > https://hudi.apache.org/configurations.html#withCompactionConfig. > Specifically,

Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-03-11 Thread nishith agarwal
Frank, You can tune a couple of configs to keep X older file versions. Take a look at these configs: https://hudi.apache.org/configurations.html#withCompactionConfig. Specifically, you can choose the number of commits you want to keep; here, commits = versions. Depending on how
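A minimal sketch of the cleaner settings described above, written as a plain Python dict of options. The key names `hoodie.cleaner.policy` and `hoodie.cleaner.commits.retained` are quoted from memory of the Hudi configurations page and should be checked against the release in use. The helper `cleaner_options` and the value 3 are hypothetical and only for illustration.

```python
def cleaner_options(commits_retained):
    """Build cleaner settings that keep file versions from the last N commits.

    Assumption: the key names below match the Hudi release in use.
    """
    if commits_retained < 1:
        raise ValueError("commits_retained must be at least 1")
    return {
        # Retain file versions by commit count (commits = versions, per the reply above).
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": str(commits_retained),
    }


# Illustrative value only: keep the file versions written by the last 3 commits.
opts = cleaner_options(3)
```

These options would be passed along with the other write options on each insert, so that older file versions on a COW table become eligible for cleanup as new commits land.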

Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-03-11 Thread kaka chen
Hi Vinoth, Using this feature, I find that a new file is written containing the previously inserted records. But how do I clean up the old files when using COW tables? Thanks, Frank On Thu, Feb 28, 2019 at 3:24 AM Vinoth Chandar wrote: > Similarly, please try the 0.4.5 release. This has small file handling > turned on

Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-02-27 Thread Vinoth Chandar
Similarly, please try the 0.4.5 release. This has small file handling turned on by default. Also, please use the insert API/operation (not bulk_insert) if you want this behavior. Let us know if you still run into issues. On Tue, Feb 26, 2019 at 11:09 PM kaka chen wrote: > Thanks! > >
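A rough sketch of selecting the insert operation through Spark datasource write options. The keys `hoodie.table.name` and `hoodie.datasource.write.operation` are assumed to match the release in use. The helper `write_options` and the table name `example_table` are hypothetical.

```python
VALID_OPERATIONS = {"insert", "upsert", "bulk_insert"}


def write_options(table_name, operation="insert"):
    """Build datasource write options, defaulting to the insert operation.

    Assumption: the option keys below match the Hudi release in use.
    """
    if operation not in VALID_OPERATIONS:
        raise ValueError("unknown operation: " + operation)
    return {
        "hoodie.table.name": table_name,
        # Use insert rather than bulk_insert to get small file handling, per the reply above.
        "hoodie.datasource.write.operation": operation,
    }


opts = write_options("example_table")

# With a Spark DataFrame `df` (not created here), the options would be passed roughly as:
#   df.write.format(<Hudi datasource name for your release>) \
#       .options(**opts).mode("append").save(base_path)
```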

Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-02-26 Thread kaka chen
Thanks! On Wed, Feb 27, 2019 at 2:56 PM nishith agarwal wrote: > Hi Kaka, > > Hudi automatically does file sizing for you. As you ingest more inserts the > existing file will be automatically sized. You can play with a few configs > : > > https://hudi.apache.org/configurations.html#withStorageConfig -> This >

Re: Insert will generate at least one file each time when each spark or spark streaming batch?

2019-02-26 Thread nishith agarwal
Hi Kaka, Hudi automatically does file sizing for you. As you ingest more inserts, the existing file will be automatically sized. You can play with a few configs: https://hudi.apache.org/configurations.html#withStorageConfig -> This config allows you to set a max size for your output file.
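A small sketch of the file sizing knobs, expressed as a Python dict of options. `hoodie.parquet.max.file.size` is the storage config for the max output file size. `hoodie.parquet.small.file.limit` is added here as an assumption (it is not named in the message above). Both key names should be verified against the release in use. The helper `file_sizing_options` and the sizes are illustrative only.

```python
MB = 1024 * 1024


def file_sizing_options(max_file_mb, small_file_limit_mb):
    """Build file sizing options with sizes converted from MB to bytes.

    Assumption: the option keys below match the Hudi release in use.
    """
    if small_file_limit_mb > max_file_mb:
        raise ValueError("small file limit should not exceed the max file size")
    return {
        # Upper bound on the size of an output data file.
        "hoodie.parquet.max.file.size": str(max_file_mb * MB),
        # Files below this size are treated as small and receive new inserts (assumed key).
        "hoodie.parquet.small.file.limit": str(small_file_limit_mb * MB),
    }


# Illustrative sizes only.
opts = file_sizing_options(120, 100)
```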

Insert will generate at least one file each time when each spark or spark streaming batch?

2019-02-26 Thread kaka chen
Hi All, I found that insert generates at least one file for each Spark or Spark Streaming batch. Is this the expected result? If so, how can I control these small files? Does Hudi provide tools to compact them? Thanks, Frank