Re: In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

2019-02-26 Thread kaka chen
Thanks. nishith agarwal wrote on Wed, Feb 27, 2019 at 3:44 PM: > Thanks for pointing that out Kaka, I think HoodieAvroPayload is set as > the default class, hence the confusion. > > You could implement your own payload class to achieve this or take a look > at > > https://github.com/uber/hudi/blob/maste

Re: In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

2019-02-26 Thread nishith agarwal
Thanks for pointing that out Kaka, I think HoodieAvroPayload is set as the default class, hence the confusion. You could implement your own payload class to achieve this, or take a look at https://github.com/uber/hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/OverwriteWithLatest

In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

2019-02-26 Thread kaka chen
Hi All, in MergeOnRead mode, when a record is updated more than once in a partition, it does not work. I found it uses HoodieAvroPayload, the default class for "hoodie.compaction.payload.class", whose preCombine method only returns this. @Override public HoodieAvroPayload preCombine(HoodieAvro
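The precombine behavior discussed in this thread can be sketched as follows. This is a simplified, self-contained Java sketch, not the actual Hudi API: the class and field names (LatestWinsPayload, orderingVal, data) are invented for illustration. It shows the "latest ordering value wins" pattern that an OverwriteWithLatestAvroPayload-style payload implements, in contrast to HoodieAvroPayload's preCombine, which always returns this regardless of which update arrived later.

```java
// Hypothetical, simplified payload class illustrating the preCombine pattern.
// In Hudi, preCombine is called when two updates for the same record key land
// in the same batch; returning `this` unconditionally (as HoodieAvroPayload
// does) means the later update can be silently dropped.
public class LatestWinsPayload {
    final String recordKey;
    final long orderingVal;   // e.g. an event-time or version column
    final String data;        // stands in for the Avro record bytes

    LatestWinsPayload(String recordKey, long orderingVal, String data) {
        this.recordKey = recordKey;
        this.orderingVal = orderingVal;
        this.data = data;
    }

    // Keep whichever payload carries the larger ordering value,
    // so the most recent update for the key survives deduplication.
    LatestWinsPayload preCombine(LatestWinsPayload other) {
        return this.orderingVal >= other.orderingVal ? this : other;
    }

    public static void main(String[] args) {
        LatestWinsPayload first  = new LatestWinsPayload("id1", 100L, "v1");
        LatestWinsPayload second = new LatestWinsPayload("id1", 200L, "v2");
        // Regardless of argument order, the update with the higher
        // ordering value wins.
        System.out.println(first.preCombine(second).data);
        System.out.println(second.preCombine(first).data);
    }
}
```

A real implementation would implement Hudi's HoodieRecordPayload interface and compare on the configured precombine field of the Avro record; the comparison logic is the part that matters.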

Re: Will insert generate at least one file for each Spark or Spark Streaming batch?

2019-02-26 Thread kaka chen
Thanks! nishith agarwal wrote on Wed, Feb 27, 2019 at 2:56 PM: > Hi Kaka, > > Hudi automatically does file sizing for you. As you ingest more inserts, the > existing files will be automatically sized. You can play with a few configs: > > https://hudi.apache.org/configurations.html#withStorageConfig -> This >

Re: Will insert generate at least one file for each Spark or Spark Streaming batch?

2019-02-26 Thread nishith agarwal
Hi Kaka, Hudi automatically does file sizing for you. As you ingest more inserts, the existing files will be automatically sized. You can play with a few configs: https://hudi.apache.org/configurations.html#withStorageConfig -> This config allows you to set a max size for your output file. https:/
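The file-sizing knobs mentioned above can be sketched as a properties fragment. The key names below come from the Hudi configurations page linked in the reply; the byte values (120 MB and 100 MB) are illustrative examples, not recommendations:

```properties
# Target max size for a base (parquet) file written by Hudi.
hoodie.parquet.max.file.size=125829120
# Files smaller than this are treated as "small files": subsequent
# inserts are routed into them until they grow toward the max size.
hoodie.parquet.small.file.limit=104857600
```

Raising the small-file limit makes Hudi pack more new inserts into existing under-sized files on each commit, which is how the small-file problem from the question is controlled without a separate compaction tool.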

Re: Migrating issues to jira

2019-02-26 Thread Thomas Weise
Transfer is a bit more complicated compared to just importing the master branch, but here is an example where it was done: https://issues.apache.org/jira/browse/INFRA-17168 If you go that route, you would then push back the asf-site branch after the external repo was moved (and replaced the curren

Will insert generate at least one file for each Spark or Spark Streaming batch?

2019-02-26 Thread kaka chen
Hi All, I found that insert will generate at least one file for each Spark or Spark Streaming batch. Is this the expected result? If it is, how do we control these small files? Does Hudi provide some tools to compact them? Thanks, Frank

Re: AbstractRealtimeRecordReader cannot get the partition field from the hive partition table

2019-02-26 Thread kaka chen
BTW, because it cannot get the partition field; after I merged https://github.com/uber/hudi/pull/569/files, the job can run successfully. Thanks, Frank kaka chen wrote on Wed, Feb 27, 2019 at 10:34 AM: > > I have tried two environments (Hive 2.1.1 and Hive 1.1.0-cdh5.15.1); neither > can get the partition fiel

Re: AbstractRealtimeRecordReader cannot get the partition field from the hive partition table

2019-02-26 Thread kaka chen
I have tried two environments (Hive 2.1.1 and Hive 1.1.0-cdh5.15.1); neither can get the partition field. I added simple logging to show the result: LOG.info("schema: " + schema + " partitioningFields: " + partitioningFields); 2019-02-26 19:53:47,855 INFO [main] com.uber.hoodie.hadoop.realtime.Ab

Re: AbstractRealtimeRecordReader cannot get the partition field from the hive partition table

2019-02-26 Thread vbal...@apache.org
Hi Frank, As Vinoth mentioned, can you share your environment (especially the Hive/Spark versions)? Also, can you paste the table definition as seen in the Hive metastore (desc formatted)? Balaji.V On Tuesday, February 26, 2019, 11:10:16 AM PST, Vinoth Chandar wrote: Hi, Can you share more

Re: AbstractRealtimeRecordReader cannot get the partition field from the hive partition table

2019-02-26 Thread Vinoth Chandar
Hi, Can you share more details about your environment and the full stack trace? Thanks Vinoth On Mon, Feb 25, 2019 at 11:10 PM kaka chen wrote: > Hi All, > > AbstractRealtimeRecordReader cannot get the partition field from the > Hive partition table by > String partitionFields = jobConf.get("

Re: Site documentation

2019-02-26 Thread Prasanna
Site looks good folks. Great job. Nice logo :) On Mon, Feb 25, 2019 at 1:01 PM Vinoth Chandar wrote: > Refreshed site https://github.com/apache/incubator-hudi/pull/8 > > @dev review please. > > On Mon, Feb 25, 2019 at 7:11 AM Vinoth Chandar wrote: > > > https://github.com/apache/incubator-hudi/