Re: Hoodie dataset write without partition
Thanks for chipping in :) Keep it coming.

(In reply to Netsanet Gebretsadkan, Tue, Jun 25, 2019 at 1:23 AM.)
Re: Hoodie dataset write without partition
Amarnath,

A few days ago I had the same problem. The Hoodie-modeled table could be created without any partition key, but Hive sync failed when syncing a table without any partition.

This happened because the SlashEncodedDayPartitionValueExtractor class was hard-coded inside the DataSourceUtils class (https://github.com/apache/incubator-hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java#L237), specifically in the buildHiveSyncConfig method, which builds the settings used for Hive sync. Even if you pass the non-partitioned extractor class as a config in the properties file, that setting is never picked up. So you need to change that code to use the NonPartitionedExtractor class and recompile. Make sure the following config is defined in the properties file used by DeltaStreamer:

hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor

It will definitely work for you. If you don't want it to be hard-coded, you can make further changes.

Kind regards,

(In reply to Vinoth Chandar, Tue, Jun 25, 2019 at 6:54 AM.)
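One way to make the "further changes" mentioned above is to read the extractor class from the properties and fall back to the slash-encoded extractor only when the key is absent. The sketch below is a hypothetical, simplified stand-in: HiveSyncConfigSketch and partitionExtractorClass are invented names, not Hudi's actual DataSourceUtils code. Only the property key and the two extractor class names come from this thread.

```java
import java.util.Properties;

// Hypothetical, simplified stand-in for the extractor lookup done in buildHiveSyncConfig.
// NOT Hudi's real code: it only illustrates reading the setting instead of hard-coding it.
final class HiveSyncConfigSketch {
    static final String EXTRACTOR_KEY =
        "hoodie.datasource.hive_sync.partition_extractor_class";

    // Fallback used only when the key is absent (the class that was hard-coded before).
    static final String DEFAULT_EXTRACTOR =
        "com.uber.hoodie.hive.SlashEncodedDayPartitionValueExtractor";

    // Honour the user's setting if present; otherwise keep the old behaviour.
    static String partitionExtractorClass(Properties props) {
        return props.getProperty(EXTRACTOR_KEY, DEFAULT_EXTRACTOR);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(partitionExtractorClass(props)); // nothing set: fallback
        props.setProperty(EXTRACTOR_KEY, "com.uber.hoodie.hive.NonPartitionedExtractor");
        System.out.println(partitionExtractorClass(props)); // user setting wins
    }
}
```

Keeping the old class as the default means partitioned tables that never set the key would behave exactly as before.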
Re: Hoodie dataset write without partition
Amarnath,

Mind sending a PR with updated docs once you get it working? :) It might be useful for others too. Non-partitioned tables have come up a few times now.

(In reply to Balaji.V, Mon, Jun 24, 2019 at 2:57 PM.)
Re: Hoodie dataset write without partition
Hi Amarnath,

Apart from changing the partition extractor class, you also need to change the keyGeneratorClass for a non-partitioned table. Pass the parameter "--key-generator-class com.uber.hoodie.NonpartitionedKeyGenerator" as part of the DeltaStreamer command-line execution.

Also, ensure the following configs are defined in the properties file used by DeltaStreamer:

hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor

We will eventually remove the DeltaStreamer CLI and rely on the properties config for uniform handling.

Thanks,
Balaji.V

(In reply to Balaji Varadarajan, Monday, June 24, 2019, 1:55:51 PM PDT.)
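The two settings above must each sit on their own line of the properties file. The sketch below uses only java.util.Properties to show what happens if they end up on one line, for example when copied out of a mail client that drops line breaks. NonPartitionedPropsSketch and its members are hypothetical names; the keys and class names are the ones from this message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: compares a correct properties layout with a run-together one.
final class NonPartitionedPropsSketch {
    static final String KEYGEN_KEY = "hoodie.datasource.write.keygenerator.class";
    static final String EXTRACTOR_KEY = "hoodie.datasource.hive_sync.partition_extractor_class";

    // Correct layout: one key=value pair per line.
    static final String SEPARATE_LINES =
        KEYGEN_KEY + "=com.uber.hoodie.NonpartitionedKeyGenerator\n"
        + EXTRACTOR_KEY + "=com.uber.hoodie.hive.NonPartitionedExtractor\n";

    // Broken layout: both pairs pasted together on a single line.
    static final String RUN_TOGETHER =
        KEYGEN_KEY + "=com.uber.hoodie.NonpartitionedKeyGenerator"
        + EXTRACTOR_KEY + "=com.uber.hoodie.hive.NonPartitionedExtractor\n";

    // Parse text with the standard .properties rules (key ends at the first unescaped '=').
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties good = parse(SEPARATE_LINES);
        Properties bad = parse(RUN_TOGETHER);
        System.out.println("separate lines: " + good.size() + " keys");
        // The run-together line yields a single key whose value swallows the second pair,
        // so the extractor setting is silently missing.
        System.out.println("run together: " + bad.size() + " key(s), extractor set: "
            + bad.containsKey(EXTRACTOR_KEY));
    }
}
```

No error is raised in the broken case; the key generator value simply becomes a long invalid class name and the extractor key is never defined, which is worth checking if Hive sync still fails after applying the configs.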
Re: Hoodie dataset write without partition
Hi Amarnath,

I will look into it and reply back by EOD today.

Balaji.V

On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy wrote:

> Hi
>
> Is there any option to write the Hoodie dataset without any partition?
>
> I tried, but Hive sync fails when syncing a table without any partition.
>
> DeltaStreamer creates the dataset with "default" as the partition when there is no partition column.
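To illustrate the "default" partition described in the question, and why switching to a non-partitioned key generator changes it, here is a hypothetical sketch of record-key and partition-path derivation. It is NOT Hudi's KeyGenerator API: KeyGenSketch, Key, partitioned and nonPartitioned are invented names, a Map stands in for the Avro record, and it assumes (as observed in the question) that the regular path falls back to "default" while a non-partitioned generator uses an empty partition path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of key derivation; NOT Hudi's KeyGenerator API.
// Assumption: partitioned keys fall back to "default" when the partition field is missing,
// while non-partitioned keys always use an empty partition path.
final class KeyGenSketch {
    static final String DEFAULT_PARTITION = "default";

    // Record key plus the partition path the row would be written under.
    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    // Partitioned behaviour: read the partition field, fall back to "default" if absent or empty.
    static Key partitioned(Map<String, Object> row, String keyField, String partitionField) {
        Object p = row.get(partitionField);
        String path = (p == null || p.toString().isEmpty()) ? DEFAULT_PARTITION : p.toString();
        return new Key(String.valueOf(row.get(keyField)), path);
    }

    // Non-partitioned behaviour: every row goes to the table's base path (empty partition path).
    static Key nonPartitioned(Map<String, Object> row, String keyField) {
        return new Key(String.valueOf(row.get(keyField)), "");
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "42"); // no partition column present
        System.out.println("partitioned:     '" + partitioned(row, "id", "dt").partitionPath + "'");
        System.out.println("non-partitioned: '" + nonPartitioned(row, "id").partitionPath + "'");
    }
}
```

With the partitioned behaviour, a table lacking a partition column still gets a "default" folder that Hive sync then tries to treat as a partition; the non-partitioned behaviour writes straight to the base path, which is what the NonPartitionedExtractor setting expects on the Hive side.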
