Re: Hoodie dataset write without partition

2019-06-26 Thread Vinoth Chandar
Thanks for chipping in :) Keep it coming

On Tue, Jun 25, 2019 at 1:23 AM Netsanet Gebretsadkan 
wrote:

> Amarnath,
>
> A few days ago, I had the same problem. The Hoodie-modeled table could be
> created without any partition key, but the Hive sync failed when syncing
> without a partition.
> This happened because the SlashEncodedDayPartitionValueExtractor class
> was hard-coded inside the DataSourceUtils class (
> https://github.com/apache/incubator-hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java#L237
> ),
> specifically in the buildHiveSyncConfig method, which lets us configure
> the settings for Hive sync. Even though you pass the non-partitioned
> extractor class as a config in the properties file, the setting is not
> picked up. So you need to change that code to use the
> NonPartitionedExtractor class and compile the code again. Make sure the
> following config is defined in the properties file used by delta-streamer:
>
> hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
>
> It will definitely work for you.
> If you don't want it to be hard-coded, you can make further changes.
>
> Kind regards,
>
> On Tue, Jun 25, 2019 at 6:54 AM Vinoth Chandar  wrote:
>
> > Amarnath,
> >
> > Mind sending a PR with updated docs once you get it working? :) It might be
> > useful for others too. Non-partitioned tables have come up a few times now.
> >
> >
> >
> > On Mon, Jun 24, 2019 at 2:57 PM [email protected] 
> > wrote:
> >
> > >
> > > Hi Amarnath,
> > > Apart from changing the partition extractor class, you would need to
> > > change the keyGeneratorClass for a non-partitioned table.
> > > Use this param "--key-generator-class
> > > com.uber.hoodie.NonpartitionedKeyGenerator" as part of DeltaStreamer
> > > command-line execution.
> > > Also, ensure the following configs are defined in the properties file
> > > used by delta-streamer:
> > >
> > > hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
> > > hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
> > >
> > > We will eventually remove the DeltaStreamer CLI and rely on the properties
> > > config for uniform handling.
> > >
> > > Thanks,
> > > Balaji.V
> > > On Monday, June 24, 2019, 1:55:51 PM PDT, Balaji Varadarajan
> > >  wrote:
> > >
> > >   Hi Amarnath,
> > > I will look into it and reply back by EOD today.
> > > Balaji.V
> > > On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy <
> > > [email protected]> wrote:
> > >
> > >  Hi
> > >
> > > Is there any option to write the hoodie dataset without any partition?
> > >
> > > I tried but hive sync is failing when you sync up without any
> > > partition.
> > >
> > > Delta streamer creates with default as partition when there is no
> > > partition column.
> > >
> > >
> >
>


Re: Hoodie dataset write without partition

2019-06-25 Thread Netsanet Gebretsadkan
Amarnath,

A few days ago, I had the same problem. The Hoodie-modeled table could be
created without any partition key, but the Hive sync failed when syncing
without a partition.
This happened because the SlashEncodedDayPartitionValueExtractor class
was hard-coded inside the DataSourceUtils class (
https://github.com/apache/incubator-hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/DataSourceUtils.java#L237),
specifically in the buildHiveSyncConfig method, which lets us configure
the settings for Hive sync. Even though you pass the non-partitioned
extractor class as a config in the properties file, the setting is not
picked up. So you need to change that code to use the
NonPartitionedExtractor class and compile the code again. Make sure the
following config is defined in the properties file used by delta-streamer:

hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
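
To make the failure mode concrete, here is a rough Python analogue of the two extractors. This is only a sketch under my assumptions about their behaviour (the real classes are Java, and the function names below are hypothetical): the day-based extractor expects a yyyy/mm/dd partition path, so an empty or single-segment path is rejected, while the non-partitioned extractor reports no partition values at all.

```python
# Illustrative Python analogue only; the real extractors are Java classes
# and may differ in detail. Function names here are hypothetical.

def slash_encoded_day_values(partition_path):
    """Expect a yyyy/mm/dd path and return a single Hive partition value."""
    parts = partition_path.split("/")
    if len(parts) != 3:
        # A non-partitioned table has no yyyy/mm/dd path, so this is
        # where a day-based extractor would reject it.
        raise ValueError(f"expected yyyy/mm/dd, got {partition_path!r}")
    year, month, day = parts
    return [f"{year}-{month}-{day}"]


def non_partitioned_values(partition_path):
    """A non-partitioned table has no partition values to register."""
    return []
```

With the day-based function, a path such as "default" raises an error, which matches the symptom of the sync failing for a table without partitions.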

It will definitely work for you.
If you don't want it to be hard-coded, you can make further changes.
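
One possible shape for such a change, sketched in Python rather than in Hudi's Java: read the extractor class from the properties and fall back to the day-based extractor only when nothing is configured. The function name and the fully qualified default class name are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch in Python; Hudi's real code is Java and may differ.
EXTRACTOR_KEY = "hoodie.datasource.hive_sync.partition_extractor_class"
# Assumed fully qualified name of the previously hard-coded default.
DEFAULT_EXTRACTOR = "com.uber.hoodie.hive.SlashEncodedDayPartitionValueExtractor"


def resolve_extractor_class(props):
    """Return the configured extractor class, or the default if unset or blank."""
    configured = props.get(EXTRACTOR_KEY, "").strip()
    return configured or DEFAULT_EXTRACTOR
```

Passing a dict that contains the NonPartitionedExtractor setting returns that class, and an empty dict returns the default.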

Kind regards,

On Tue, Jun 25, 2019 at 6:54 AM Vinoth Chandar  wrote:

> Amarnath,
>
> Mind sending a PR with updated docs once you get it working? :) It might be
> useful for others too. Non-partitioned tables have come up a few times now.
>
>
>
> On Mon, Jun 24, 2019 at 2:57 PM [email protected] 
> wrote:
>
> >
> > Hi Amarnath,
> > Apart from changing the partition extractor class, you would need to
> > change the keyGeneratorClass for a non-partitioned table.
> > Use this param "--key-generator-class
> > com.uber.hoodie.NonpartitionedKeyGenerator" as part of DeltaStreamer
> > command-line execution.
> > Also, ensure the following configs are defined in the properties file
> > used by delta-streamer:
> >
> > hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
> > hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
> >
> > We will eventually remove the DeltaStreamer CLI and rely on the properties
> > config for uniform handling.
> >
> > Thanks,
> > Balaji.V
> > On Monday, June 24, 2019, 1:55:51 PM PDT, Balaji Varadarajan
> >  wrote:
> >
> >   Hi Amarnath,
> > I will look into it and reply back by EOD today.
> > Balaji.V
> > On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy <
> > [email protected]> wrote:
> >
> >  Hi
> >
> > Is there any option to write the hoodie dataset without any partition?
> >
> > I tried but hive sync is failing when you sync up without any partition.
> >
> > Delta streamer creates with default as partition when there is no
> > partition column.
> >
> >
>


Re: Hoodie dataset write without partition

2019-06-24 Thread Vinoth Chandar
Amarnath,

Mind sending a PR with updated docs once you get it working? :) It might be
useful for others too. Non-partitioned tables have come up a few times now.



On Mon, Jun 24, 2019 at 2:57 PM [email protected] 
wrote:

>
> Hi Amarnath,
> Apart from changing the partition extractor class, you would need to
> change the keyGeneratorClass for a non-partitioned table.
> Use this param "--key-generator-class
> com.uber.hoodie.NonpartitionedKeyGenerator" as part of DeltaStreamer
> command-line execution.
> Also, ensure the following configs are defined in the properties file
> used by delta-streamer:
>
> hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
> hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
>
> We will eventually remove the DeltaStreamer CLI and rely on the properties
> config for uniform handling.
>
> Thanks,
> Balaji.V
> On Monday, June 24, 2019, 1:55:51 PM PDT, Balaji Varadarajan
>  wrote:
>
>   Hi Amarnath,
> I will look into it and reply back by EOD today.
> Balaji.V
> On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy <
> [email protected]> wrote:
>
>  Hi
>
> Is there any option to write the hoodie dataset without any partition?
>
> I tried but hive sync is failing when you sync up without any partition.
>
> Delta streamer creates with default as partition when there is no
> partition column.
>
>


Re: Hoodie dataset write without partition

2019-06-24 Thread [email protected]
 
Hi Amarnath,
Apart from changing the partition extractor class, you would need to change the
keyGeneratorClass for a non-partitioned table.
Use this param "--key-generator-class
com.uber.hoodie.NonpartitionedKeyGenerator" as part of DeltaStreamer
command-line execution.
Also, ensure the following configs are defined in the properties file used
by delta-streamer:

hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor

We will eventually remove the DeltaStreamer CLI and rely on the properties config
for uniform handling.
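
As a quick sanity check before running delta-streamer, something like the following could confirm that both settings are present in the properties file. It is a minimal sketch, not part of Hudi: the helper names are hypothetical, and the parser only handles plain key=value lines.

```python
# Hypothetical helper, not part of Hudi: checks a properties file for the
# two non-partitioned settings named in this message.
REQUIRED_FOR_NON_PARTITIONED = {
    "hoodie.datasource.write.keygenerator.class":
        "com.uber.hoodie.NonpartitionedKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class":
        "com.uber.hoodie.hive.NonPartitionedExtractor",
}


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def non_partitioned_problems(props):
    """Return one message per setting that is missing or has another value."""
    problems = []
    for key, expected in REQUIRED_FOR_NON_PARTITIONED.items():
        actual = props.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, found {actual}")
    return problems
```

An empty list means both settings are in place. The check also flags the case where the two settings end up on one line, since each property must be on its own line.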

Thanks,
Balaji.V
On Monday, June 24, 2019, 1:55:51 PM PDT, Balaji Varadarajan 
 wrote:  
 
  Hi Amarnath,
I will look into it and reply back by EOD today.
Balaji.V
    On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy 
 wrote:  
 
 Hi 

Is there any option to write the hoodie dataset without any partition?

I tried but hive sync is failing when you sync up without any partition.

Delta streamer creates with default as partition when there is no partition 
column.



Re: Hoodie dataset write without partition

2019-06-24 Thread Balaji Varadarajan
 Hi Amarnath,
I will look into it and reply back by EOD today.
Balaji.V
On Sunday, June 23, 2019, 8:21:51 AM PDT, Amarnath Venkataswamy 
 wrote:  
 
 Hi 

Is there any option to write the hoodie dataset without any partition?

I tried but hive sync is failing when you sync up without any partition.

Delta streamer creates with default as partition when there is no partition 
column.

