[ 
https://issues.apache.org/jira/browse/HUDI-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-327.
----------------------
    Fix Version/s: 0.5.1
       Resolution: Fixed

Fixed via master: 60fed21dc7e4cb66b154ae9be77dfada0f3071a5

> Introduce "null" supporting KeyGenerator
> ----------------------------------------
>
>                 Key: HUDI-327
>                 URL: https://issues.apache.org/jira/browse/HUDI-327
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: Brandon Scheller
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Customers have been running into issues where they would like to use a 
> record_key from columns that can contain null values. Currently, this 
> causes Hudi to crash and throw a cryptic exception. (Improving error 
> messaging is a separate but related issue.)
> We would like to propose a new KeyGenerator based on ComplexKeyGenerator that 
> allows for null record_keys.
> At a basic level, using the key generator without any options would 
> essentially allow a null record_key to be accepted. (The null value could be 
> kept as null or replaced with an empty string or some predefined "null" 
> string representation.)
> This comes with the negative side effect that all records with a null 
> record_key would then be associated together. To work around this, you would 
> be able to specify a secondary record_key to be used in the case that the 
> first one is null. You would specify this in the same way that you do for the 
> ComplexKeyGenerator, as a comma-separated list of record_keys. In this case, 
> when the first key is null, the second key is used instead. 
> We could support an arbitrary number of record_keys here.
> While we are aware there are many alternatives to avoid using a null 
> record_key, we believe this will act as a usability improvement so that new 
> users are not forced to clean/update their data in order to use Hudi.
> We are hoping to get some feedback on the idea.
>  
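
The fallback behaviour proposed in the description above can be sketched roughly as follows. This is only an illustration of the idea and not the code merged for HUDI-327: the class name NullTolerantKeySketch, the methods parseKeyFields and recordKey, the NULL_PLACEHOLDER value, and the use of a plain Map as the record are all assumptions made for the example and are not part of Hudi's KeyGenerator API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch, not Hudi's KeyGenerator API: picks the first non-null
 * value from an ordered list of candidate record key fields.
 */
final class NullTolerantKeySketch {

    // Assumed placeholder used when every candidate field is null.
    static final String NULL_PLACEHOLDER = "__null__";

    /** Parses a comma-separated field list such as "user_id,email". */
    static List<String> parseKeyFields(String commaSeparated) {
        return Stream.of(commaSeparated.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns the first non-null field value as a string, else the placeholder. */
    static String recordKey(Map<String, Object> record, List<String> keyFields) {
        for (String field : keyFields) {
            Object value = record.get(field);
            if (value != null) {
                return value.toString();
            }
        }
        // All candidates were null, so these records end up sharing one key.
        return NULL_PLACEHOLDER;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("user_id", null);            // primary key column is null
        record.put("email", "a@example.com");   // secondary key column is set
        List<String> fields = parseKeyFields("user_id,email");
        System.out.println(recordKey(record, fields));
    }
}
```

With a single configured field, every null-keyed record in this sketch maps to the same placeholder, which is the side effect the description warns about. Listing a secondary field narrows that to records where all listed fields are null.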



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
