A few questions:

1) What operators do you envision UUID supporting?  Are there UDFs specific to 
it?  Are there constraints for ensuring its uniqueness?

2) A more general form of question 1: what about UUID is different from a 
string or a pair of bigints (either of which can store a UUID's 128 bits) that 
requires defining a new type?

3) Is this mostly to make clear to users that this is UUID data and not a 
general string, or bigint, or whatever?
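For context on question 2, here is a minimal sketch (an illustration only, not anything from Hive) using java.util.UUID. It shows that a UUID is 128 bits, that it round-trips through its 36-character canonical string and through two 64-bit longs, and that its largest value needs 39 decimal digits, which bounds any numeric encoding. The class name UuidBits and the sample value are made up for the example.

```java
import java.math.BigInteger;
import java.util.UUID;

public class UuidBits {
    // Reassembles a UUID's 128 bits as an unsigned integer (high 64 bits first).
    static BigInteger toUnsigned(UUID u) {
        BigInteger hi = new BigInteger(Long.toUnsignedString(u.getMostSignificantBits()));
        BigInteger lo = new BigInteger(Long.toUnsignedString(u.getLeastSignificantBits()));
        return hi.shiftLeft(64).or(lo);
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("123e4567-e89b-42d3-a456-426614174000");

        // Text form: 32 hex digits plus 4 hyphens.
        System.out.println(u.toString().length()); // 36

        // Binary form: two signed 64-bit longs, i.e. what a pair of BIGINT columns could hold.
        UUID rebuilt = new UUID(u.getMostSignificantBits(), u.getLeastSignificantBits());
        System.out.println(rebuilt.equals(u)); // true

        // Largest possible value (all 128 bits set) written in decimal.
        System.out.println(toUnsigned(new UUID(-1L, -1L)).toString().length()); // 39
    }
}
```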

As Edward correctly points out, adding types has implications for other users 
in the system who read and write your data.  I’m also worried about 
proliferating new types.  I’m wondering if we could approach this by supporting 
user-defined types (UDTs).

Full-on UDTs are complex, but we could start with just the ability to take a 
Hive struct and define it as a UDT in the metadata, with definitions of how to 
convert this value to and from a string.  This would enable storage without 
changing every serde (as we’d store it as a string in the underlying file) and 
allow constant definitions in SQL (since we could convert from a string).  This 
would not enable any constraints or operators for the new type, but those could 
be added later if desired.
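To make the proposal above concrete, here is a rough Java sketch under stated assumptions: StringBackedType, parse, format, UuidStruct and the struct<msb:bigint, lsb:bigint> layout are hypothetical names chosen for illustration and do not exist in Hive. It shows the two conversions the metadata would carry: parsing a string (read from the file, or given as a SQL constant) into the struct, and formatting the struct back so the underlying file still holds a plain string.

```java
import java.util.Objects;

// Hypothetical sketch: none of these names exist in Hive.
public class UdtSketch {

    /** The two conversions a metastore entry for a string-backed UDT might record. */
    interface StringBackedType<T> {
        String typeName();
        T parse(String s);     // file value or SQL constant -> struct
        String format(T value); // struct -> plain string written to the file
    }

    /** Stand-in for a Hive struct<msb:bigint, lsb:bigint> (assumed layout). */
    static final class UuidStruct {
        final long msb;
        final long lsb;

        UuidStruct(long msb, long lsb) { this.msb = msb; this.lsb = lsb; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof UuidStruct)) return false;
            UuidStruct other = (UuidStruct) o;
            return msb == other.msb && lsb == other.lsb;
        }

        @Override public int hashCode() { return Objects.hash(msb, lsb); }
    }

    /** A "uuid" UDT defined only by its string conversions. */
    static final StringBackedType<UuidStruct> UUID_TYPE = new StringBackedType<UuidStruct>() {
        public String typeName() { return "uuid"; }

        public UuidStruct parse(String s) {
            java.util.UUID u = java.util.UUID.fromString(s);
            return new UuidStruct(u.getMostSignificantBits(), u.getLeastSignificantBits());
        }

        public String format(UuidStruct v) {
            return new java.util.UUID(v.msb, v.lsb).toString();
        }
    };

    public static void main(String[] args) {
        // A SQL constant would go through parse(...) ...
        UuidStruct v = UUID_TYPE.parse("123e4567-e89b-42d3-a456-426614174000");
        // ... and the value is stored as an ordinary string, so SerDes are untouched.
        String stored = UUID_TYPE.format(v);
        System.out.println(stored);                          // 123e4567-e89b-42d3-a456-426614174000
        System.out.println(UUID_TYPE.parse(stored).equals(v)); // true
    }
}
```

Because the stored form is just a string, existing SerDes would not need to change; operators and constraints are deliberately absent, matching the limited first step described above.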

Alan.

> On Nov 19, 2016, at 13:11, Juan Delard de Rigoulières <j...@datarepublic.io> wrote:
> 
> Hi,
> We'd like to extend Hive to support a new primitive type. For simplicity's 
> sake, think of UUID. 
> (https://en.wikipedia.org/wiki/Universally_unique_identifier)
> UUIDs are strings with a simple structure that a known regex can match: 
> (/^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i)
> We've looked into SerDes and UDFs, but neither seems elegant enough; we'd 
> like to be able to write DDL like:
> CREATE TABLE `awesome` (
>   users STRING,
>   id UUID
> );
> We are looking for validation of values on ingestion (INSERT), so in the 
> example, values for the second column would be validated as UUIDs.
> Thanks in advance.
> 
> Juan
> 
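For the validation-on-INSERT goal in the quoted message, here is a minimal Java sketch of the check itself, assuming rows arrive as string arrays. UuidIngestCheck, isValidUuid and invalidRows are hypothetical names; how to wire such a check into Hive's write path (UDF, SerDe, or a new type) is exactly what this thread is discussing. Juan's regex is used as given, with the trailing /i mapped to Pattern.CASE_INSENSITIVE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UuidIngestCheck {
    // Juan's regex; "/i" becomes Pattern.CASE_INSENSITIVE.
    private static final Pattern UUID_RE = Pattern.compile(
        "^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
        Pattern.CASE_INSENSITIVE);

    // True only for a non-null string matching the whole pattern.
    static boolean isValidUuid(String s) {
        return s != null && UUID_RE.matcher(s).matches();
    }

    // Hypothetical ingestion hook: indexes of rows whose id column fails validation.
    static List<Integer> invalidRows(List<String[]> rows, int idColumn) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (!isValidUuid(rows.get(i)[idColumn])) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"alice", "123e4567-e89b-42d3-a456-426614174000"});
        rows.add(new String[] {"bob", "not-a-uuid"});
        rows.add(new String[] {"carol", "123E4567-E89B-42D3-A456-426614174000"}); // upper case accepted
        System.out.println(invalidRows(rows, 1)); // [1]
    }
}
```

Note that this pattern rejects the nil UUID (all zeros) and any version digit outside 1-5, which may or may not be what you want.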
