Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-22 Thread cenyuhai

I think the default map delimiters should be the same as Hive's.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-New-feature-Support-Complex-Data-Type-Map-in-Carbon-Data-tp1969p2239.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-17 Thread Vimal Das Kammath


Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-16 Thread Ravindra Pesala




-- 
Thanks & Regards,
Ravi

Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-16 Thread Liang Chen






--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-New-feature-Support-Complex-Data-Type-Map-in-Carbon-Data-tp1969p1985.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-15 Thread Vimal Das Kammath

Hi All,

This discussion is regarding support for Map Data type in Carbon Data.

Carbon Data supports complex and nested data types such as Arrays and
Structs. However, Carbon Data does not support other complex data types such
as Maps and Unions, which are generally supported by popular open-source file
formats.

Supporting Map data type will require changes/additions to the DDL, Query
Syntax, Data Loading and Storage.

I have hosted the design on google docs for review and discussion.

https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9q5-oKMqzMMQHYY/edit?usp=sharing

Below is the same inline.

1. DDL Changes

Maps are key->value data types in which a value is fetched by
providing its key. Hence we need to restrict keys to primitive data types,
whereas values can be of any data type supported in Carbon (primitive or
complex).

Map data types can be defined in the create table DDL as:-

“MAP”

For Example:-

create table example_table (id Int, name String, salary Int, salary_breakup
map<String,Int>, city String)

2. Data Loading Changes

Carbon should be able to load data into tables with Map type
columns from csv files. It should be possible to represent a map in a single
row of the csv file. This requires Carbon to support specifying the
delimiters:-

1. Between two Key-Value pairs

2. Between each Key and Value in a pair

As Carbon already supports the Struct and Array complex types, the data loading
process already supports defining delimiters for complex data
types. Carbon provides two optional parameters for data loading:

1. COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two
Key-Value pairs

OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')

2. COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between each
Key and Value in a pair

OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')

With these delimiter options, the map below:-

Fixed->100,000
Bonus->30,000
Stock->40,000

can be represented in the csv file as:-

Fixed:100,000$Bonus:30,000$Stock:40,000
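
To make the two delimiter levels concrete, here is a minimal Python sketch of how one csv field could be split into key-value pairs. The helper name parse_map_field is hypothetical and is not part of Carbon's loader. Values are kept as strings, because the proposal does not say how a value such as 100,000 is converted.

```python
def parse_map_field(field, level_1="$", level_2=":"):
    """Split one csv field into a dict using the two complex delimiters.

    Hypothetical helper for illustration only; not Carbon's data loader.
    level_1 separates key-value pairs, level_2 separates a key from its value.
    """
    result = {}
    if not field:
        return result
    for pair in field.split(level_1):
        # Split on the first level-2 delimiter only, so the value may contain it.
        key, sep, value = pair.partition(level_2)
        if not sep:
            raise ValueError(f"missing key-value delimiter in {pair!r}")
        result[key] = value
    return result


print(parse_map_field("Fixed:100,000$Bonus:30,000$Stock:40,000"))
```

Note that the values in this example contain commas, so the csv field delimiter would have to be something other than a comma, or the field would have to be quoted.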

3. Query Capabilities

A complex datatype like Map will require additional operators to be
supported in the query language to fully utilize the strength of the data
type.

Maps are sequences of key-value pairs, hence they should support looking up the
value for a given key. Users could use the ColumnName[“key”] syntax to look up
values in a map column. For example: salary_breakup[“Fixed”] could be used
to fetch only the Fixed component in the salary breakup.

In addition, we also need to define how maps can be used in existing
constructs such as select, where (filter), group by, etc.

1. Select:- A Map column can be selected directly, or only the value
for a given key can be selected, as required. For example:-
“Select name, salary, salary_breakup” will return the content of the map
along with each row.
“Select name, salary, salary_breakup[“Fixed”]” will return only the
value from the map whose key is “Fixed”.

2. Filter:- The Map data type cannot be used directly in a where clause, as
a where clause can operate only on primitive data types. However, the map
lookup operator can be used in where clauses. For example:-
“Select name, salary where salary_breakup[“Bonus”]>10,000”
*Note: if the value is not of a primitive type, further accessor operators
need to be used, depending on the type of the value, to arrive at a
primitive type for the filter expression to be valid.*

3. Group By:- Just like with filters, maps cannot be used directly in a
group by clause; however, the lookup operator can be used.

4. Functions:- A size() function can be provided for map types to
determine the number of key-value pairs in a map.
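
The intended semantics of lookup, filter, group by and size() can be sketched with plain Python dicts. This only illustrates the behaviour described above and is not Carbon query code. The row names A, B and C and the helper names are made up, and returning None for a missing key is an assumption, since the proposal does not define that case.

```python
from collections import defaultdict

# Sample rows; names are hypothetical, breakups follow the storage example.
rows = [
    {"name": "A", "salary_breakup": {"Fixed": 100000, "Bonus": 30000, "Stock": 40000}},
    {"name": "B", "salary_breakup": {"Fixed": 140000, "Bonus": 30000}},
    {"name": "C", "salary_breakup": {"Fixed": 120000, "Bonus": 20000, "Stock": 30000}},
]


def lookup(map_value, key):
    # Assumption: a missing key yields None (the thread does not specify this).
    return map_value.get(key)


def filter_rows(rows, key, threshold):
    """Filter on a looked-up primitive value, like salary_breakup["Bonus"] > x."""
    kept = []
    for row in rows:
        value = lookup(row["salary_breakup"], key)
        if value is not None and value > threshold:
            kept.append(row)
    return kept


def group_names_by(rows, key):
    """Group by a looked-up value rather than by the map itself."""
    groups = defaultdict(list)
    for row in rows:
        groups[lookup(row["salary_breakup"], key)].append(row["name"])
    return dict(groups)


def size(map_value):
    # Number of key-value pairs in the map.
    return len(map_value)


print([r["name"] for r in filter_rows(rows, "Bonus", 25000)])
print(group_names_by(rows, "Bonus"))
print([size(r["salary_breakup"]) for r in rows])
```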
4. Storage changes

As Carbon is a columnar data store, Map values will be stored using 3
physical columns:

1. One Column for representing the Map Data type. It will store the number
of fields and the start index, in the same way as is done for Structs and
Arrays.

2. One Column for the Key

3. One Column for the value, if the value is of primitive data type,
else the value itself will be multiple physical columns depending on the
data type of the value.

Map<String,Int>

Column_1              Column_2                  Column_3
Map_Salary_Breakup    Map_Salary_Breakup.key    Map_Salary_Breakup.value
3,1                   Fixed                     1,00,000
                      Bonus                     30,000
                      Stock                     40,000
2,4                   Fixed                     1,40,000
                      Bonus                     30,000
3,6                   Fixed                     1,20,000
                      Bonus                     20,000
                      Stock                     30,000
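
As a rough sketch of this layout, the Python below flattens a list of maps into the three columns and reads one map back. It assumes that each entry of the first column is a (number of fields, start index) pair with a 1-based start index, which is how the 3,1 / 2,4 / 3,6 values in the example read. The function names are hypothetical, and this is not Carbon's actual writer or reader.

```python
def flatten_maps(maps):
    """Return (map_column, key_column, value_column) for a list of dicts."""
    map_column, key_column, value_column = [], [], []
    start = 1  # assumed 1-based start index, matching the example table
    for m in maps:
        map_column.append((len(m), start))
        for key, value in m.items():
            key_column.append(key)
            value_column.append(value)
        start += len(m)
    return map_column, key_column, value_column


def read_map(row, map_column, key_column, value_column):
    """Rebuild the map stored for one row from the three columns."""
    count, start = map_column[row]
    begin = start - 1  # convert the 1-based start index to a list offset
    end = begin + count
    return dict(zip(key_column[begin:end], value_column[begin:end]))


maps = [
    {"Fixed": 100000, "Bonus": 30000, "Stock": 40000},
    {"Fixed": 140000, "Bonus": 30000},
    {"Fixed": 120000, "Bonus": 20000, "Stock": 30000},
]
map_col, key_col, value_col = flatten_maps(maps)
print(map_col)
print(read_map(1, map_col, key_col, value_col))
```

With a non-primitive value type, the value column would itself expand into several physical columns, which this sketch does not cover.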

Regards
Vimal
