[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-12-09 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Description: 
Decimal Vectorization Requirements

•   Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and
TimestampColumnVector classes store their data as primitive Java types (long,
double) or byte arrays for efficiency.
•   DecimalColumnVector is different: it holds an array of Object references
to HiveDecimal objects.
•   Each HiveDecimal object wraps an internal BigDecimal.  BigDecimal in turn
wraps an internal BigInteger, and BigInteger uses an int array.  That is 4
objects in total per value.
•   HiveDecimal is also immutable, so arithmetic and other operations
produce a new HiveDecimal object with 3 new objects underneath.
•   A major reason vectorization is fast is that the ColumnVector classes,
except DecimalColumnVector, do not have to allocate additional memory per row.
This avoids the memory fragmentation and the pressure on the Java garbage
collector that DecimalColumnVector can generate.  The difference is very
significant.
•   What can be done with DecimalColumnVector to make it much more
efficient?
    o   Design several new decimal classes that allow the caller to manage the
    decimal storage.
    o   If it takes 2 long values to store a decimal, then a new
    DecimalColumnVector would have a long[] of length 2*1024 (where 1024 is the
    default column vector size).
    o   Why store a decimal in separate long values?
        •   Java does not support 128-bit integers.
        •   Java does not have unsigned integer types.
        •   An int array representation uses less memory, but a long array
        representation covers a wider value range for fast primitive operations.
        •   Because there is no unsigned type, multiplications can really
        only be done on N-1 bits, that is, 63 bits of each long.
        •   So, 2 longs are needed for decimal storage of 38 digits.
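
The sizing argument above can be checked with a small sketch. It is an illustration only, not code from the patch: the class name DecimalStorageSketch and all of its helpers are hypothetical. It counts the bits needed for 38 digits and lays out one batch as a flat long[] of length 2*1024, so no per-row objects are allocated. The magnitude of a 38-digit value needs 127 bits, so this sketch keeps the sign in the high word and treats the low word as a full unsigned 64-bit value; that layout is an assumption made here, not something the description specifies.

```java
import java.math.BigInteger;

/** Hypothetical sketch: sizing and layout of caller-managed decimal storage. */
final class DecimalStorageSketch {
    static final int VECTOR_SIZE = 1024;   // default column vector size named in the description
    static final int LONGS_PER_VALUE = 2;  // two 64-bit words per decimal

    /** Bits needed for the magnitude of the largest decimal with the given number of digits. */
    static int magnitudeBits(int digits) {
        return BigInteger.TEN.pow(digits).subtract(BigInteger.ONE).bitLength();
    }

    /** One flat array for the whole batch: row i uses slot 2*i (high word) and 2*i+1 (low word). */
    static long[] allocateBatch() {
        return new long[LONGS_PER_VALUE * VECTOR_SIZE];
    }

    /** Stores an unscaled value that fits in 128 bits (two's complement) into the given row. */
    static void set(long[] batch, int row, BigInteger unscaled) {
        batch[2 * row] = unscaled.shiftRight(64).longValue(); // high word, carries the sign
        batch[2 * row + 1] = unscaled.longValue();            // low 64 bits, read back as unsigned
    }

    /** Reads a row back as a BigInteger (used here only for checking, not on the fast path). */
    static BigInteger get(long[] batch, int row) {
        BigInteger high = BigInteger.valueOf(batch[2 * row]).shiftLeft(64);
        BigInteger low = new BigInteger(Long.toUnsignedString(batch[2 * row + 1]));
        return high.add(low);
    }

    public static void main(String[] args) {
        System.out.println("bits for 38 digits: " + magnitudeBits(38));
        long[] batch = allocateBatch();
        BigInteger max = BigInteger.TEN.pow(38).subtract(BigInteger.ONE);
        set(batch, 0, max);
        set(batch, 1, max.negate());
        System.out.println(get(batch, 0));
        System.out.println(get(batch, 1));
    }
}
```

The BigInteger conversions exist only to verify the layout; an implementation that aims to avoid allocation would operate on the two words directly.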

Future work
o   It makes sense to have just one algorithm for decimals rather than one
for HiveDecimal and another for DecimalColumnVector.  So, make HiveDecimal
store 2 long values, too.
o   A lower-level primitive decimal class would accept decimals stored as
long arrays and produce results into long arrays.  It would be used by
HiveDecimal and DecimalColumnVector.
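
A lower-level primitive class of this kind might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in the attached patches: the name Decimal128Primitive, the {high, low} word order, and the two's complement encoding are all hypothetical, and scale handling and overflow checks are left out. It shows one operation (addition) that reads from long arrays and writes into a long array, so both a scalar wrapper and a column vector could share it.

```java
/** Hypothetical sketch of a primitive decimal helper that works on long arrays only. */
final class Decimal128Primitive {
    private Decimal128Primitive() {}

    /**
     * out[outOff..outOff+1] = a[aOff..aOff+1] + b[bOff..bOff+1].
     * Each value is two words, {high, low}, in two's complement.
     */
    static void add(long[] a, int aOff, long[] b, int bOff, long[] out, int outOff) {
        long lowA = a[aOff + 1];
        long low = lowA + b[bOff + 1];
        // The low words are unsigned: if the sum wrapped, it is smaller than an operand.
        long carry = Long.compareUnsigned(low, lowA) < 0 ? 1L : 0L;
        out[outOff] = a[aOff] + b[bOff] + carry;
        out[outOff + 1] = low;
    }

    /** Adds two column batches row by row without allocating anything. */
    static void addColumns(long[] a, long[] b, long[] out, int rows) {
        for (int i = 0; i < rows; i++) {
            add(a, 2 * i, b, 2 * i, out, 2 * i);
        }
    }

    public static void main(String[] args) {
        long[] a = {0L, -1L};  // 2^64 - 1: low word is all ones
        long[] b = {0L, 1L};   // 1
        long[] out = new long[2];
        add(a, 0, b, 0, out, 0);
        System.out.println("high=" + out[0] + " low=" + out[1]);
    }
}
```

A scalar decimal class could call add on a two-element array, and a column vector could call addColumns on its flat storage, which is the single-algorithm idea from the list above.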

  was:
Decimal Vectorization Requirements

•   Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and
TimestampColumnVector classes store their data as primitive Java types (long,
double) or byte arrays for efficiency.
•   DecimalColumnVector is different: it holds an array of Object references
to HiveDecimal objects.
•   Each HiveDecimal object wraps an internal BigDecimal.  BigDecimal in turn
wraps an internal BigInteger, and BigInteger uses an int array.  That is 4
objects in total per value.
•   HiveDecimal is also immutable, so arithmetic and other operations
produce a new HiveDecimal object with 3 new objects underneath.
•   A major reason vectorization is fast is that the ColumnVector classes,
except DecimalColumnVector, do not have to allocate additional memory per row.
This avoids the memory fragmentation and the pressure on the Java garbage
collector that DecimalColumnVector can generate.  The difference is very
significant.
•   What can be done with DecimalColumnVector to make it much more
efficient?
    o   Design several new decimal classes that allow the caller to manage the
    decimal storage.
    o   If it takes N int values to store a decimal (e.g. N=1..5), then a new
    DecimalColumnVector would have an int[] of length N*1024 (where 1024 is the
    default column vector size).
    o   Why store a decimal in separate int values?
        •   Java does not support 128-bit integers.
        •   Java does not have unsigned integer types.
        •   Multiplying decimals that are each represented in a long needs
        twice the storage (i.e. 128 bits) for the result.  So the parts need to
        be represented in 32-bit integers.
        •   Because there is no unsigned type, multiplications can really
        only be done on N-1 bits, that is, 31 bits of each int.
        •   So, 5 ints are needed for decimal storage of 38 digits.
    o   It makes sense to have just one algorithm for decimals rather than one
    for HiveDecimal and another for DecimalColumnVector.  So, make HiveDecimal
    store N int values, too.
    o   A lower-level primitive decimal class would accept decimals stored as
    int arrays and produce results into int arrays.  It would be used by
    HiveDecimal and DecimalColumnVector.
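
The int-based layout in this earlier description can be illustrated with a short sketch. It is not taken from any of the patches: the class name IntLimbSketch, the little-endian limb order, and the helper names are hypothetical. It stores a magnitude in 31-bit limbs so that a limb-by-limb product plus carry always fits in a signed long. Since the largest 38-digit value needs 127 bits, 4 limbs (124 bits) are too few and 5 limbs (155 bits) are enough.

```java
import java.math.BigInteger;

/** Hypothetical sketch: a decimal magnitude stored as 31-bit limbs in an int array. */
final class IntLimbSketch {
    static final int LIMBS = 5;      // 5 * 31 = 155 bits of magnitude
    static final int LIMB_BITS = 31; // one bit per int is left unused because int is signed
    static final long LIMB_MASK = (1L << LIMB_BITS) - 1;

    /**
     * Multiplies a little-endian magnitude in place by a factor in [0, 2^31).
     * Returns the limb that overflowed past the end of the array (0 if none).
     */
    static int multiplyBySmall(int[] limbs, int factor) {
        long carry = 0;
        for (int i = 0; i < limbs.length; i++) {
            // Both operands are below 2^31, so product plus carry stays below 2^63.
            long product = (long) limbs[i] * factor + carry;
            limbs[i] = (int) (product & LIMB_MASK);
            carry = product >>> LIMB_BITS;
        }
        return (int) carry;
    }

    /** Converts the limbs to a BigInteger, for checking only. */
    static BigInteger toBigInteger(int[] limbs) {
        BigInteger value = BigInteger.ZERO;
        for (int i = limbs.length - 1; i >= 0; i--) {
            value = value.shiftLeft(LIMB_BITS).or(BigInteger.valueOf(limbs[i]));
        }
        return value;
    }

    public static void main(String[] args) {
        int[] limbs = new int[LIMBS];
        limbs[0] = 1;
        for (int i = 0; i < 4; i++) {
            multiplyBySmall(limbs, 1_000_000_000); // builds 10^36 in four steps
        }
        System.out.println(toBigInteger(limbs));
    }
}
```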



> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>  

[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-12-07 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Attachment: HIVE-13306.5.patch

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, 
> HIVE-13306.3.patch, HIVE-13306.4.patch, HIVE-13306.5.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-12-06 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Attachment: HIVE-13306.4.patch

{noformat}
Benchmark                                                               Mode  Samples        Score        Error  Units
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd128ColNewBench.bench      avgt       10   125432.861 ± 103309.156  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd128ColOldBench.bench      avgt       10  2232555.450 ± 762572.051  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd64ColNewBench.bench       avgt       10     4357.643 ±    556.718  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd64ColOldBench.bench       avgt       10   489554.055 ± 149226.021  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128By16ColNewBench.bench  avgt       10   181819.546 ±  21990.896  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128By16ColOldBench.bench  avgt       10  1526826.250 ±  83937.964  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128ColNewBench.bench      avgt       10   368991.791 ±  29543.595  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128ColOldBench.bench      avgt       10  1559152.400 ± 102530.203  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv64ColNewBench.bench       avgt       10    36004.327 ±   1297.898  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv64ColOldBench.bench       avgt       10  1342905.950 ± 258527.407  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul128ColNewBench.bench      avgt       10   150020.394 ±  14490.045  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul128ColOldBench.bench      avgt       10   948766.333 ±  49017.424  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul64ColNewBench.bench       avgt       10     4190.397 ±    305.294  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul64ColOldBench.bench       avgt       10  1065696.767 ±  67010.116  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub128ColNewBench.bench      avgt       10   113723.319 ± 112854.654  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub128ColOldBench.bench      avgt       10  1384364.200 ± 103055.925  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub64ColNewBench.bench       avgt       10     4212.439 ±    165.751  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub64ColOldBench.bench       avgt       10   863108.092 ±  59991.382  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalToString128ColBench.bench       avgt       10   883048.582 ± 650952.092  ns/op
{noformat}
This patch passed all unit tests and integration tests on my laptop. In the
benchmark above, 64-bit arithmetic operations are roughly 37-254 times faster
than the old implementation, and 128-bit ones are roughly 4-18 times faster. I
will check the results on the integration test server.
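
As a rough check of these ratios, the speedup for each operation is the old score divided by the new score. The sketch below is not part of the patch: the class name SpeedupSketch is made up, and the numbers are copied from the table above. Some error bars are nearly as large as the score itself (for example the new 128-bit add and subtract), so the ratios are indicative only.

```java
/** Hypothetical helper: speedup ratios computed from the benchmark scores above. */
final class SpeedupSketch {
    /** Old time divided by new time; values above 1 mean the new code is faster. */
    static double speedup(double oldNsPerOp, double newNsPerOp) {
        return oldNsPerOp / newNsPerOp;
    }

    public static void main(String[] args) {
        // Scores in ns/op, taken from the Old and New rows of each benchmark pair.
        System.out.printf("Add64:  %.1fx%n", speedup(489554.055, 4357.643));
        System.out.printf("Sub64:  %.1fx%n", speedup(863108.092, 4212.439));
        System.out.printf("Mul64:  %.1fx%n", speedup(1065696.767, 4190.397));
        System.out.printf("Div64:  %.1fx%n", speedup(1342905.950, 36004.327));
        System.out.printf("Add128: %.1fx%n", speedup(2232555.450, 125432.861));
        System.out.printf("Sub128: %.1fx%n", speedup(1384364.200, 113723.319));
        System.out.printf("Mul128: %.1fx%n", speedup(948766.333, 150020.394));
        System.out.printf("Div128: %.1fx%n", speedup(1559152.400, 368991.791));
    }
}
```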

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, 
> HIVE-13306.3.patch, HIVE-13306.4.patch

[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-10-05 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Attachment: HIVE-13306.3.patch

Implemented with long arrays

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, 
> HIVE-13306.3.patch


[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-05-22 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Status: Patch Available  (was: In Progress)

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch


[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-05-22 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Attachment: HIVE-13306.2.patch

This patch is an improved implementation of the new decimal vectorization. I
wanted to see whether it passes the integration tests.

However, it still needs to be integrated with the execution engine. I will keep
working on this.

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch


[jira] [Updated] (HIVE-13306) Better Decimal vectorization

2016-05-16 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-13306:
--
Attachment: HIVE-13306.1.patch

It's a working draft. Compared to the existing implementation, it is about 70x
faster for addition, 3x for multiplication, and 2x for division. I will modify
this code further to cover more use cases and to improve performance and
readability. Thanks. :)

{noformat}
# Run complete. Total time: 00:02:30

Benchmark                                                                            Mode  Samples            Score  Error  Units
o.a.h.b.v.VectorizedArithmeticBench.DecimalColAddDecimalColColumnBench.bench         avgt        2   4012665235.500 ±  NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalColDivideDecimalColColumnBench.bench      avgt        2  19167315269.000 ±  NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalColMultiplyDecimalColColumnBench.bench    avgt        2   3391096996.500 ±  NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColAddDecimalColColumnBench.bench       avgt        2     56848247.500 ±  NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColDivideDecimalColColumnBench.bench    avgt        2   9162374089.500 ±  NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColMultiplyDecimalColColumnBench.bench  avgt        2   1146261770.500 ±  NaN  ns/op
{noformat}

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch