[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Description:

Decimal Vectorization Requirements
• Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and TimestampColumnVector classes store their data as primitive Java data types (long, double, or byte arrays) for efficiency.
• DecimalColumnVector is different: it has an array of Object references to HiveDecimal objects.
• The HiveDecimal object uses an internal BigDecimal object for its implementation. Further, BigDecimal itself uses an internal BigInteger object for its implementation, and BigInteger uses an int array. That is 4 objects in total.
• HiveDecimal is also an immutable object, which means arithmetic and other operations produce a new HiveDecimal object with 3 new objects underneath.
• A major reason vectorization is fast is that the ColumnVector classes, except DecimalColumnVector, do not have to allocate additional memory per row. This avoids the memory fragmentation and pressure on the Java garbage collector that DecimalColumnVector can generate. The difference is very significant.
• What can be done with DecimalColumnVector to make it much more efficient?
  o Design several new decimal classes that allow the caller to manage the decimal storage.
  o If it takes 2 long values to store a decimal, then a new DecimalColumnVector would have a long[] of length 2*1024 (where 1024 is the default column vector size).
  o Why store a decimal in separate long values?
    • Java does not support 128 bit integers.
    • Java does not support unsigned integers.
    • An int array representation uses less memory, but a long array representation covers a wider value range for fast primitive operations.
    • But since there are no unsigned types, you can really only do multiplications on N-1 bits, or 63 bits.
    • So, 2 longs are needed for decimal storage of 38 digits.

Future work
  o It makes sense to have just one algorithm for decimals rather than one for HiveDecimal and another for DecimalColumnVector. So, make HiveDecimal store 2 long values, too.
  o A lower level primitive decimal class would accept decimals stored as long arrays and produce results into long arrays. It would be used by HiveDecimal and DecimalColumnVector.

was:

Decimal Vectorization Requirements
• Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and TimestampColumnVector classes store their data as primitive Java data types (long, double, or byte arrays) for efficiency.
• DecimalColumnVector is different: it has an array of Object references to HiveDecimal objects.
• The HiveDecimal object uses an internal BigDecimal object for its implementation. Further, BigDecimal itself uses an internal BigInteger object for its implementation, and BigInteger uses an int array. That is 4 objects in total.
• HiveDecimal is also an immutable object, which means arithmetic and other operations produce a new HiveDecimal object with 3 new objects underneath.
• A major reason vectorization is fast is that the ColumnVector classes, except DecimalColumnVector, do not have to allocate additional memory per row. This avoids the memory fragmentation and pressure on the Java garbage collector that DecimalColumnVector can generate. The difference is very significant.
• What can be done with DecimalColumnVector to make it much more efficient?
  o Design several new decimal classes that allow the caller to manage the decimal storage.
  o If it takes N int values to store a decimal (e.g. N=1..5), then a new DecimalColumnVector would have an int[] of length N*1024 (where 1024 is the default column vector size).
  o Why store a decimal in separate int values?
    • Java does not support 128 bit integers.
    • Java does not support unsigned integers.
    • In order to do multiplication of a decimal represented in a long you need twice the storage (i.e. 128 bits). So you need to represent parts in 32 bit integers.
    • But since there are no unsigned types, you can really only do multiplications on N-1 bits, or 31 bits.
    • So, 5 ints are needed for decimal storage of 38 digits.
  o It makes sense to have just one algorithm for decimals rather than one for HiveDecimal and another for DecimalColumnVector. So, make HiveDecimal store N int values, too.
  o A lower level primitive decimal class would accept decimals stored as int arrays and produce results into int arrays. It would be used by HiveDecimal and DecimalColumnVector.

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
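To make the proposed storage layout concrete, here is a minimal sketch of a column that keeps every decimal in one flat long[] of length 2*1024, as the updated description suggests. The class name, the method names, and the choice of putting the high word first are assumptions made for illustration; they are not taken from Hive or from the attached patches.

```java
// Hypothetical sketch: names and word order are illustrative, not Hive's API.
final class FlatDecimalColumn {
    static final int DEFAULT_SIZE = 1024;  // default column vector size, per the description
    static final int LONGS_PER_VALUE = 2;  // two longs per decimal, per the description

    // One primitive array for the whole batch, so no object is allocated per row.
    final long[] words;

    FlatDecimalColumn(int size) {
        words = new long[LONGS_PER_VALUE * size];
    }

    // Stores the two words of one decimal at the slots belonging to the given row.
    void set(int row, long high, long low) {
        int base = row * LONGS_PER_VALUE;
        words[base] = high;
        words[base + 1] = low;
    }

    long high(int row) {
        return words[row * LONGS_PER_VALUE];
    }

    long low(int row) {
        return words[row * LONGS_PER_VALUE + 1];
    }

    public static void main(String[] args) {
        FlatDecimalColumn col = new FlatDecimalColumn(DEFAULT_SIZE);
        col.set(3, 0L, 12345L);
        System.out.println(col.words.length + " longs allocated once; row 3 low word = " + col.low(3));
    }
}
```

Writing a row only overwrites two slots of the existing array, which is the property the description is after: the caller owns the storage and arithmetic does not have to create new objects.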
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Attachment: HIVE-13306.5.patch

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, HIVE-13306.3.patch, HIVE-13306.4.patch, HIVE-13306.5.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
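The original issue description argues that, without unsigned types, a decimal should be split into pieces of 31 bits so that intermediate products fit in a signed long. The following is a rough sketch of that idea using schoolbook multiplication. The class name, the little-endian limb order, and the method signature are assumptions for illustration only, and this is not the algorithm used in the attached patches.

```java
// Hypothetical sketch of multiplication on 31-bit limbs; not taken from the Hive patches.
final class Limb31Multiply {
    static final int BITS = 31;
    static final long MASK = (1L << BITS) - 1;

    // a and b are little-endian arrays of non-negative limbs, each below 2^31.
    // out must have length a.length + b.length and is overwritten.
    // Each step computes limb * limb + limb + carry, which stays below 2^62,
    // so it never overflows a signed long.
    static void multiply(int[] a, int[] b, int[] out) {
        java.util.Arrays.fill(out, 0);
        for (int i = 0; i < a.length; i++) {
            long carry = 0;
            for (int j = 0; j < b.length; j++) {
                long t = (long) a[i] * b[j] + out[i + j] + carry;
                out[i + j] = (int) (t & MASK);   // keep the low 31 bits
                carry = t >>> BITS;              // pass the rest to the next limb
            }
            out[i + b.length] = (int) carry;
        }
    }

    public static void main(String[] args) {
        int[] out = new int[2];
        multiply(new int[] {123456789}, new int[] {1000}, out);
        long recombined = ((long) out[1] << BITS) + out[0];
        System.out.println("123456789 * 1000 = " + recombined);
    }
}
```

The result array is supplied by the caller, in line with the description's goal of letting the caller manage decimal storage.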
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Attachment: HIVE-13306.4.patch

{noformat}
Benchmark                                                               Mode  Samples        Score        Error  Units
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd128ColNewBench.bench      avgt       10   125432.861 ± 103309.156  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd128ColOldBench.bench      avgt       10  2232555.450 ± 762572.051  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd64ColNewBench.bench       avgt       10     4357.643 ±    556.718  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColAdd64ColOldBench.bench       avgt       10   489554.055 ± 149226.021  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128By16ColNewBench.bench  avgt       10   181819.546 ±  21990.896  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128By16ColOldBench.bench  avgt       10  1526826.250 ±  83937.964  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128ColNewBench.bench      avgt       10   368991.791 ±  29543.595  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv128ColOldBench.bench      avgt       10  1559152.400 ± 102530.203  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv64ColNewBench.bench       avgt       10    36004.327 ±   1297.898  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColDiv64ColOldBench.bench       avgt       10  1342905.950 ± 258527.407  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul128ColNewBench.bench      avgt       10   150020.394 ±  14490.045  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul128ColOldBench.bench      avgt       10   948766.333 ±  49017.424  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul64ColNewBench.bench       avgt       10     4190.397 ±    305.294  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColMul64ColOldBench.bench       avgt       10  1065696.767 ±  67010.116  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub128ColNewBench.bench      avgt       10   113723.319 ± 112854.654  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub128ColOldBench.bench      avgt       10  1384364.200 ± 103055.925  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub64ColNewBench.bench       avgt       10     4212.439 ±    165.751  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalColSub64ColOldBench.bench       avgt       10   863108.092 ±  59991.382  ns/op
o.a.h.b.v.VectorizedDecimalBench.DecimalToString128ColBench.bench       avgt       10   883048.582 ± 650952.092  ns/op
{noformat}

This patch passed all unit tests and integration tests on my laptop. By these numbers, 64-bit arithmetic operations are roughly 37-254 times faster and 128-bit ones are roughly 4-18 times faster. I will check the results on the integration test server.

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, HIVE-13306.3.patch, HIVE-13306.4.patch
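The speedups in this message can be checked by dividing each Old score by the matching New score from the benchmark output. Below is a small sketch of that arithmetic; the class and method names are made up for illustration, and the scores are copied from the table above. Several of the reported error margins are large, so the ratios should be read as rough indications only.

```java
// Illustrative helper (hypothetical name): speedup = old ns/op divided by new ns/op.
final class DecimalBenchSpeedups {
    static double speedup(double oldNsPerOp, double newNsPerOp) {
        return oldNsPerOp / newNsPerOp;
    }

    public static void main(String[] args) {
        String[] names = {"Add64", "Sub64", "Mul64", "Div64",
                          "Add128", "Sub128", "Mul128", "Div128", "Div128By16"};
        // Scores in ns/op, copied from the HIVE-13306.4.patch benchmark table.
        double[] oldScores = {489554.055, 863108.092, 1065696.767, 1342905.950,
                              2232555.450, 1384364.200, 948766.333, 1559152.400, 1526826.250};
        double[] newScores = {4357.643, 4212.439, 4190.397, 36004.327,
                              125432.861, 113723.319, 150020.394, 368991.791, 181819.546};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-11s %7.1fx%n", names[i], speedup(oldScores[i], newScores[i]));
        }
    }
}
```

With these inputs the 64-bit ratios fall between about 37x (division) and about 254x (multiplication), and the 128-bit ratios between about 4x and about 18x.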
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Attachment: HIVE-13306.3.patch

Implemented with long arrays

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, HIVE-13306.3.patch
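As an illustration of what arithmetic on long arrays can look like, here is a minimal sketch of adding two non-negative values that are each held in two longs with 63 usable bits per long. It is only a guess at the general technique: the class name, the {high, low} word order, and the carry convention are assumptions and are not taken from HIVE-13306.3.patch.

```java
// Hypothetical sketch of carry handling with two 63-bit words; not the patch's code.
final class TwoLongDecimalAdd {
    static final long MASK63 = Long.MAX_VALUE; // the low 63 bits set

    // a, b and out each hold one value as {high, low}; every word is in [0, 2^63).
    // The result is written into the caller-supplied array, so nothing is allocated.
    // Returns the carry out of the high word (1 means the 126-bit range was exceeded).
    static long add(long[] a, long[] b, long[] out) {
        long lowSum = a[1] + b[1];                    // below 2^64, so bit 63 is the carry
        out[1] = lowSum & MASK63;
        long highSum = a[0] + b[0] + (lowSum >>> 63); // add the carry from the low word
        out[0] = highSum & MASK63;
        return highSum >>> 63;
    }

    public static void main(String[] args) {
        long[] a = {5L, MASK63};
        long[] b = {7L, 1L};
        long[] out = new long[2];
        long carry = add(a, b, out);
        System.out.println("high=" + out[0] + " low=" + out[1] + " carry=" + carry);
    }
}
```

Keeping the top bit of each long free is what lets the carry be read with an unsigned shift, which mirrors the description's point that only N-1 bits are usable without unsigned types.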
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Status: Patch Available (was: In Progress)

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Attachment: HIVE-13306.2.patch

This patch is an improved implementation of the new decimal vectorization. I wanted to see whether it passes the integration tests. However, it still needs to be integrated with the execution engine. I will keep working on this topic.

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch
[jira] [Updated] (HIVE-13306) Better Decimal vectorization
[ https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-13306:
    Attachment: HIVE-13306.1.patch

It's a working draft. It shows about 70x addition, 3x multiplication, and 2x division performance compared with the existing implementation. I will modify this code further to cover wider use cases and to improve performance and readability. Thanks. :)

{noformat}
# Run complete. Total time: 00:02:30

Benchmark                                                                            Mode  Samples            Score   Error  Units
o.a.h.b.v.VectorizedArithmeticBench.DecimalColAddDecimalColColumnBench.bench         avgt        2   4012665235.500 ± NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalColDivideDecimalColColumnBench.bench      avgt        2  19167315269.000 ± NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalColMultiplyDecimalColColumnBench.bench    avgt        2   3391096996.500 ± NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColAddDecimalColColumnBench.bench       avgt        2     56848247.500 ± NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColDivideDecimalColColumnBench.bench    avgt        2   9162374089.500 ± NaN  ns/op
o.a.h.b.v.VectorizedArithmeticBench.DecimalV2ColMultiplyDecimalColColumnBench.bench  avgt        2   1146261770.500 ± NaN  ns/op
{noformat}

> Better Decimal vectorization
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Teddy Choi
> Priority: Critical
> Attachments: HIVE-13306.1.patch