[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968467#comment-15968467 ] Colin Ma edited comment on HIVE-16311 at 4/14/17 1:14 AM: -- [~mmccline], [~xuefuz], any comments for the latest patch? Can you help to commit the patch to the upstream, thanks for your help. was (Author: colinma): [~mmccline], [~xuefuz], any comments for the latest patch? > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.006.patch, HIVE-16311.007.patch, HIVE-16311.008.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958971#comment-15958971 ] Matt McCline edited comment on HIVE-16311 at 4/6/17 2:03 PM: - I'm beginning to seriously wonder if there should be any change to HiveDecimal division. Rather, the formula max(6, s1 + p2 + 1) perhaps should be in Hive where [~jdere] multiplication scale changes were and enforced in the result decimal type precision/scale (via enforcePrecisionScale). was (Author: mmccline): I'm beginning to seriously wonder if there should be any change to HiveDecimal division. Rather, the formula max(6, s1 + p2 + 1) perhaps should be in Hive where [~jdere] multiplication scale changes were and enforced in the result decimal type precision/scale. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772 ] Matt McCline edited comment on HIVE-16311 at 4/6/17 12:52 PM: -- Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating scale during from various calculations (and without any trailing fractional zeroes) and the use of enforcePrecisionScale afterwards to adjust a decimal to fit a precision/scale and later toFormatString to add trailing fractional zeroes in the display. So, trying to "remember" the original number of trailing zeroes is not going to work and not part of the current design. (See HiveDecimal's header comments). So the new member fastTrailingZero needs to be removed. was (Author: mmccline): Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating scale during from various calculations (and without any trailing fractional zeroes) and the use of enforcePrecisionScale afterwards. So, trying to "remember" the original number of trailing zeroes is not going to work and not part of the current design. (See HiveDecimal's header comments). So the new member fastTrailingZero needs to be removed. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772 ] Matt McCline edited comment on HIVE-16311 at 4/6/17 12:49 PM: -- Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating scale during from various calculations (and without any trailing fractional zeroes) and the use of enforcePrecisionScale afterwards. So, trying to "remember" the original number of trailing zeroes is not going to work and not part of the current design. (See HiveDecimal's header comments). So the new member fastTrailingZero needs to be removed. was (Author: mmccline): Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating scale during from various calculations (and without any trailing fractional zeroes) and the use of enforcePrecisionScale afterwards. So, trying to "remember" the original number of trailing zeroes is not going to work and not part of the current design. (See HiveDecimal's header comments). > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772 ] Matt McCline edited comment on HIVE-16311 at 4/6/17 12:48 PM: -- Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating scale during from various calculations (and without any trailing fractional zeroes) and the use of enforcePrecisionScale afterwards. So, trying to "remember" the original number of trailing zeroes is not going to work and not part of the current design. (See HiveDecimal's header comments). was (Author: mmccline): Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating during from various calculations and use of enforcePrecisionScale afterwards. So, trying to "remember" the original number of trailing zeroes is not going to work. (See HiveDecimal's header comments). > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958408#comment-15958408 ] Matt McCline edited comment on HIVE-16311 at 4/6/17 6:57 AM: - Ok, it looks like you have added a new member (fastTrailingZeroes) to FastDecimal so I do need to step in and figure out what is going on and why this is necessary. If OldHiveDecimal is not a reference for behavior, then it seems like we need to develop a new behavior standard test data we can test against. Perhaps a triple list of random decimal1, random decimal2, and the standard expected result. was (Author: mmccline): Ok, it looks like you have added a new member (fastTrailingZeroes) to FastDecimal so I do need to step in and figure out what is going on and why this is necessary. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, > HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956322#comment-15956322 ] Colin Ma edited comment on HIVE-16311 at 4/5/17 7:38 AM: - [~xuefuz], for the cases like 1234567.8901234560/9, here is the reason for these diffs: when create HiveDecimal by HiveDecimal.create("1234567.8901234560"), the trailing zero will be ignored by the following code: https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java#L482 So 1234567.890123456 (*with scale 9*) is used for division, not 1234567.8901234560(*with scale 10*). Without this patch, the scale of result is always *HiveDecimal.MAX_SCALE*, and the result will be *137174.210013717333*, after reset the scale for output, the result will be *137174.210013717333*. With this patch, the scale of result is calculated as *11 by Max(6, 9+1+1)*, and the result is *137174.21001371733*, after reset the scale for output, the result will be *137174.210013717330*. I update the patch to keep the trailing zero, and check if all tests can pass. was (Author: colinma): [~xuefuz], for the cases like 1234567.8901234560/9, here is the reason for these diffs: when create HiveDecimal by HiveDecimal.create("1234567.8901234560"), the trailing zero will be ignored by the following code: https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java#L482 So 1234567.890123456 (*with scale 9*) is used for division, not 1234567.8901234560(*with scale 10*). Without this patch, the scale of result is always *HiveDecimal.MAX_SCALE*, and the result will be *137174.210013717333*, after reset the scale for output, the result will be *137174.210013717333*. With this patch, the scale of result is calculated as *11 by Max(6, 9+1+1)*, and the result is *137174.21001371733*, after reset the scale for output, the result will be *137174.210013717330*. I think it's ok to keep the trailing zero, and update the patch to check if all tests can pass. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.withTrailingZero.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955628#comment-15955628 ] Xuefu Zhang edited comment on HIVE-16311 at 4/4/17 6:58 PM: [~colin_mjj], The code changes looks good. However, I'm not sure how the test diff (such as below) came: {code} -1234567.8901234560 137174.210013717333 +1234567.8901234560 137174.210013717330 {code} was (Author: xuefuz): [~colin_mjj], The code changes looks good. However, I'm not sure how the test diff (such as below) came: {code} -1234567.8901234560137174.210013717333 +1234567.8901234560 137174.210013717330 {code} > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, > HIVE-16311.003.patch, HIVE-16311.004.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947130#comment-15947130 ] Matt McCline edited comment on HIVE-16311 at 3/29/17 1:33 PM: -- Well, now wait a minute. That document references a very old JIRA. I haven't seen this document before. I'm drawing in [~jdere] because he recently worked with precision / scale issues with multiplication. I'd like to understand why the scale was being limited to 6. I'm not willing to blindly go back without understanding why the change was made when BigDecimal was presumably introduced. The old HiveDecimal introduced with Apr 9, 2013 HIVE-4271 : :Limit precision of decimal type (Gunther Hagleitner via Ashutosh Chauhan)" that began using BigDecimal. I haven't found the older version yet. was (Author: mmccline): Well, now wait a minute. That document references a very old JIRA. I haven't seen this document before. I'm drawing in [~jdere] because he recently worked with precision / scale issues with multiplication. I'd like to understand why the scale was being limited to 6. I'm not willing to blindly go back without understanding why the change was made when BigDecimal was presumably introduced. The old HiveDecimal introduced with Apr 9, 2013 HIVE-4271 : :Limit precision of decimal type (Gunther Hagleitner via A… …shutosh Chauhan)" that began using BigDecimal. I haven't found the older version yet. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945102#comment-15945102 ] Xuefu Zhang edited comment on HIVE-16311 at 3/28/17 1:01 PM: - [~colin_mjj], for decimal devision, I don't think your change is equivalent to the original. FYI, the precision/scale rule for decimal division in Hive is shown at https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf. The other change seems good to me though. was (Author: xuefuz): [~colin_mjj], for decimal devision, I don't think your change is equivalent to the original. FYI, the precision/scale rule for decimal division in Hive is shown at https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf. > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 2.2.0 > > Attachments: HIVE-16311.001.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide
[ https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944685#comment-15944685 ] Colin Ma edited comment on HIVE-16311 at 3/28/17 7:33 AM: -- The initial patch is uploaded. Do the simple test, 12345.67/123.45 with FastHiveDecimalImpl.fastDivide 50 times, the result shows 1s(without patch) vs 0.1s(with patch). Also test the patch with q06 of TPCx-BB which has the following divide expression: {code} sum( case when (d_year = 2001) THEN (((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ELSE 0 END) first_year_total {code} The cluster includes 6 nodes, 128G memory/per node, CPU is Intel(R) Xeon(R) E5-2680, 1G network. With the 1T data scale and spark as executor engine, the following is the result: || ||without patch||with patch||improvement|| |disable vectorization|214s|164s|23.36%| |enable vectorization(Parquet file format)|252s|125s|50.4%| was (Author: colinma): The initial patch is uploaded. Do the simple test, 12345.67/123.45 with FastHiveDecimalImpl.fastDivide 50 times, the result shows 1s(without patch) vs 0.1s(with patch). Also test the patch with q06 of TPCx-BB which has the following divide expression: {code} sum( case when (d_year = 2001) THEN (((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ELSE 0 END) first_year_total {code} The cluster includes 6 nodes, 128G memory/per node, CPU is Intel(R) Xeon(R) E5-2680, 1G network. With the 1T data scale and spark as executor engine, the following is the result: || ||without patch||with patch||improvement|| |disable vectorization|214s|164s|23.36%| |enable vectorization|252s|125s|50.4%| > Improve the performance for FastHiveDecimalImpl.fastDivide > -- > > Key: HIVE-16311 > URL: https://issues.apache.org/jira/browse/HIVE-16311 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Colin Ma >Assignee: Colin Ma > Fix For: 2.2.0 > > Attachments: HIVE-16311.001.patch > > > FastHiveDecimalImpl.fastDivide is poor performance when evaluate the > expression as 12345.67/123.45 > There are 2 points can be improved: > 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the > BigDecimal.divide. > 2. Get the precision for BigInteger in a fast way if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)