[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-13 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968467#comment-15968467
 ] 

Colin Ma edited comment on HIVE-16311 at 4/14/17 1:14 AM:
--

[~mmccline], [~xuefuz], any comments for the latest patch? Can you help to 
commit the patch to the upstream, thanks for your help.


was (Author: colinma):
[~mmccline], [~xuefuz], any comments for the latest patch?

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.006.patch, HIVE-16311.007.patch, HIVE-16311.008.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958971#comment-15958971
 ] 

Matt McCline edited comment on HIVE-16311 at 4/6/17 2:03 PM:
-

I'm beginning to seriously wonder if there should be any change to HiveDecimal 
division.  Rather, the formula max(6, s1 + p2 + 1) perhaps should be in Hive 
where [~jdere] multiplication scale changes were and enforced in the result 
decimal type precision/scale (via enforcePrecisionScale).


was (Author: mmccline):
I'm beginning to seriously wonder if there should be any change to HiveDecimal 
division.  Rather, the formula max(6, s1 + p2 + 1) perhaps should be in Hive 
where [~jdere] multiplication scale changes were and enforced in the result 
decimal type precision/scale.

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772
 ] 

Matt McCline edited comment on HIVE-16311 at 4/6/17 12:52 PM:
--

Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
scale during from various calculations (and without any trailing fractional 
zeroes) and the use of enforcePrecisionScale afterwards to adjust a decimal to 
fit a precision/scale and later toFormatString to add trailing fractional 
zeroes in the display.  So, trying to "remember" the original number of 
trailing zeroes is not going to work and not part of the current design.  (See 
HiveDecimal's header comments).

So the new member fastTrailingZero needs to be removed.


was (Author: mmccline):
Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
scale during from various calculations (and without any trailing fractional 
zeroes) and the use of enforcePrecisionScale afterwards.  So, trying to 
"remember" the original number of trailing zeroes is not going to work and not 
part of the current design.  (See HiveDecimal's header comments).

So the new member fastTrailingZero needs to be removed.

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772
 ] 

Matt McCline edited comment on HIVE-16311 at 4/6/17 12:49 PM:
--

Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
scale during from various calculations (and without any trailing fractional 
zeroes) and the use of enforcePrecisionScale afterwards.  So, trying to 
"remember" the original number of trailing zeroes is not going to work and not 
part of the current design.  (See HiveDecimal's header comments).

So the new member fastTrailingZero needs to be removed.


was (Author: mmccline):
Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
scale during from various calculations (and without any trailing fractional 
zeroes) and the use of enforcePrecisionScale afterwards.  So, trying to 
"remember" the original number of trailing zeroes is not going to work and not 
part of the current design.  (See HiveDecimal's header comments).

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958772#comment-15958772
 ] 

Matt McCline edited comment on HIVE-16311 at 4/6/17 12:48 PM:
--

Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
scale during from various calculations (and without any trailing fractional 
zeroes) and the use of enforcePrecisionScale afterwards.  So, trying to 
"remember" the original number of trailing zeroes is not going to work and not 
part of the current design.  (See HiveDecimal's header comments).


was (Author: mmccline):
Fundamental to the OldHiveDecimal and new HiveDecimal is the idea of a floating 
during from various calculations and use of enforcePrecisionScale afterwards.  
So, trying to "remember" the original number of trailing zeroes is not going to 
work.  (See HiveDecimal's header comments).

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958408#comment-15958408
 ] 

Matt McCline edited comment on HIVE-16311 at 4/6/17 6:57 AM:
-

Ok, it looks like you have added a new member (fastTrailingZeroes) to 
FastDecimal so I do need to step in and figure out what is going on and why 
this is necessary.

If OldHiveDecimal is not a reference for behavior, then it seems like we need 
to develop a new behavior standard test data we can test against.  Perhaps a 
triple list of random decimal1, random decimal2, and the standard expected 
result.


was (Author: mmccline):
Ok, it looks like you have added a new member (fastTrailingZeroes) to 
FastDecimal so I do need to step in and figure out what is going on and why 
this is necessary.

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch, 
> HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-05 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956322#comment-15956322
 ] 

Colin Ma edited comment on HIVE-16311 at 4/5/17 7:38 AM:
-

[~xuefuz], for the cases like 1234567.8901234560/9, here is the reason for 
these diffs: when create HiveDecimal by 
HiveDecimal.create("1234567.8901234560"), the trailing zero will be ignored by 
the following code:
https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java#L482
So 1234567.890123456 (*with scale 9*) is used for division, not 
1234567.8901234560(*with scale 10*).
Without this patch, the scale of result is always *HiveDecimal.MAX_SCALE*, and 
the result will be *137174.210013717333*, after reset the 
scale for output, the result will be *137174.210013717333*.
With this patch, the scale of result is calculated as *11 by Max(6, 9+1+1)*, 
and the result is *137174.21001371733*, after reset the scale for output, the 
result will be *137174.210013717330*.
I update the patch to keep the trailing zero, and check if all tests can pass.


was (Author: colinma):
[~xuefuz], for the cases like 1234567.8901234560/9, here is the reason for 
these diffs: when create HiveDecimal by 
HiveDecimal.create("1234567.8901234560"), the trailing zero will be ignored by 
the following code:
https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/type/FastHiveDecimalImpl.java#L482
So 1234567.890123456 (*with scale 9*) is used for division, not 
1234567.8901234560(*with scale 10*).
Without this patch, the scale of result is always *HiveDecimal.MAX_SCALE*, and 
the result will be *137174.210013717333*, after reset the 
scale for output, the result will be *137174.210013717333*.
With this patch, the scale of result is calculated as *11 by Max(6, 9+1+1)*, 
and the result is *137174.21001371733*, after reset the scale for output, the 
result will be *137174.210013717330*.
I think it's ok to keep the trailing zero, and update the patch to check if all 
tests can pass.

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-04-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955628#comment-15955628
 ] 

Xuefu Zhang edited comment on HIVE-16311 at 4/4/17 6:58 PM:


[~colin_mjj], The code changes looks good. However, I'm not sure how the test 
diff (such as below) came:
{code}
-1234567.8901234560 137174.210013717333
+1234567.8901234560 137174.210013717330
{code}


was (Author: xuefuz):
[~colin_mjj], The code changes looks good. However, I'm not sure how the test 
diff (such as below) came:
{code}
 -1234567.8901234560137174.210013717333
+1234567.8901234560 137174.210013717330
{code}

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch, 
> HIVE-16311.003.patch, HIVE-16311.004.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-03-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947130#comment-15947130
 ] 

Matt McCline edited comment on HIVE-16311 at 3/29/17 1:33 PM:
--

Well, now wait a minute.  That document references a very old JIRA.  I haven't 
seen this document before.  I'm drawing in [~jdere] because he recently worked 
with precision / scale issues with multiplication.  I'd like to understand why 
the scale was being limited to 6.  I'm not willing to blindly go back without 
understanding why the change was made when BigDecimal was presumably introduced.

The old HiveDecimal introduced with Apr 9, 2013 HIVE-4271 : :Limit precision of 
decimal type (Gunther Hagleitner via Ashutosh Chauhan)" that began using 
BigDecimal.  I haven't found the older version yet.


was (Author: mmccline):
Well, now wait a minute.  That document references a very old JIRA.  I haven't 
seen this document before.  I'm drawing in [~jdere] because he recently worked 
with precision / scale issues with multiplication.  I'd like to understand why 
the scale was being limited to 6.  I'm not willing to blindly go back without 
understanding why the change was made when BigDecimal was presumably introduced.

The old HiveDecimal introduced with Apr 9, 2013 HIVE-4271 : :Limit precision of 
decimal type (Gunther Hagleitner via A…
…shutosh Chauhan)" that began using BigDecimal.  I haven't found the older 
version yet.

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-03-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945102#comment-15945102
 ] 

Xuefu Zhang edited comment on HIVE-16311 at 3/28/17 1:01 PM:
-

[~colin_mjj], for decimal devision, I don't think your change is equivalent to 
the original. FYI, the precision/scale rule  for decimal division in Hive is 
shown at 
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf.
 

The other change seems good to me though.


was (Author: xuefuz):
[~colin_mjj], for decimal devision, I don't think your change is equivalent to 
the original. FYI, the precision/scale rule  for decimal division in Hive is 
shown at 
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf.
 

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 2.2.0
>
> Attachments: HIVE-16311.001.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16311) Improve the performance for FastHiveDecimalImpl.fastDivide

2017-03-28 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944685#comment-15944685
 ] 

Colin Ma edited comment on HIVE-16311 at 3/28/17 7:33 AM:
--

The initial patch is uploaded.
Do the simple test, 12345.67/123.45  with FastHiveDecimalImpl.fastDivide 50 
times, the result shows 1s(without patch) vs  0.1s(with patch).
Also test the patch with q06 of TPCx-BB which has the following divide 
expression:  
{code}
sum( case when (d_year = 2001)   THEN 
(((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
   ELSE 0 END) first_year_total
{code}
The cluster includes 6 nodes, 128G memory/per node, CPU is Intel(R) Xeon(R) 
E5-2680, 1G network.
With the 1T data scale and spark as executor engine, the following is the 
result:
|| ||without patch||with patch||improvement||
|disable vectorization|214s|164s|23.36%|
|enable vectorization(Parquet file format)|252s|125s|50.4%|


was (Author: colinma):
The initial patch is uploaded.
Do the simple test, 12345.67/123.45  with FastHiveDecimalImpl.fastDivide 50 
times, the result shows 1s(without patch) vs  0.1s(with patch).
Also test the patch with q06 of TPCx-BB which has the following divide 
expression:  
{code}
sum( case when (d_year = 2001)   THEN 
(((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
   ELSE 0 END) first_year_total
{code}
The cluster includes 6 nodes, 128G memory/per node, CPU is Intel(R) Xeon(R) 
E5-2680, 1G network.
With the 1T data scale and spark as executor engine, the following is the 
result:
|| ||without patch||with patch||improvement||
|disable vectorization|214s|164s|23.36%|
|enable vectorization|252s|125s|50.4%|

> Improve the performance for FastHiveDecimalImpl.fastDivide
> --
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 2.2.0
>
> Attachments: HIVE-16311.001.patch
>
>
> FastHiveDecimalImpl.fastDivide is poor performance when evaluate the 
> expression as 12345.67/123.45
> There are 2 points can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as scale when do the 
> BigDecimal.divide.
> 2. Get the precision for BigInteger in a fast way if possible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)