subject:"\[jira\] \[Commented\] \(DRILL\-4834\) decimal implementation is vulnerable to overflow errors, and extremely complex"

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465537#comment-16465537
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386979265
 
 
   Closing this PR since it was fixed in the scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465539#comment-16465539
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386975060
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465511#comment-16465511
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386975060
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-02-06 Thread Dave Oshinsky (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354367#comment-16354367
 ] 

Dave Oshinsky commented on DRILL-4834:
--

[~priteshm] the issues with this Jira (and DRILL-4184) are not yet resolved, so 
the Jira should remain.  On the other hand, one could view this Jira as now 
superseded by DRILL-6094.  Would resolving this Jira cause any problems with 
using the changes in the corresponding PR 570 in the resolution for DRILL-6094?

> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-02-06 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354280#comment-16354280
 ] 

Pritesh Maker commented on DRILL-4834:
--

[~daveoshinsky] since the work will continue with DRILL-6094, should we resolve 
this Jira?

> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331053#comment-16331053
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on the issue:

https://github.com/apache/drill/pull/570
  
The new VARDECIMAL one-size-fits-all decimal type, which this PR 
implements, will now be incorporated into the following new JIRA with 
additional changes and fixes for Drill 1.13:
https://issues.apache.org/jira/browse/DRILL-6094

So, development on this PR will now cease.  But VARDECIMAL lives on
Dave Oshinsky


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326610#comment-16326610
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161630462
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

When I remove the try/catch here, the build fails as follows.  So, I'm 
leaving it as is for the time being.
[ERROR] 
C:\apache\Drill20180111\rebase6\drill\exec\java-exec\target\generated-sources\org\apache\drill\exec\vector\accessor\VarDecimalAccessor.java:[144,35]
 error: unreported exception InvalidAccessException; must be caught or declared 
to be thrown



> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322909#comment-16322909
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161069079
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/CoreDecimalUtility.java ---
@@ -32,6 +32,14 @@ public static long getDecimal18FromBigDecimal(BigDecimal 
input, int scale, int p
   }
 
   public static int getMaxPrecision(TypeProtos.MinorType decimalType) {
+/*
--- End diff --

The two DAO comments are removed in my local merge environment.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322898#comment-16322898
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161067313
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

I take that back.  Your change ensures it doesn't copy too many bytes.  I'm 
including it in the remerged file.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322897#comment-16322897
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161066659
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

The Math.min use here ensures that we don't overrun the output buffer.  It 
the string doesn't fit, it's truncated to the length available in the output 
buffer.  Your change makes the code harder to understand than what I had, and 
your change is not necessary (the Math.min again).  I am forced to manually 
remerge quite a few of my changes because of the way git rebase didn't merge to 
master the way I had wished, so am now working on this file.  Leaving the line 
as I had it for now...


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322592#comment-16322592
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160958360
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder 
h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || 
minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || 
minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} 
${field.name}<#if field_has_next>, ) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );
+mutator.addSafe(idx(), <#list fields as field><#if field.name == 
"scale"><#break>${field.name}<#if field_has_next && 
fields[field_index+1].name != "scale" >, );
--- End diff --

I meant to replace this line by something like this: 
```
<#if minor.class == "VarDecimal">
mutator.addSafe(idx()<#list fields as field><#if field.include!true>, 
${field.name}); // or something better
<#else>
mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );

```
Since `field.include` is used only for Decimal types, we can replace all 
this code by this line:
```
mutator.addSafe(idx()<#list fields as field><#if field.include!true>, 
${field.name});
```
But do we use this method for VarDecimal somewhere? If not, it would be 
better to add VarDecimal `minor.class` to the if condition with other decimal 
types above.

BTW, please modify similar changes below in this class and in 
`RepeatedValueVectors.java`


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322587#comment-16322587
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160941530
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

I think it would be better just throw the exception. Then it is easier to 
detect a case when data was stored in the vector incorrectly, or if it has come 
from the stored data, inform a user that data was corrupted.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322588#comment-16322588
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160939410
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
@@ -102,7 +111,578 @@
 <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
 <#list comparisonTypesDecimal.decimalTypes as type>
 
-<#if type.name.endsWith("Sparse")>
+<#if type.name.endsWith("VarDecimal")>
+
+<@pp.changeOutputFile 
name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java" />
+
+<#include "/@includes/license.ftl" />
+
+package org.apache.drill.exec.expr.fn.impl;
+
+<#include "/@includes/vv_imports.ftl" />
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import 
org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.expr.holders.*;
+import org.apache.drill.exec.record.RecordBatch;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.DrillBuf;
+
+import java.nio.ByteBuffer;
+
+@SuppressWarnings("unused")
+public class ${type.name}Functions {
+private static void initBuffer(DrillBuf buffer) {
+// for VarDecimal, this method of setting initial size is actually 
only a very rough heuristic.
+int size = (${type.storage} * 
(org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
+buffer = buffer.reallocIfNeeded(size);
+ }
+
+@FunctionTemplate(name = "subtract", scope = 
FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE, nulls = 
NullHandling.NULL_IF_NULL)
--- End diff --

By default `checkPrecision` is false, so it is not required to specify it 
explicitly in this case. 


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322594#comment-16322594
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160963518
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java 
---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
--- End diff --

Since we already have `copyFromSafe()` method in this class, `copyFrom()` 
is supposed to be unsafe.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322591#comment-16322591
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160980556
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

Sorry, perhaps I made mistake, yes, it looks OK. 
But do we need to pass `scale` for `VarDecimal` even if we don't use it in 
this method? 
I suppose should be made a change in `ComplexWriters.java` to avoid the 
usage of this method with the additional argument.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322593#comment-16322593
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161006318
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastVarCharDecimal.java ---
@@ -85,6 +85,30 @@ public void eval() {
 
 // Assign the scale and precision
 out.scale = (int) scale.value;
+
+<#if type.to.endsWith("VarDecimal")>
+
+// VarDecimal gets its own cast logic
+int readIndex = in.start;
+int endIndex  = in.end;
+StringBuffer sb = new StringBuffer();
--- End diff --

I think it would be easier to receive string using this code:
```
byte[] buf = new byte[in.end - in.start];
buffer.getBytes(in.start, buf, 0, in.end - in.start);
String s = new String(buf, Charsets.UTF_8);
```


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322590#comment-16322590
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160987506
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

I guess we should do the same thing as for other decimals:
```
out.buffer.setBytes(0, String.valueOf(str.substring(0, 
out.end)).getBytes());
```


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322586#comment-16322586
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161004122
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled 
integer
+s += "0";
+}
+java.math.BigInteger bi = new java.math.BigInteger(s);
+java.math.BigDecimal bd = new java.math.BigDecimal(bi, out.scale);
+//System.out.println("CastIntDecimal in " + in.value + " scale " + 
out.scale + " string " + s + " bd " + bd);
--- End diff --

Please remove this comment.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322595#comment-16322595
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161002913
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/CoreDecimalUtility.java ---
@@ -32,6 +32,14 @@ public static long getDecimal18FromBigDecimal(BigDecimal 
input, int scale, int p
   }
 
   public static int getMaxPrecision(TypeProtos.MinorType decimalType) {
+/*
--- End diff --

This method is used in `TypeCastRules.getCost()` and in 
`CoreDecimalUtility.getPrecisionRange()`.
Regarding its usage in `TypeCastRules.getCost()`, we may return 
`RelDataTypeSystem.getMaxNumericPrecision()` for VarDecimal. 
But regarding its usage in `CoreDecimalUtility.getPrecisionRange()` method, 
`getPrecisionRange()` is used in DecimalScalePrecisionFunction classes for 
calculating resulting precision. I suppose these methods should be rewritten in 
the same way as it is implemented in Calcite `RelDataTypeFactoryImpl`. But I 
think it would be better to make these changes in separate Jira.

But now please remove these comments. 


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322589#comment-16322589
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161015751
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -418,6 +422,16 @@ public void get(int index, 
Nullable${minor.class}Holder holder){
 
 
 <#switch minor.class>
+<#case "VarDecimal">
+@Override
+public ${friendlyType} getObject(int index) {
+  byte[] b = get(index);
+  BigInteger bi = new BigInteger(b);
+  BigDecimal bd = new BigDecimal(bi, getField().getScale());
+  //System.out.println("VarDecimal getObject " + index + " scale " + 
getField().getScale() + " len " + b.length + " bd " + bd);
--- End diff --

Please remove this comment


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
> Fix For: 1.13.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321054#comment-16321054
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160792114
  
--- Diff: 
exec/vector/src/main/java/org/apache/drill/exec/util/DecimalUtility.java ---
@@ -159,9 +159,20 @@ public static BigDecimal 
getBigDecimalFromSparse(DrillBuf data, int startIndex,
 }
 
 public static BigDecimal getBigDecimalFromDrillBuf(DrillBuf bytebuf, 
int start, int length, int scale) {
+  if (length <= 0) {
+// if the length is somehow non-positive, interpret this as zero
+//System.out.println("getBigDecimal forces 0 with start " + start 
+ " len " + length);
+  try {
+  throw new Exception("hi there");
--- End diff --

Yes, I'll remove the friendly "hi there".  It was for debugging, I guess.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321049#comment-16321049
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160791971
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

Is this what you're talking about?  It looks OK to me.
<#if type.minor == "VarDecimal">
public void setSafe(int index, int start, int end, DrillBuf buffer, int 
scale)
<#else>
public void setSafe(int index, int start, int end, DrillBuf buffer)
 <#-- type.minor -->


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321043#comment-16321043
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160791435
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java 
---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
--- End diff --

I added a copyFromSafe() method elsewhere, which is a "safe copy" that 
should not throw an exception due to overwriting the end of the buffer.  
Because I did that, and realizing that copyFrom() is actually UNSAFE (can throw 
the exception, which new copyFromSafe() is designed to avoid), I decided for 
code clarity to change the name of this function.  It has been over a year 
since I wrote this code, and that is what I recall now.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321036#comment-16321036
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160790684
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder 
h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || 
minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || 
minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} 
${field.name}<#if field_has_next>, ) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );
+mutator.addSafe(idx(), <#list fields as field><#if field.name == 
"scale"><#break>${field.name}<#if field_has_next && 
fields[field_index+1].name != "scale" >, );
--- End diff --

I had a hard time understanding how this codegen stuff works, so some of 
the code I wrote was due to my struggling to  understand how it is SUPPOSED to 
work.  As such, I found that it needed to "break" out of code generation just 
after the "scale" argument, to avoid compilation failure of the generated code 
(IIRC, it was quite a while ago), and that is why I coded it this way.  What 
exact change are you suggesting?  To AND the field.name check=="scale" check 
with a check that minor.class=="VarDecimal"?


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321001#comment-16321001
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160786416
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/StringOutputRecordWriter.java ---
@@ -146,7 +146,7 @@ public void writeField() throws IOException {
 // TODO: error check
 addField(fieldId, reader.readObject().toString());
 
-  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary">
+  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary" || minor.class == "VarDecimal">
--- End diff --

Yes, that works


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320989#comment-16320989
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160785721
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

The getString method can throw NumberFormatException for BigDecimal, 
according to javadoc 
https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html.  Should it 
just throw the exception?  If yes, I';ll remove the try/catch.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320945#comment-16320945
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160779892
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
@@ -102,7 +111,578 @@
 <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
 <#list comparisonTypesDecimal.decimalTypes as type>
 
-<#if type.name.endsWith("Sparse")>
+<#if type.name.endsWith("VarDecimal")>
+
+<@pp.changeOutputFile 
name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java" />
+
+<#include "/@includes/license.ftl" />
+
+package org.apache.drill.exec.expr.fn.impl;
+
+<#include "/@includes/vv_imports.ftl" />
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import 
org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.expr.holders.*;
+import org.apache.drill.exec.record.RecordBatch;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.DrillBuf;
+
+import java.nio.ByteBuffer;
+
+@SuppressWarnings("unused")
+public class ${type.name}Functions {
+private static void initBuffer(DrillBuf buffer) {
+// for VarDecimal, this method of setting initial size is actually 
only a very rough heuristic.
+int size = (${type.storage} * 
(org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
+buffer = buffer.reallocIfNeeded(size);
+ }
+
+@FunctionTemplate(name = "subtract", scope = 
FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE, nulls = 
NullHandling.NULL_IF_NULL)
--- End diff --

I will try adding that with checkPrecision=false (one size fits all 
precisions), e.g.:
@FunctionTemplate(name = "add",
scope = FunctionTemplate.FunctionScope.SIMPLE,
returnType = FunctionTemplate.ReturnType.DECIMAL_ADD_SCALE,
nulls = NullHandling.NULL_IF_NULL,
checkPrecisionRange = false)



> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320932#comment-16320932
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160777281
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled 
integer
+s += "0";
--- End diff --

Agreed


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320930#comment-16320930
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160776544
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions2.java
 ---
@@ -108,9 +108,12 @@ public void output() {
 out.buffer = buffer;
 out.start  = 0;
 out.scale = outputScale.value;
-out.precision = 38;
 java.math.BigDecimal average = 
((java.math.BigDecimal)(value.obj)).divide(java.math.BigDecimal.valueOf(count.value,
 0), out.scale, java.math.BigDecimal.ROUND_HALF_UP);
+<#if type.inputType.contains("VarDecimal")>
--- End diff --

Yes, I will try "<#if !type.inputType.contains("VarDecimal")>"


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318924#comment-16318924
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160458242
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

What about the case when `str` has a length greater than `len`? I think it 
would be better to use `setBytes(int index, byte[] src, int srcIndex, int 
length)` method here.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318928#comment-16318928
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160492087
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

I suppose condition should be negated or if and else should be swapped


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318929#comment-16318929
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160481004
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/StringOutputRecordWriter.java ---
@@ -146,7 +146,7 @@ public void writeField() throws IOException {
 // TODO: error check
 addField(fieldId, reader.readObject().toString());
 
-  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary">
+  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary" || minor.class == "VarDecimal">
--- End diff --

May these types be moved into previous `elseif`?
  


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318930#comment-16318930
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160489866
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder 
h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || 
minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || 
minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} 
${field.name}<#if field_has_next>, ) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );
+mutator.addSafe(idx(), <#list fields as field><#if field.name == 
"scale"><#break>${field.name}<#if field_has_next && 
fields[field_index+1].name != "scale" >, );
--- End diff --

I think it would be better to replace these checks by a check for 
`minor.class`


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318932#comment-16318932
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160490947
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java 
---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
--- End diff --

Are changes in this class necessary?


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318933#comment-16318933
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160473858
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions2.java
 ---
@@ -108,9 +108,12 @@ public void output() {
 out.buffer = buffer;
 out.start  = 0;
 out.scale = outputScale.value;
-out.precision = 38;
 java.math.BigDecimal average = 
((java.math.BigDecimal)(value.obj)).divide(java.math.BigDecimal.valueOf(count.value,
 0), out.scale, java.math.BigDecimal.ROUND_HALF_UP);
+<#if type.inputType.contains("VarDecimal")>
--- End diff --

It would be better to swap if and else and negate if condition for better 
readability.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318926#comment-16318926
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160492409
  
--- Diff: 
exec/vector/src/main/java/org/apache/drill/exec/util/DecimalUtility.java ---
@@ -159,9 +159,20 @@ public static BigDecimal 
getBigDecimalFromSparse(DrillBuf data, int startIndex,
 }
 
 public static BigDecimal getBigDecimalFromDrillBuf(DrillBuf bytebuf, 
int start, int length, int scale) {
+  if (length <= 0) {
+// if the length is somehow non-positive, interpret this as zero
+//System.out.println("getBigDecimal forces 0 with start " + start 
+ " len " + length);
+  try {
+  throw new Exception("hi there");
--- End diff --

Please remove


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318925#comment-16318925
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160465652
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled 
integer
+s += "0";
--- End diff --

I think the usage of `StringBuilder` will be more suitable here.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318927#comment-16318927
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160480509
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

I don't see the necessity of try catch there. Could you please explain?


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2017-02-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888423#comment-15888423
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on the issue:

https://github.com/apache/drill/pull/570
  
Would someone please review this?  Or estimate when this can be started?  
It has now been months, and I just merged some conflicts that were introduced 
because it has been months since I completed the initial set of changes.  The 
decimal implementation definitely could use some TLC


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2016-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467615#comment-15467615
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on the issue:

https://github.com/apache/drill/pull/570
  
Please review "one-size-fits-all" VARDECIMAL implementation in this pull 
request.  Initially, it exists alongside all of the other decimal types, and is 
only used for variable-width BINARY decimal in Parquet files.  If reviews look 
good, we can extend it to eventually replace other decimal types.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2016-08-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422915#comment-15422915
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

GitHub user daveoshinsky opened a pull request:

https://github.com/apache/drill/pull/570

DRILL-4834

decimal implementation is vulnerable to overflow errors, and extremely 
complex

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daveoshinsky/drill DRILL-4834

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #570


commit 9a47ca52125139d88adf39b5d894a02f870f37d9
Author: U-COMMVAULT-NJ\doshinsky 
Date:   2016-02-09T22:37:47Z

DRILL-4184: support variable length decimal fields in parquet

commit dec00a808c99554f008e23fd21b944b858aa9ae0
Author: daveoshinsky 
Date:   2016-02-09T22:56:28Z

DRILL-4184: changes to support variable length decimal fields in parquet

commit 39bedc5f460fa4ca6158038eb568436a1b0fe3fb
Author: daveoshinsky 
Date:   2016-02-11T15:41:45Z

remove trailing whitespace

commit a41f8d72c538776624d7e1ef79d58ab7e5961e94
Author: daveoshinsky 
Date:   2016-02-11T17:53:04Z

try removing trailing whitespace, again

commit 2a4b57c67bc5e9ad9863e5c1f5693fca9f42e706
Author: Dave Oshinsky 
Date:   2016-03-07T20:56:02Z

centralize decimalLengths access

move declaration to top

commit 68c6c6d72a6bb59362baefa38689f70a02bc0178
Author: Dave Oshinsky 
Date:   2016-03-07T21:52:46Z

change comment

commit 04f7879b44e388c0253b2b641d8ffc0194980cf1
Author: Dave Oshinsky 
Date:   2016-03-07T23:01:38Z

the infamous tab character made a reappearance

commit 838a85747f48e1ba587d2da7d3151d0b1ea295b4
Author: Dave Oshinsky 
Date:   2016-03-16T17:27:52Z

Merge remote-tracking branch 'refs/remotes/apache/master'

commit 4dbf150b17c4a3cc3fc622734f786f575ce87841
Author: Dave Oshinsky 
Date:   2016-06-09T20:28:48Z

Merge remote-tracking branch 'refs/remotes/apache/master'

commit d2372a892216269a71425175c2b2ad9b02941261
Author: Dave Oshinsky 
Date:   2016-08-16T15:27:12Z

Merge remote-tracking branch 'refs/remotes/apache/master'




> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This

41 matches

Mail list logo