[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435647#comment-16435647
 ] 

ASF GitHub Bot commented on ORC-161:


Github user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181096194
  
--- Diff: site/_docs/file-tail.md ---
@@ -249,12 +249,25 @@ For booleans, the statistics include the count of false and true values.
 }
 ```
 
-For decimals, the minimum, maximum, and sum are stored.
+For decimals, the minimum, maximum, and sum are stored. In ORC 2.0,
+the string representation is deprecated and DecimalStatistics uses
+integers, which give better performance.
 
 ```message DecimalStatistics {
  optional string minimum = 1;
  optional string maximum = 2;
  optional string sum = 3;
+  message Int128 {
+   repeated sint64 highBits = 1;
+   repeated uint64 lowBits = 2;
--- End diff --

shouldn't this be sint64 as well since we are using uint64 for the 
SECONDARY stream?
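
For readers following this thread, here is a minimal C++ sketch (assuming GCC/Clang's `__int128` extension; this is not ORC's actual code) of the split the proposed `Int128` message implies: in two's complement the sign lands in the upper 64 bits, which is why `highBits` is `sint64` while `lowBits` can stay `uint64`.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // A negative 128-bit value; the sign bit lives in the upper word.
  __int128 value = -(((__int128)123456789LL << 64) | 42);

  unsigned __int128 bits = (unsigned __int128)value;
  int64_t  hi = (int64_t)(uint64_t)(bits >> 64);  // signed: carries the sign
  uint64_t lo = (uint64_t)bits;                   // unsigned magnitude bits

  // Round trip: reassembling the two halves recovers the original value.
  __int128 back = (__int128)(((unsigned __int128)(uint64_t)hi << 64)
                             | (unsigned __int128)lo);
  assert(back == value);
  return 0;
}
```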


> Create a new column type that run-length-encodes decimals
> -
>
> Key: ORC-161
> URL: https://issues.apache.org/jira/browse/ORC-161
> Project: ORC
>  Issue Type: Wish
>  Components: encoding
>Reporter: Douglas Drinka
>Priority: Major
>
> I'm storing prices in ORC format, and have made the following observations 
> about the current decimal implementation:
> - The encoding is inefficient: my prices are a walking-random set, plus or 
> minus a few pennies per data point. This would encode beautifully with a 
> patched base encoding.  Instead I'm averaging 4 bytes per data point, after 
> Zlib.
> - Everyone acknowledges that it's nice to be able to store huge numbers in 
> decimal columns, but that you probably won't.  Presto, for instance, has a 
> fast-path which engages for precision of 18 or less, and decodes to 64-bit 
> longs, and then a slow path which uses BigInt.  I anticipate the majority of 
> implementations fit the decimal(18,6) use case.
> - The whole concept of precision/scale, along with a dedicated scale per data 
> point is messy.  Sometimes it's checked on data ingest, other times it's an 
> error on reading, or else it's cast (and rounded?)
> I don't propose eliminating the current column type.  It's nice to know 
> there's a way to store really big numbers (or really accurate numbers) if I 
> need that in the future.
> But I'd like to see a new column that uses the existing Run Length Encoding 
> functionality, and is limited to 63+1 bit numbers, with a fixed precision and 
> scale for ingest and query.
> I think one could call this FixedPoint.  Every number is stored as a long, 
> and scaled by a column constant.  Ingest from decimal would scale and throw 
> or round, configurably.  Precision would be fixed at 18, or made configurable 
> and verified at ingest.  Stats would use longs (scaled with the column) 
> rather than strings.
> Anyone can opt in to faster, smaller data sets, if they're ok with 63+1 bits 
> of precision.  Or they can keep using decimal if they need 128 bits.  Win/win?
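
A hedged sketch of the FixedPoint idea described above (the helper names and the overflow policy are illustrative, not an ORC API, and a real ingest path would start from a decimal rather than a double): every value is a 64-bit long scaled by a per-column constant, here decimal(18,6).

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical per-column constant for a decimal(18,6) column: 10^6.
constexpr int64_t kFactor = 1000000;

int64_t toFixedPoint(double value) {
  // Ingest: scale and round; throwing instead of rounding would be the
  // other configurable policy the proposal mentions.
  double scaled = std::round(value * (double)kFactor);
  if (scaled > 999999999999999999.0 || scaled < -999999999999999999.0)
    throw std::out_of_range("value exceeds decimal(18,6)");
  return (int64_t)scaled;
}

double fromFixedPoint(int64_t stored) {
  return (double)stored / (double)kFactor;
}
```

Because every stored value is just a long, the column can reuse the existing integer run-length encoders unchanged, which is the core of the proposal.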



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ORC-338) Workaround C++ compiler bug in newest clang including xcode 9.3

2018-04-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned ORC-338:
-


> Workaround C++ compiler bug in newest clang including xcode 9.3
> ---
>
> Key: ORC-338
> URL: https://issues.apache.org/jira/browse/ORC-338
> Project: ORC
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> The ColumnStatistics.intColumnStatistics test fails with xcode 9.3 if you 
> use the release build, but passes in the debug build.





[jira] [Commented] (ORC-338) Workaround C++ compiler bug in newest clang including xcode 9.3

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435704#comment-16435704
 ] 

ASF GitHub Bot commented on ORC-338:


GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/246

ORC-338. Workaround C++ compiler bug in xcode 9.3 by removing an inline function.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-338

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #246








[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435904#comment-16435904
 ] 

ASF GitHub Bot commented on ORC-161:


Github user xndai commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181149073
  
--- Diff: site/_docs/encodings.md ---
@@ -123,6 +127,41 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
           | DATA        | No       | Unbounded base 128 varints
           | SECONDARY   | No       | Unsigned Integer RLE v2
 
+In ORC 2.0, the DECIMAL and DECIMAL_V2 encodings are introduced and the
+scale stream is removed entirely, as all decimal values use the same
+scale. There are two different cases: precision <= 18 and precision > 18.
+
+### Decimal Encoding for precision <= 18
+
+When precision is no greater than 18, decimal values can be fully
+represented by 64-bit signed integers, which are stored in the DATA
+stream using signed integer RLE.
+
+Encoding   | Stream Kind | Optional | Contents
+:--------- | :---------- | :------- | :---------------------
+DECIMAL    | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v1
+DECIMAL_V2 | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v2
--- End diff --

I think we should keep RLE v1 as an option. The C++ writer currently does 
not support RLE v2 (we are working on it). We don't want the new decimal 
writer to have a dependency on that.
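
For context on what keeping RLE v1 means, here is a sketch of its "run" form as described in the ORC spec: a control byte of runLength − 3 (runs of 3 to 130 values), a signed per-step delta byte, then the base value as a zigzag base-128 varint. `writeRun` and `writeZigzagVarint` are illustrative helper names, not the C++ writer's API.

```cpp
#include <cstdint>
#include <vector>

// Zigzag maps signed values to unsigned so small magnitudes stay small,
// then base-128 emits seven payload bits per byte, high bit = "more".
static void writeZigzagVarint(std::vector<uint8_t>& out, int64_t value) {
  uint64_t zz = ((uint64_t)value << 1) ^ (uint64_t)(value >> 63);
  while (zz >= 0x80) { out.push_back((uint8_t)(zz | 0x80)); zz >>= 7; }
  out.push_back((uint8_t)zz);
}

void writeRun(std::vector<uint8_t>& out, uint32_t runLength,
              int8_t delta, int64_t base) {
  out.push_back((uint8_t)(runLength - 3));  // control byte: 0..127
  out.push_back((uint8_t)delta);            // step between consecutive values
  writeZigzagVarint(out, base);             // first value of the run
}
```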




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435961#comment-16435961
 ] 

ASF GitHub Bot commented on ORC-161:


Github user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181155751
  
--- Diff: site/_docs/encodings.md ---
@@ -123,6 +127,41 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
           | DATA        | No       | Unbounded base 128 varints
           | SECONDARY   | No       | Unsigned Integer RLE v2
 
+In ORC 2.0, the DECIMAL and DECIMAL_V2 encodings are introduced and the
+scale stream is removed entirely, as all decimal values use the same
+scale. There are two different cases: precision <= 18 and precision > 18.
+
+### Decimal Encoding for precision <= 18
+
+When precision is no greater than 18, decimal values can be fully
+represented by 64-bit signed integers, which are stored in the DATA
+stream using signed integer RLE.
+
+Encoding   | Stream Kind | Optional | Contents
+:--------- | :---------- | :------- | :---------------------
+DECIMAL    | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v1
+DECIMAL_V2 | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v2
--- End diff --

@xndai Vertica is interested in getting RLE v2 for C++ as well. Do you 
think we can collaborate on getting this in quickly?




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435974#comment-16435974
 ] 

ASF GitHub Bot commented on ORC-161:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181157700
  
--- Diff: site/_docs/encodings.md ---
@@ -123,6 +127,41 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
           | DATA        | No       | Unbounded base 128 varints
           | SECONDARY   | No       | Unsigned Integer RLE v2
 
+In ORC 2.0, the DECIMAL and DECIMAL_V2 encodings are introduced and the
+scale stream is removed entirely, as all decimal values use the same
+scale. There are two different cases: precision <= 18 and precision > 18.
+
+### Decimal Encoding for precision <= 18
+
+When precision is no greater than 18, decimal values can be fully
+represented by 64-bit signed integers, which are stored in the DATA
+stream using signed integer RLE.
+
+Encoding   | Stream Kind | Optional | Contents
+:--------- | :---------- | :------- | :---------------------
+DECIMAL    | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v1
+DECIMAL_V2 | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v2
--- End diff --

@majetideepak We are already working on it and doing tests & benchmarks. We 
will contribute it back, but maybe not that soon.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435979#comment-16435979
 ] 

ASF GitHub Bot commented on ORC-161:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181158484
  
--- Diff: site/_docs/file-tail.md ---
@@ -249,12 +249,25 @@ For booleans, the statistics include the count of false and true values.
 }
 ```
 
-For decimals, the minimum, maximum, and sum are stored.
+For decimals, the minimum, maximum, and sum are stored. In ORC 2.0,
+the string representation is deprecated and DecimalStatistics uses
+integers, which give better performance.
 
 ```message DecimalStatistics {
  optional string minimum = 1;
  optional string maximum = 2;
  optional string sum = 3;
+  message Int128 {
+   repeated sint64 highBits = 1;
+   repeated uint64 lowBits = 2;
--- End diff --

Here I was aligning with the C++ orc::Int128 implementation to avoid many 
casts.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436017#comment-16436017
 ] 

ASF GitHub Bot commented on ORC-161:


Github user dain commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181164570
  
--- Diff: site/_docs/encodings.md ---
@@ -109,10 +109,20 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 Decimal was introduced in Hive 0.11 with infinite precision (the total
 number of digits). In Hive 0.13, the definition was changed to limit
 the precision to a maximum of 38 digits, which conveniently uses 127
-bits plus a sign bit. The current encoding of decimal columns stores
-the integer representation of the value as an unbounded length zigzag
-encoded base 128 varint. The scale is stored in the SECONDARY stream
-as a signed integer.
+bits plus a sign bit.
+
+The DIRECT and DIRECT_V2 encodings of decimal columns store the integer
+representation of the value as an unbounded length zigzag encoded base
+128 varint. The scale is stored in the SECONDARY stream as a signed
+integer.
+
+In ORC 2.0, the DECIMAL encoding is introduced and the scale stream is
+removed entirely, as all decimal values use the same scale. When the
+precision is no greater than 18, decimal values can be fully represented
+by the DATA stream, which stores 64-bit signed integers. When the
+precision is greater than 18, a 128-bit signed integer stores the
+decimal value: the DATA stream stores the upper 64 bits and the
+SECONDARY stream holds the lower 64 bits. Both streams use signed
+integer RLE v2.
--- End diff --

Why split the data across two streams?  This means 2 IOs (or one large 
coalesced IO) to read the values (assuming no nulls).  Instead, can't we put 
all 128 bits in one stream?
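
To make the layout under discussion concrete, here is a hedged writer-side sketch of the split the draft spec describes for precision > 18 (assuming GCC/Clang's `__int128`; `splitDecimals` is an illustrative name, not the ORC writer's code): the upper halves go to DATA and the lower halves to SECONDARY.

```cpp
#include <cstdint>
#include <vector>

void splitDecimals(const std::vector<__int128>& values,
                   std::vector<int64_t>& dataStream,        // upper 64 bits
                   std::vector<int64_t>& secondaryStream) { // lower 64 bits
  for (__int128 v : values) {
    unsigned __int128 bits = (unsigned __int128)v;
    dataStream.push_back((int64_t)(uint64_t)(bits >> 64));
    // The low word has no sign of its own; it is reinterpreted as signed
    // only because both streams use signed integer RLE v2.
    secondaryStream.push_back((int64_t)(uint64_t)bits);
  }
}
```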




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436038#comment-16436038
 ] 

ASF GitHub Bot commented on ORC-161:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181168352
  
--- Diff: site/_docs/encodings.md ---
@@ -109,10 +109,20 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 Decimal was introduced in Hive 0.11 with infinite precision (the total
 number of digits). In Hive 0.13, the definition was changed to limit
 the precision to a maximum of 38 digits, which conveniently uses 127
-bits plus a sign bit. The current encoding of decimal columns stores
-the integer representation of the value as an unbounded length zigzag
-encoded base 128 varint. The scale is stored in the SECONDARY stream
-as a signed integer.
+bits plus a sign bit.
+
+The DIRECT and DIRECT_V2 encodings of decimal columns store the integer
+representation of the value as an unbounded length zigzag encoded base
+128 varint. The scale is stored in the SECONDARY stream as a signed
+integer.
+
+In ORC 2.0, the DECIMAL encoding is introduced and the scale stream is
+removed entirely, as all decimal values use the same scale. When the
+precision is no greater than 18, decimal values can be fully represented
+by the DATA stream, which stores 64-bit signed integers. When the
+precision is greater than 18, a 128-bit signed integer stores the
+decimal value: the DATA stream stores the upper 64 bits and the
+SECONDARY stream holds the lower 64 bits. Both streams use signed
+integer RLE v2.
--- End diff --

The main problem is that we don't have 128-bit integer RLE on hand.
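
As a reference point for this thread, a sketch of the zigzag + base-128 varint scheme that the existing DIRECT decimal encoding is described as using (shown here for 64-bit values for brevity; the on-disk varint is unbounded in length, and `writeVarint` is an illustrative name).

```cpp
#include <cstdint>
#include <vector>

void writeVarint(std::vector<uint8_t>& out, int64_t value) {
  // Zigzag: map signed to unsigned so small magnitudes stay small.
  uint64_t zz = ((uint64_t)value << 1) ^ (uint64_t)(value >> 63);
  // Base-128: seven payload bits per byte; high bit = "more bytes follow".
  while (zz >= 0x80) {
    out.push_back((uint8_t)(zz | 0x80));
    zz >>= 7;
  }
  out.push_back((uint8_t)zz);
}
```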




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436270#comment-16436270
 ] 

ASF GitHub Bot commented on ORC-161:


Github user wgtmac commented on the issue:

https://github.com/apache/orc/pull/245
  
Will provide them after a comprehensive benchmark.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436216#comment-16436216
 ] 

ASF GitHub Bot commented on ORC-161:


Github user t3rmin4t0r commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181203617
  
--- Diff: site/_docs/encodings.md ---
@@ -123,6 +127,41 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
           | DATA        | No       | Unbounded base 128 varints
           | SECONDARY   | No       | Unsigned Integer RLE v2
 
+In ORC 2.0, the DECIMAL and DECIMAL_V2 encodings are introduced and the
+scale stream is removed entirely, as all decimal values use the same
+scale. There are two different cases: precision <= 18 and precision > 18.
+
+### Decimal Encoding for precision <= 18
+
+When precision is no greater than 18, decimal values can be fully
+represented by 64-bit signed integers, which are stored in the DATA
+stream using signed integer RLE.
+
+Encoding   | Stream Kind | Optional | Contents
+:--------- | :---------- | :------- | :---------------------
+DECIMAL    | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v1
+DECIMAL_V2 | PRESENT     | Yes      | Boolean RLE
+           | DATA        | No       | Signed Integer RLE v2
--- End diff --

Some part of this discussion is about the new ORC format, where existing 
reader compatibility is not a requirement until we switch to the new format 
as the default.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436214#comment-16436214
 ] 

ASF GitHub Bot commented on ORC-161:


Github user t3rmin4t0r commented on a diff in the pull request:

https://github.com/apache/orc/pull/245#discussion_r181202668
  
--- Diff: site/_docs/encodings.md ---
@@ -109,10 +109,20 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 Decimal was introduced in Hive 0.11 with infinite precision (the total
 number of digits). In Hive 0.13, the definition was changed to limit
 the precision to a maximum of 38 digits, which conveniently uses 127
-bits plus a sign bit. The current encoding of decimal columns stores
-the integer representation of the value as an unbounded length zigzag
-encoded base 128 varint. The scale is stored in the SECONDARY stream
-as a signed integer.
+bits plus a sign bit.
+
+The DIRECT and DIRECT_V2 encodings of decimal columns store the integer
+representation of the value as an unbounded length zigzag encoded base
+128 varint. The scale is stored in the SECONDARY stream as a signed
+integer.
+
+In ORC 2.0, the DECIMAL encoding is introduced and the scale stream is
+removed entirely, as all decimal values use the same scale. When the
+precision is no greater than 18, decimal values can be fully represented
+by the DATA stream, which stores 64-bit signed integers. When the
+precision is greater than 18, a 128-bit signed integer stores the
+decimal value: the DATA stream stores the upper 64 bits and the
+SECONDARY stream holds the lower 64 bits. Both streams use signed
+integer RLE v2.
--- End diff --

The multiple-stream + row-group stride problems for IO were discussed by 
Owen.

The disk layout is what matters for IO, not the logical stream separation.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436183#comment-16436183
 ] 

ASF GitHub Bot commented on ORC-161:


Github user prasanthj commented on the issue:

https://github.com/apache/orc/pull/245
  
"we found RLEv1 + zstd may be the best combination than others in terms of 
both compression ration and encoding/decoding speed."

do you have experimental numbers for this?




[jira] [Resolved] (ORC-318) Change HadoopShims.KeyProvider to separate createLocalKey and decryptLocalKey

2018-04-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved ORC-318.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

> Change HadoopShims.KeyProvider to separate createLocalKey and decryptLocalKey
> -
>
> Key: ORC-318
> URL: https://issues.apache.org/jira/browse/ORC-318
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
> Fix For: 1.5.0
>
>
> Looking through the [AWS 
> KMS|https://docs.aws.amazon.com/kms/latest/APIReference/Welcome.html] docs, 
> to be compatible we should probably separate creating a local key from 
> decrypting it.





[jira] [Commented] (ORC-339) Reorganize ORC specification

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436363#comment-16436363
 ] 

ASF GitHub Bot commented on ORC-339:


GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/247

ORC-339. Reorganize the ORC file format specification.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-339

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #247


commit 5c56d74d948a73f5c456e0e80ff0622505d6c1cf
Author: Owen O'Malley 
Date:   2018-04-12T22:03:00Z

ORC-339. Reorganize the ORC file format specification.




> Reorganize ORC specification
> 
>
> Key: ORC-339
> URL: https://issues.apache.org/jira/browse/ORC-339
> Project: ORC
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>
> Currently we've put the ORC format specification in the documentation. Now 
> that we are starting the work to design ORCv2, it will be more convenient to 
> have each file format version as a separate page. 





[jira] [Assigned] (ORC-339) Reorganize ORC specification

2018-04-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned ORC-339:
-




[jira] [Commented] (ORC-339) Reorganize ORC specification

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436395#comment-16436395
 ] 

ASF GitHub Bot commented on ORC-339:


Github user wgtmac commented on a diff in the pull request:

https://github.com/apache/orc/pull/247#discussion_r181239251
  
--- Diff: site/specification/ORCv2.md ---
@@ -0,0 +1,1032 @@
+---
+layout: page
+title: Evolving Draft for ORC Specification v2
+---
+
+This specification is rapidly evolving and should only be used by
+developers on the project.
+
+# TO DO items
+
+The list of things that we plan to change:
+
+* Create a decimal representation with fixed scale using rle.
+* Create a better float/double encoding that splits mantissa and
+  exponent.
+* Create a dictionary encoding for float, double, and decimal.
+* Create RLEv3:
+   * 64 and 128 bit variants
+   * Zero suppression
+   * Evaluate the rle subformats
+* Group stripe data into stripelets to enable Async IO for reads.
+* Reorder stripe data into (stripe metadata, index, dictionary, data)
+* Stop sorting dictionaries and record the sort order separately in the index.
+* Remove use of RLEv1 and RLEv2.
+* Remove non-utf8 bloom filter.
+* Use numeric value for decimal bloom filter.
--- End diff --

We may also use a numeric value for decimal column statistics.




[jira] [Commented] (ORC-161) Create a new column type that run-length-encodes decimals

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436156#comment-16436156
 ] 

ASF GitHub Bot commented on ORC-161:


Github user wgtmac commented on the issue:

https://github.com/apache/orc/pull/245
  
On second thought, I added back DECIMAL_V1 to support RLE v1 in the decimal 
encoding. The reason is that in our testing, we found RLEv1 + zstd may be 
the best combination in terms of both compression ratio and 
encoding/decoding speed.




[jira] [Commented] (ORC-318) Change HadoopShims.KeyProvider to separate createLocalKey and decryptLocalKey

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435737#comment-16435737
 ] 

ASF GitHub Bot commented on ORC-318:


Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/227




[jira] [Commented] (ORC-338) Workaround C++ compiler bug in newest clang including xcode 9.3

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435740#comment-16435740
 ] 

ASF GitHub Bot commented on ORC-338:


Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/246




[jira] [Resolved] (ORC-338) Workaround C++ compiler bug in newest clang including xcode 9.3

2018-04-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ORC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved ORC-338.
---
   Resolution: Fixed
Fix Version/s: 1.5.0
