[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451381#comment-16451381
 ] 

ASF GitHub Bot commented on PARQUET-968:


chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-384106975
 
 
   When can this be expected to be merged to master and released?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450178#comment-16450178
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183800256
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -77,63 +77,116 @@ static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, Decimal
     }
     switch (originalType) {
       case UTF8:
-        return StringLogicalTypeAnnotation.create();
+        return stringType();
       case MAP:
-        return MapLogicalTypeAnnotation.create();
+        return mapType();
       case DECIMAL:
         int scale = (decimalMetadata == null ? 0 : decimalMetadata.getScale());
         int precision = (decimalMetadata == null ? 0 : decimalMetadata.getPrecision());
-        return DecimalLogicalTypeAnnotation.create(scale, precision);
+        return decimalType(scale, precision);
       case LIST:
-        return ListLogicalTypeAnnotation.create();
+        return listType();
       case DATE:
-        return DateLogicalTypeAnnotation.create();
+        return dateType();
       case INTERVAL:
-        return IntervalLogicalTypeAnnotation.create();
+        return intervalType();
       case TIMESTAMP_MILLIS:
-        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+        return timestampType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIMESTAMP_MICROS:
-        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+        return timestampType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case TIME_MILLIS:
-        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+        return timeType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIME_MICROS:
-        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+        return timeType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case UINT_8:
-        return IntLogicalTypeAnnotation.create(8, false);
+        return intType(8, false);
       case UINT_16:
-        return IntLogicalTypeAnnotation.create(16, false);
+        return intType(16, false);
       case UINT_32:
-        return IntLogicalTypeAnnotation.create(32, false);
+        return intType(32, false);
       case UINT_64:
-        return IntLogicalTypeAnnotation.create(64, false);
+        return intType(64, false);
       case INT_8:
-        return IntLogicalTypeAnnotation.create(8, true);
+        return intType(8, true);
       case INT_16:
-        return IntLogicalTypeAnnotation.create(16, true);
+        return intType(16, true);
       case INT_32:
-        return IntLogicalTypeAnnotation.create(32, true);
+        return intType(32, true);
       case INT_64:
-        return IntLogicalTypeAnnotation.create(64, true);
+        return intType(64, true);
       case ENUM:
-        return EnumLogicalTypeAnnotation.create();
+        return enumType();
       case JSON:
-        return JsonLogicalTypeAnnotation.create();
+        return jsonType();
       case BSON:
-        return BsonLogicalTypeAnnotation.create();
+        return bsonType();
       case MAP_KEY_VALUE:
-        return MapKeyValueTypeAnnotation.create();
+        return mapKeyValueType();
       default:
         throw new RuntimeException("Can't convert original type to logical type, unknown original type " + originalType);
     }
   }
 
+
+  static StringLogicalTypeAnnotation stringType() {
+    return StringLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static MapLogicalTypeAnnotation mapType() {
+    return MapLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static ListLogicalTypeAnnotation listType() {
+    return ListLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static EnumLogicalTypeAnnotation enumType() {
+    return EnumLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static DecimalLogicalTypeAnnotation decimalType(final int scale, final int precision) {
+    return new DecimalLogicalTypeAnnotation(scale, precision);
+  }
+
+  static DateLogicalTypeAnnotation dateType() {
+    return DateLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static TimeLogicalTypeAnnotation timeType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+    return new TimeLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static TimestampLogicalTypeAnnotation timestampType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+    return new TimestampLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static IntLogicalTypeAnnotation intType(final int bitWidth, final boolean isSigned) {
+    Preconditions.checkArgument(
+      bitWidth == 8 || bitWidth == 

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450169#comment-16450169
 ] 

ASF GitHub Bot commented on PARQUET-968:


julienledem commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383997195
 
 
   This looks good.
   Thank you for this collaborative effort!










[jira] [Commented] (PARQUET-1281) Jackson dependency

2018-04-24 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450163#comment-16450163
 ] 

Julien Le Dem commented on PARQUET-1281:


parquet-hadoop should have its build include shading like parquet thrift:

https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L174
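
As an illustrative sketch (not the actual parquet-hadoop or parquet-thrift build configuration), relocating Jackson with the maven-shade-plugin, as suggested above, would look roughly like this in a pom.xml; the relocation pattern mirrors the one described in the issue:

```xml
<!-- Hypothetical sketch: shading Jackson inside a module's pom.xml,
     modeled on the parquet-thrift shading referenced above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite references so the bundled Jackson cannot clash
                 with a different Jackson on the user's classpath. -->
            <pattern>org.codehaus.jackson</pattern>
            <shadedPattern>shaded.parquet.org.codehaus.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```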

> Jackson dependency
> --
>
> Key: PARQUET-1281
> URL: https://issues.apache.org/jira/browse/PARQUET-1281
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Qinghui Xu
>Priority: Major
>
> Currently we shade jackson in the parquet-jackson module (org.codehaus.jackson 
> --> shaded.parquet.org.codehaus.jackson), but in fact we do not use the 
> shaded jackson in parquet-hadoop code. Is that a mistake? (see 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java#L26)





Parquet sync

2018-04-24 Thread Julien Le Dem
Happening now:
https://meet.google.com/esu-yiit-mun


[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450040#comment-16450040
 ] 

ASF GitHub Bot commented on PARQUET-968:


lukasnalezenec commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383969694
 
 
   Hi, I already did. 
   There is one typo in a comment and it is a little bit harder to read; I 
wanted to check the flow once more. I think we can commit it as it is.










[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449841#comment-16449841
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

gszadovszky opened a new pull request #468: PARQUET-1246: Ignore float/double 
statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/468
 
 
   Because of the ambiguous sorting order of float/double, the following changes 
were made on the read path of the related statistics:
   - Statistics are ignored if they contain a NaN value.
   - -0.0 is used as the min value and +0.0 as the max value, regardless of 
which 0.0 value was saved in the statistics.
   
   Author: Gabor Szadovszky 
   
   Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits:
   
   20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
   3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in 
case of NaN
   
   This change is based on 0a86429939075984edce5e3b8195dfb7f9e3ab6b but is not 
a clean cherry-pick.




> Ignore float/double statistics in case of NaN
> -
>
> Key: PARQUET-1246
> URL: https://issues.apache.org/jira/browse/PARQUET-1246
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: 1.10.0
>
>
> The sorting order of floating point values is not properly specified; 
> therefore, NaN values can cause valid values to be skipped when filtering. See 
> PARQUET-1222 for more info.
> This issue is for ignoring statistics for float/double if they contain NaN, to 
> prevent data loss on the read path when filtering.
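
To illustrate the underlying problem with a hypothetical accumulator (this is not the parquet-mr code): Java's relational operators always evaluate to false when one operand is NaN, so a naive min/max computation silently skips NaN and produces bounds that do not cover the whole chunk, which is exactly why a filter can then drop valid data.

```java
public class NaNStatsDemo {
  // Naive accumulator mirroring how min/max column statistics are typically built.
  static double[] minMax(double[] values) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double v : values) {
      if (v < min) min = v; // false for NaN: NaN never updates the bounds
      if (v > max) max = v;
    }
    return new double[] {min, max};
  }

  public static void main(String[] args) {
    double[] bounds = minMax(new double[] {1.0, Double.NaN, 3.0});
    // bounds are [1.0, 3.0] even though the data contains NaN,
    // so a predicate evaluated against these bounds can skip the chunk.
    System.out.println(bounds[0] + " .. " + bounds[1]);
  }
}
```

Ignoring such statistics on the read path, as the change does, is the conservative fix: without trustworthy bounds, no chunk is skipped.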





[jira] [Updated] (PARQUET-1217) Incorrect handling of missing values in Statistics

2018-04-24 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky updated PARQUET-1217:
--
Fix Version/s: 1.8.3

> Incorrect handling of missing values in Statistics
> --
>
> Key: PARQUET-1217
> URL: https://issues.apache.org/jira/browse/PARQUET-1217
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: 1.10.0, 1.8.3
>
>
> As per the parquet-format specs, the min/max values in statistics are 
> optional. Therefore, it is possible to have {{numNulls}} in {{Statistics}} 
> while we don't have min/max values. In {{StatisticsFilter}} we rely on the 
> method 
> [StatisticsFilter.isAllNulls(ColumnChunkMetaData)|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java#L90]
>  to handle the case of {{null}} min/max values, which is not correct in the 
> described scenario. 
>  We should check {{Statistics.hasNonNullValue()}} any time before using the 
> actual min/max values.
> In addition, we don't check whether the {{null_count}} is set when reading 
> from the parquet file; we simply use the value, which is {{0}} when unset. On 
> the parquet-mr side, the {{Statistics}} object uses the value {{0}} to signal 
> that {{num_nulls}} is unset. This is incorrect: when searching for null 
> values, we may falsely drop a column chunk, thinking there are no null 
> values, when the field in the statistics was simply unset.
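
A condensed, hypothetical sketch of the semantics the fix introduces (the names mirror the patch, but this is not the actual parquet-mr class): a negative null count means "unset", and consumers must check `isNumNullsSet()` instead of treating 0 as "no nulls".

```java
public class NullCountStats {
  // -1 acts as the "unset" sentinel, mirroring the builder default in the patch.
  private long numNulls = -1;

  public boolean isNumNullsSet() { return numNulls >= 0; }
  public void setNumNulls(long n) { numNulls = n; }
  public long getNumNulls() { return numNulls; }

  // A filter must not treat an unset count as "zero nulls":
  // if the count is unset, it must assume nulls may be present.
  public boolean mightContainNulls() {
    return !isNumNullsSet() || getNumNulls() > 0;
  }
}
```

With this check in place, a column chunk whose statistics simply omitted the null count is no longer dropped when filtering for nulls.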





[jira] [Commented] (PARQUET-852) Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449832#comment-16449832
 ] 

ASF GitHub Bot commented on PARQUET-852:


zivanfi closed pull request #467: Revert "PARQUET-852: Slowly ramp up sizes of 
byte[] in ByteBasedBitPackingEncoder"
URL: https://github.com/apache/parquet-mr/pull/467
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java b/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
index 0bc8b3023..cc23e8f87 100644
--- a/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
+++ b/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
@@ -1,4 +1,4 @@
-/*
+/* 
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- *
+ * 
  *   http://www.apache.org/licenses/LICENSE-2.0
- *
+ * 
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -39,14 +39,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(ByteBasedBitPackingEncoder.class);
 
   private static final int VALUES_WRITTEN_AT_A_TIME = 8;
-  private static final int MAX_SLAB_SIZE_MULT = 64 * 1024;
-  private static final int INITIAL_SLAB_SIZE_MULT = 1024;
 
   private final int bitWidth;
   private final BytePacker packer;
   private final int[] input = new int[VALUES_WRITTEN_AT_A_TIME];
-  private int slabSize;
-  private long totalFullSlabSize;
+  private final int slabSize;
   private int inputSize;
   private byte[] packed;
   private int packedPosition;
@@ -59,9 +56,8 @@
   public ByteBasedBitPackingEncoder(int bitWidth, Packer packer) {
     this.bitWidth = bitWidth;
     this.inputSize = 0;
-    this.totalFullSlabSize = 0;
     // must be a multiple of bitWidth
-    this.slabSize = (bitWidth == 0) ? 1 : (bitWidth * INITIAL_SLAB_SIZE_MULT);
+    this.slabSize = bitWidth * 64 * 1024;
     initPackedSlab();
     this.packer = packer.newBytePacker(bitWidth);
   }
@@ -79,10 +75,6 @@ public void writeInt(int value) throws IOException {
       pack();
       if (packedPosition == slabSize) {
         slabs.add(BytesInput.from(packed));
-        totalFullSlabSize += slabSize;
-        if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
-          slabSize *= 2;
-        }
         initPackedSlab();
       }
     }
@@ -107,7 +99,7 @@ private void initPackedSlab() {
   public BytesInput toBytes() throws IOException {
     int packedByteLength = packedPosition + BytesUtils.paddedByteCountFromBits(inputSize * bitWidth);
 
-    LOG.debug("writing {} bytes", (totalFullSlabSize + packedByteLength));
+    LOG.debug("writing {} bytes", (slabs.size() * slabSize + packedByteLength));
     if (inputSize > 0) {
       for (int i = inputSize; i < input.length; i++) {
         input[i] = 0;
@@ -121,24 +113,18 @@ public BytesInput toBytes() throws IOException {
    * @return size of the data as it would be written
    */
   public long getBufferSize() {
-    return BytesUtils.paddedByteCountFromBits((totalValues + inputSize) * bitWidth);
+    return BytesUtils.paddedByteCountFromBits(totalValues * bitWidth);
   }
 
   /**
    * @return total memory allocated
    */
   public long getAllocatedSize() {
-    return totalFullSlabSize + packed.length + input.length * 4;
+    return (slabs.size() * slabSize) + packed.length + input.length * 4;
   }
 
   public String memUsageString(String prefix) {
     return String.format("%s ByteBitPacking %d slabs, %d bytes", prefix, slabs.size(), getAllocatedSize());
   }
 
-  /**
-   * @return number of full slabs along with the current slab (debug aid)
-   */
-  int getNumSlabs() {
-    return slabs.size() + 1;
-  }
 }
diff --git a/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java b/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java
index b49595b43..293b961f0 100644
--- a/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java
+++ 

[jira] [Commented] (PARQUET-1217) Incorrect handling of missing values in Statistics

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449830#comment-16449830
 ] 

ASF GitHub Bot commented on PARQUET-1217:
-

zivanfi closed pull request #465: PARQUET-1217: Incorrect handling of missing 
values in Statistics
URL: https://github.com/apache/parquet-mr/pull/465
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
index 30153c074..26c14c135 100644
--- a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
+++ b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
@@ -31,6 +31,44 @@
  */
 public abstract class Statistics<T extends Comparable<T>> {
 
+  /**
+   * Builder class to build Statistics objects. Used to read the statistics from the Parquet file.
+   */
+  public static class Builder {
+    private final PrimitiveTypeName type;
+    private byte[] min;
+    private byte[] max;
+    private long numNulls = -1;
+
+    private Builder(PrimitiveTypeName type) {
+      this.type = type;
+    }
+
+    public Builder withMin(byte[] min) {
+      this.min = min;
+      return this;
+    }
+
+    public Builder withMax(byte[] max) {
+      this.max = max;
+      return this;
+    }
+
+    public Builder withNumNulls(long numNulls) {
+      this.numNulls = numNulls;
+      return this;
+    }
+
+    public Statistics build() {
+      Statistics stats = getStatsBasedOnType(type);
+      if (min != null && max != null) {
+        stats.setMinMaxFromBytes(min, max);
+      }
+      stats.num_nulls = this.numNulls;
+      return stats;
+    }
+  }
+
   private boolean hasNonNullValue;
   private long num_nulls;
 
@@ -67,6 +105,17 @@ public static Statistics getStatsBasedOnType(PrimitiveTypeName type) {
     }
   }
 
+  /**
+   * Returns a builder to create new statistics object. Used to read the statistics from the parquet file.
+   *
+   * @param type
+   *          type of the column
+   * @return builder to create new statistics object
+   */
+  public static Builder getBuilder(PrimitiveTypeName type) {
+    return new Builder(type);
+  }
+
   /**
    * updates statistics min and max using the passed value
    * @param value value to use to update min and max
@@ -172,7 +221,9 @@ public void mergeStatistics(Statistics stats) {
    * Abstract method to set min and max values from byte arrays.
    * @param minBytes byte array to set the min value to
    * @param maxBytes byte array to set the max value to
+   * @deprecated will be removed in 2.0.0. Use {@link #getBuilder(PrimitiveType)} instead.
    */
+  @Deprecated
   abstract public void setMinMaxFromBytes(byte[] minBytes, byte[] maxBytes);
 
   abstract public T genericGetMin();
@@ -221,7 +272,7 @@ public void incrementNumNulls(long increment) {
 
   /**
    * Returns the null count
-   * @return null count
+   * @return null count or {@code -1} if the null count is not set
    */
   public long getNumNulls() {
     return num_nulls;
@@ -229,8 +280,12 @@ public long getNumNulls() {
 
   /**
    * Sets the number of nulls to the parameter value
-   * @param nulls null count to set the count to
+   *
+   * @param nulls
+   *          null count to set the count to
+   * @deprecated will be removed in 2.0.0. Use {@link #getBuilder(PrimitiveType)} instead.
    */
+  @Deprecated
   public void setNumNulls(long nulls) {
     num_nulls = nulls;
   }
@@ -241,7 +296,7 @@ public void setNumNulls(long nulls) {
    * @return true if object is empty, false otherwise
    */
   public boolean isEmpty() {
-    return !hasNonNullValue && num_nulls == 0;
+    return !hasNonNullValue && !isNumNullsSet();
   }
 
   /**
@@ -251,6 +306,13 @@ public boolean hasNonNullValue() {
     return hasNonNullValue;
   }
 
+  /**
+   * @return whether numNulls is set and can be used
+   */
+  public boolean isNumNullsSet() {
+    return num_nulls >= 0;
+  }
+
   /**
    * Sets the page/column as having a valid non-null value
    * kind of misnomer here
diff --git a/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java b/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
index 128acb49f..cf4bf59af 100644
--- a/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
+++ b/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
@@ -37,6 +37,7 @@
   @Test
   public void testNumNulls() {
     IntStatistics stats = new IntStatistics();
+    assertTrue(stats.isNumNullsSet());
 

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449697#comment-16449697
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383903265
 
 
   Hello @lukasnalezenec, have you had time to have a look? Thanks










[jira] [Commented] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin

2018-04-24 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449692#comment-16449692
 ] 

Lukas Nalezenec commented on PARQUET-1280:
--

Good idea; we had planned to use a maven protobuf plugin.

> [parquet-protobuf] Use maven protoc plugin
> --
>
> Key: PARQUET-1280
> URL: https://issues.apache.org/jira/browse/PARQUET-1280
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Qinghui Xu
>Priority: Minor
>
> Currently the build of parquet-protobuf requires protoc to be installed in 
> your environment. By using a maven protoc plugin, we can have a build that is 
> independent of the environment (no need to install protoc) and makes it 
> easier to change the protoc version.
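
As a sketch of what the proposal could look like, using the community protobuf-maven-plugin (the coordinates and versions here are illustrative, not a decided configuration; `${os.detected.classifier}` additionally assumes the os-maven-plugin build extension is registered):

```xml
<!-- Hypothetical sketch: generate protobuf sources with a downloaded,
     version-pinned protoc instead of requiring a local install. -->
<plugin>
  <groupId>org.xolstice.maven.plugins</groupId>
  <artifactId>protobuf-maven-plugin</artifactId>
  <version>0.5.1</version>
  <configuration>
    <!-- Fetch the protoc binary matching the build platform from Maven Central -->
    <protocArtifact>com.google.protobuf:protoc:3.5.1:exe:${os.detected.classifier}</protocArtifact>
  </configuration>
  <executions>
    <execution>
      <goals><goal>compile</goal></goals>
    </execution>
  </executions>
</plugin>
```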





[jira] [Created] (PARQUET-1281) Jackson dependency

2018-04-24 Thread Qinghui Xu (JIRA)
Qinghui Xu created PARQUET-1281:
---

 Summary: Jackson dependency
 Key: PARQUET-1281
 URL: https://issues.apache.org/jira/browse/PARQUET-1281
 Project: Parquet
  Issue Type: Improvement
Reporter: Qinghui Xu


Currently we shade jackson in the parquet-jackson module (org.codehaus.jackson --> 
shaded.parquet.org.codehaus.jackson), but in fact we do not use the shaded 
jackson in parquet-hadoop code. Is that a mistake? (see 
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java#L26)





[jira] [Updated] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin

2018-04-24 Thread Qinghui Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qinghui Xu updated PARQUET-1280:

Issue Type: Improvement  (was: New Feature)






[jira] [Created] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin

2018-04-24 Thread Qinghui Xu (JIRA)
Qinghui Xu created PARQUET-1280:
---

 Summary: [parquet-protobuf] Use maven protoc plugin
 Key: PARQUET-1280
 URL: https://issues.apache.org/jira/browse/PARQUET-1280
 Project: Parquet
  Issue Type: New Feature
Reporter: Qinghui Xu


Currently the build of parquet-protobuf requires protoc to be installed in your 
environment. By using a maven protoc plugin, we can have a build that is 
independent of the environment (no need to install protoc) and makes it easier 
to change the protoc version.





[jira] [Commented] (PARQUET-1231) Not able to load the LocalFileSystem class

2018-04-24 Thread Qinghui Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449660#comment-16449660
 ] 

Qinghui Xu commented on PARQUET-1231:
-

It sounds like a classpath/packaging problem: 
org.apache.hadoop.fs.LocalFileSystem is not on your runtime classpath. Try 
adding hadoop-common to your runtime classpath; that should solve the problem.
This is not a Parquet problem.
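
For a Maven build, the suggested fix would look roughly like the following (the version here is illustrative and should match your Hadoop distribution):

```xml
<!-- Sketch: put hadoop-common (which contains
     org.apache.hadoop.fs.LocalFileSystem) on the runtime classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
  <scope>runtime</scope>
</dependency>
```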

> Not able to load the LocalFileSystem class
> --
>
> Key: PARQUET-1231
> URL: https://issues.apache.org/jira/browse/PARQUET-1231
> Project: Parquet
>  Issue Type: Bug
>Reporter: Persistent NGP
>Priority: Blocker
>
> When we run the code for converting a parquet file to CSV locally in 
> Eclipse, it runs successfully and converts the parquet file to CSV, but when 
> we run it in our UI environment it fails with:
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> Class org.apache.hadoop.fs.LocalFileSystem not found.
>  
> Please help us with it.


