[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451381#comment-16451381 ]

ASF GitHub Bot commented on PARQUET-968:

chawlakunal commented on issue #411: PARQUET-968 Add Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-384106975

When can this be expected to be merged to master and released?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add Hive/Presto support in ProtoParquet
> ---------------------------------------
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
> Issue Type: Task
> Reporter: Constantin Muraru
> Priority: Major

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1253) Support for new logical type representation
[ https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450178#comment-16450178 ]

ASF GitHub Bot commented on PARQUET-1253:

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183800256

File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java

@@ -77,63 +77,116 @@ static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, Decimal
     }
     switch (originalType) {
       case UTF8:
-        return StringLogicalTypeAnnotation.create();
+        return stringType();
       case MAP:
-        return MapLogicalTypeAnnotation.create();
+        return mapType();
       case DECIMAL:
         int scale = (decimalMetadata == null ? 0 : decimalMetadata.getScale());
         int precision = (decimalMetadata == null ? 0 : decimalMetadata.getPrecision());
-        return DecimalLogicalTypeAnnotation.create(scale, precision);
+        return decimalType(scale, precision);
       case LIST:
-        return ListLogicalTypeAnnotation.create();
+        return listType();
       case DATE:
-        return DateLogicalTypeAnnotation.create();
+        return dateType();
       case INTERVAL:
-        return IntervalLogicalTypeAnnotation.create();
+        return intervalType();
       case TIMESTAMP_MILLIS:
-        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+        return timestampType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIMESTAMP_MICROS:
-        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+        return timestampType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case TIME_MILLIS:
-        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+        return timeType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIME_MICROS:
-        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+        return timeType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case UINT_8:
-        return IntLogicalTypeAnnotation.create(8, false);
+        return intType(8, false);
       case UINT_16:
-        return IntLogicalTypeAnnotation.create(16, false);
+        return intType(16, false);
       case UINT_32:
-        return IntLogicalTypeAnnotation.create(32, false);
+        return intType(32, false);
       case UINT_64:
-        return IntLogicalTypeAnnotation.create(64, false);
+        return intType(64, false);
       case INT_8:
-        return IntLogicalTypeAnnotation.create(8, true);
+        return intType(8, true);
       case INT_16:
-        return IntLogicalTypeAnnotation.create(16, true);
+        return intType(16, true);
       case INT_32:
-        return IntLogicalTypeAnnotation.create(32, true);
+        return intType(32, true);
       case INT_64:
-        return IntLogicalTypeAnnotation.create(64, true);
+        return intType(64, true);
       case ENUM:
-        return EnumLogicalTypeAnnotation.create();
+        return enumType();
       case JSON:
-        return JsonLogicalTypeAnnotation.create();
+        return jsonType();
       case BSON:
-        return BsonLogicalTypeAnnotation.create();
+        return bsonType();
       case MAP_KEY_VALUE:
-        return MapKeyValueTypeAnnotation.create();
+        return mapKeyValueType();
       default:
         throw new RuntimeException("Can't convert original type to logical type, unknown original type " + originalType);
     }
   }
+
+  static StringLogicalTypeAnnotation stringType() {
+    return StringLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static MapLogicalTypeAnnotation mapType() {
+    return MapLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static ListLogicalTypeAnnotation listType() {
+    return ListLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static EnumLogicalTypeAnnotation enumType() {
+    return EnumLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static DecimalLogicalTypeAnnotation decimalType(final int scale, final int precision) {
+    return new DecimalLogicalTypeAnnotation(scale, precision);
+  }
+
+  static DateLogicalTypeAnnotation dateType() {
+    return DateLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static TimeLogicalTypeAnnotation timeType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+    return new TimeLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static TimestampLogicalTypeAnnotation timestampType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+    return new TimestampLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static IntLogicalTypeAnnotation intType(final int bitWidth, final boolean isSigned) {
+    Preconditions.checkArgument(
+      bitWidth == 8 || bitWidth ==
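The factory methods introduced in this diff follow one pattern: parameterless annotations are shared singletons, while parameterized ones validate their arguments at construction. A minimal standalone sketch of that pattern (the names mirror the PR, but this is not the real parquet-mr API):

```java
// Hypothetical sketch of the factory pattern in PR #463: a shared singleton
// for parameterless annotations, eager validation for parameterized ones.
class LogicalTypeSketch {
  // Parameterless annotation: stringType() always hands back the one shared
  // instance, like StringLogicalTypeAnnotation.INSTANCE in the diff.
  enum StringAnnotation { INSTANCE }

  static StringAnnotation stringType() {
    return StringAnnotation.INSTANCE;
  }

  // Parameterized annotation: intType(bitWidth, isSigned) rejects bad widths
  // up front, like the Preconditions.checkArgument call the diff is cut off at.
  static final class IntAnnotation {
    final int bitWidth;
    final boolean isSigned;

    IntAnnotation(int bitWidth, boolean isSigned) {
      if (bitWidth != 8 && bitWidth != 16 && bitWidth != 32 && bitWidth != 64) {
        throw new IllegalArgumentException("Invalid bit width for int logical type: " + bitWidth);
      }
      this.bitWidth = bitWidth;
      this.isSigned = isSigned;
    }
  }

  static IntAnnotation intType(int bitWidth, boolean isSigned) {
    return new IntAnnotation(bitWidth, isSigned);
  }

  public static void main(String[] args) {
    System.out.println(stringType() == stringType()); // true: shared singleton
    System.out.println(intType(8, false).bitWidth);   // 8
  }
}
```

The payoff of routing callers through static factories is that they never name the concrete annotation classes, which is what lets the library later swap in the new logical-type representation behind the same calls.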
[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450169#comment-16450169 ]

ASF GitHub Bot commented on PARQUET-968:

julienledem commented on issue #411: PARQUET-968 Add Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383997195

This looks good. Thank you for this collaborative effort!
[jira] [Commented] (PARQUET-1281) Jackson dependency
[ https://issues.apache.org/jira/browse/PARQUET-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450163#comment-16450163 ]

Julien Le Dem commented on PARQUET-1281:

parquet-hadoop should have its build include shading like parquet-thrift does:
https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L174

> Jackson dependency
> ------------------
>
> Key: PARQUET-1281
> URL: https://issues.apache.org/jira/browse/PARQUET-1281
> Project: Parquet
> Issue Type: Improvement
> Reporter: Qinghui Xu
> Priority: Major
>
> Currently we shade jackson in the parquet-jackson module (org.codehaus.jackson
> --> shaded.parquet.org.codehaus.jackson), but in fact we do not use the
> shaded jackson in the parquet-hadoop code. Is that a mistake? (see
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java#L26)
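For context, the shading Julien points to works by relocating the Jackson packages at package time, so parquet's Jackson cannot clash with an application's. A hedged sketch of what a maven-shade-plugin relocation for parquet-hadoop might look like, modeled on the parquet-thrift pom; the exact configuration in the real pom may differ, only the shaded package name is taken from the issue text:

```xml
<!-- Illustrative maven-shade-plugin relocation, not copied from the actual pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrites bytecode references so the bundled Jackson lives
                 under the shaded prefix mentioned in the issue. -->
            <pattern>org.codehaus.jackson</pattern>
            <shadedPattern>shaded.parquet.org.codehaus.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Note the issue's point: relocation only helps if the module's own sources import the shaded package; importing plain org.codehaus.jackson bypasses the shading.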
Parquet sync
Happening now: https://meet.google.com/esu-yiit-mun
[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450040#comment-16450040 ]

ASF GitHub Bot commented on PARQUET-968:

lukasnalezenec commented on issue #411: PARQUET-968 Add Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383969694

Hi, I already did. There is one typo in a comment and it is a little bit harder to read - I wanted to check the flow once more. I think we can commit it as it is.
[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN
[ https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449841#comment-16449841 ]

ASF GitHub Bot commented on PARQUET-1246:

gszadovszky opened a new pull request #468: PARQUET-1246: Ignore float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/468

Because of the ambiguous sorting order of float/double, the following changes are made on the read path of the related statistics:
- Ignore the statistics if they contain a NaN value.
- Use -0.0 as the min value and +0.0 as the max value, regardless of which 0.0 value was saved in the statistics.

Author: Gabor Szadovszky

Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits:

20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in case of NaN

This change is based on 0a86429939075984edce5e3b8195dfb7f9e3ab6b but is not a clean cherry-pick.

> Ignore float/double statistics in case of NaN
> ---------------------------------------------
>
> Key: PARQUET-1246
> URL: https://issues.apache.org/jira/browse/PARQUET-1246
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.8.1
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
> Fix For: 1.10.0
>
> The sorting order of floating point values is not properly specified,
> therefore NaN values can cause valid values to be skipped when filtering. See
> PARQUET-1222 for more info.
> This issue is for ignoring statistics for float/double if they contain NaN, to
> prevent data loss on the read path when filtering.
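To see why a NaN in the min/max forces the reader to discard the statistics, recall that every ordered comparison against NaN is false, so min/max pruning can silently skip chunks that contain matching values. A minimal illustration (not parquet-mr code; the pruning predicate is a simplified stand-in):

```java
// Demonstrates the data-loss hazard PARQUET-1246 guards against: a NaN that
// leaked into a chunk's recorded min defeats naive min-based pruning.
class NaNStatsDemo {
  // Naive row-group pruning: "keep the chunk only if some value could be < bound".
  static boolean canContainLessThan(double min, double bound) {
    return min < bound;
  }

  public static void main(String[] args) {
    // A chunk holding {NaN, 1.0} can end up recording NaN as its min,
    // because ordered comparisons against NaN never return true.
    double recordedMin = Double.NaN;
    // NaN < 10.0 is false, so the chunk is wrongly skipped even though it
    // contains 1.0, which matches the predicate "x < 10.0": silent data loss.
    System.out.println(canContainLessThan(recordedMin, 10.0)); // prints false
    System.out.println(canContainLessThan(1.0, 10.0));         // prints true
  }
}
```

Ignoring NaN-tainted statistics (the PR's fix) trades a pruning opportunity for correctness: the chunk is always read and filtered row by row.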
[jira] [Updated] (PARQUET-1217) Incorrect handling of missing values in Statistics
[ https://issues.apache.org/jira/browse/PARQUET-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky updated PARQUET-1217:
Fix Version/s: 1.8.3

> Incorrect handling of missing values in Statistics
> --------------------------------------------------
>
> Key: PARQUET-1217
> URL: https://issues.apache.org/jira/browse/PARQUET-1217
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
> Fix For: 1.10.0, 1.8.3
>
> As per the parquet-format specs, the min/max values in statistics are
> optional. Therefore, it is possible to have {{numNulls}} in {{Statistics}}
> while we don't have min/max values. In {{StatisticsFilter}} we rely on the
> method
> [StatisticsFilter.isAllNulls(ColumnChunkMetaData)|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java#L90]
> to handle the case of {{null}} min/max values, which is not correct in
> the described scenario.
> We shall check {{Statistics.hasNonNullValue()}} any time before using the
> actual min/max values.
> In addition, we don't check whether the {{null_count}} field is set when
> reading from the parquet file. We simply use the value, which is {{0}} when
> unset. On the parquet-mr side the {{Statistics}} object uses the value {{0}}
> to signal that {{num_nulls}} is unset. This is incorrect: if we are searching
> for null values, we may falsely drop a column chunk thinking there are no
> null values when the field in the statistics was simply unset.
[jira] [Commented] (PARQUET-852) Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder
[ https://issues.apache.org/jira/browse/PARQUET-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449832#comment-16449832 ]

ASF GitHub Bot commented on PARQUET-852:

zivanfi closed pull request #467: Revert "PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder"
URL: https://github.com/apache/parquet-mr/pull/467

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java b/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
index 0bc8b3023..cc23e8f87 100644
--- a/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
+++ b/parquet-encoding/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBasedBitPackingEncoder.java
@@ -1,4 +1,4 @@
-/*
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements. See the NOTICE file
  * distributed with this work for additional information
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License. You may obtain a copy of the License at
- *
+ *
  * http://www.apache.org/licenses/LICENSE-2.0
- *
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -39,14 +39,11 @@
   private static final Logger LOG = LoggerFactory.getLogger(ByteBasedBitPackingEncoder.class);
   private static final int VALUES_WRITTEN_AT_A_TIME = 8;
-  private static final int MAX_SLAB_SIZE_MULT = 64 * 1024;
-  private static final int INITIAL_SLAB_SIZE_MULT = 1024;
   private final int bitWidth;
   private final BytePacker packer;
   private final int[] input = new int[VALUES_WRITTEN_AT_A_TIME];
-  private int slabSize;
-  private long totalFullSlabSize;
+  private final int slabSize;
   private int inputSize;
   private byte[] packed;
   private int packedPosition;
@@ -59,9 +56,8 @@
   public ByteBasedBitPackingEncoder(int bitWidth, Packer packer) {
     this.bitWidth = bitWidth;
     this.inputSize = 0;
-    this.totalFullSlabSize = 0;
     // must be a multiple of bitWidth
-    this.slabSize = (bitWidth == 0) ? 1 : (bitWidth * INITIAL_SLAB_SIZE_MULT);
+    this.slabSize = bitWidth * 64 * 1024;
     initPackedSlab();
     this.packer = packer.newBytePacker(bitWidth);
   }
@@ -79,10 +75,6 @@ public void writeInt(int value) throws IOException {
     pack();
     if (packedPosition == slabSize) {
       slabs.add(BytesInput.from(packed));
-      totalFullSlabSize += slabSize;
-      if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
-        slabSize *= 2;
-      }
       initPackedSlab();
     }
   }
@@ -107,7 +99,7 @@ private void initPackedSlab() {
   public BytesInput toBytes() throws IOException {
     int packedByteLength = packedPosition + BytesUtils.paddedByteCountFromBits(inputSize * bitWidth);
-    LOG.debug("writing {} bytes", (totalFullSlabSize + packedByteLength));
+    LOG.debug("writing {} bytes", (slabs.size() * slabSize + packedByteLength));
     if (inputSize > 0) {
       for (int i = inputSize; i < input.length; i++) {
         input[i] = 0;
@@ -121,24 +113,18 @@
    * @return size of the data as it would be written
    */
   public long getBufferSize() {
-    return BytesUtils.paddedByteCountFromBits((totalValues + inputSize) * bitWidth);
+    return BytesUtils.paddedByteCountFromBits(totalValues * bitWidth);
   }

   /**
    * @return total memory allocated
    */
   public long getAllocatedSize() {
-    return totalFullSlabSize + packed.length + input.length * 4;
+    return (slabs.size() * slabSize) + packed.length + input.length * 4;
   }

   public String memUsageString(String prefix) {
     return String.format("%s ByteBitPacking %d slabs, %d bytes", prefix, slabs.size(), getAllocatedSize());
   }
-
-  /**
-   * @return number of full slabs along with the current slab (debug aid)
-   */
-  int getNumSlabs() {
-    return slabs.size() + 1;
-  }
 }
diff --git a/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java b/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java
index b49595b43..293b961f0 100644
--- a/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java
+++
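For reference, the change being reverted had replaced the fixed `bitWidth * 64 * 1024` slab with a geometric ramp-up: start at `bitWidth * 1024` and double on each full slab until the old fixed size. A standalone sketch of that sizing policy (hypothetical helper, not the actual encoder):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the (reverted) slab ramp-up policy from PARQUET-852: small first
// allocation, doubling per full slab, capped at bitWidth * 64 * 1024 bytes.
class SlabRampDemo {
  static List<Integer> slabSizes(int bitWidth, int slabsNeeded) {
    List<Integer> sizes = new ArrayList<>();
    // must be a multiple of bitWidth; bitWidth 0 degenerates to 1 byte
    int slabSize = (bitWidth == 0) ? 1 : bitWidth * 1024;
    final int maxSlabSize = bitWidth * 64 * 1024;
    for (int i = 0; i < slabsNeeded; i++) {
      sizes.add(slabSize);
      if (slabSize < maxSlabSize) {
        slabSize *= 2; // geometric growth keeps small encoders cheap
      }
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(slabSizes(1, 4)); // [1024, 2048, 4096, 8192]
  }
}
```

The trade-off the ramp-up targeted: encoders that write little data waste at most one small slab, while heavy writers quickly reach the old fixed slab size, so amortized allocation cost stays similar.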
[jira] [Commented] (PARQUET-1217) Incorrect handling of missing values in Statistics
[ https://issues.apache.org/jira/browse/PARQUET-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449830#comment-16449830 ]

ASF GitHub Bot commented on PARQUET-1217:

zivanfi closed pull request #465: PARQUET-1217: Incorrect handling of missing values in Statistics
URL: https://github.com/apache/parquet-mr/pull/465

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
index 30153c074..26c14c135 100644
--- a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
+++ b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
@@ -31,6 +31,44 @@
  */
 public abstract class Statistics<T extends Comparable<T>> {

+  /**
+   * Builder class to build Statistics objects. Used to read the statistics from the Parquet file.
+   */
+  public static class Builder {
+    private final PrimitiveTypeName type;
+    private byte[] min;
+    private byte[] max;
+    private long numNulls = -1;
+
+    private Builder(PrimitiveTypeName type) {
+      this.type = type;
+    }
+
+    public Builder withMin(byte[] min) {
+      this.min = min;
+      return this;
+    }
+
+    public Builder withMax(byte[] max) {
+      this.max = max;
+      return this;
+    }
+
+    public Builder withNumNulls(long numNulls) {
+      this.numNulls = numNulls;
+      return this;
+    }
+
+    public Statistics build() {
+      Statistics stats = getStatsBasedOnType(type);
+      if (min != null && max != null) {
+        stats.setMinMaxFromBytes(min, max);
+      }
+      stats.num_nulls = this.numNulls;
+      return stats;
+    }
+  }
+
   private boolean hasNonNullValue;
   private long num_nulls;
@@ -67,6 +105,17 @@ public static Statistics getStatsBasedOnType(PrimitiveTypeName type) {
     }
   }

+  /**
+   * Returns a builder to create new statistics object. Used to read the statistics from the parquet file.
+   *
+   * @param type
+   *          type of the column
+   * @return builder to create new statistics object
+   */
+  public static Builder getBuilder(PrimitiveTypeName type) {
+    return new Builder(type);
+  }
+
   /**
    * updates statistics min and max using the passed value
    * @param value value to use to update min and max
@@ -172,7 +221,9 @@
    * Abstract method to set min and max values from byte arrays.
    * @param minBytes byte array to set the min value to
    * @param maxBytes byte array to set the max value to
+   * @deprecated will be removed in 2.0.0. Use {@link #getBuilder(PrimitiveType)} instead.
    */
+  @Deprecated
   abstract public void setMinMaxFromBytes(byte[] minBytes, byte[] maxBytes);

   abstract public T genericGetMin();
@@ -221,7 +272,7 @@ public void incrementNumNulls(long increment) {
   /**
    * Returns the null count
-   * @return null count
+   * @return null count or {@code -1} if the null count is not set
    */
   public long getNumNulls() {
     return num_nulls;
@@ -229,8 +280,12 @@
   /**
    * Sets the number of nulls to the parameter value
-   * @param nulls null count to set the count to
+   *
+   * @param nulls
+   *          null count to set the count to
+   * @deprecated will be removed in 2.0.0. Use {@link #getBuilder(PrimitiveType)} instead.
    */
+  @Deprecated
   public void setNumNulls(long nulls) {
     num_nulls = nulls;
   }
@@ -241,7 +296,7 @@
    * @return true if object is empty, false otherwise
    */
   public boolean isEmpty() {
-    return !hasNonNullValue && num_nulls == 0;
+    return !hasNonNullValue && !isNumNullsSet();
   }

   /**
@@ -251,6 +306,13 @@
   public boolean hasNonNullValue() {
     return hasNonNullValue;
   }

+  /**
+   * @return whether numNulls is set and can be used
+   */
+  public boolean isNumNullsSet() {
+    return num_nulls >= 0;
+  }
+
   /**
    * Sets the page/column as having a valid non-null value
    * kind of misnomer here
diff --git a/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java b/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
index 128acb49f..cf4bf59af 100644
--- a/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
+++ b/parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
@@ -37,6 +37,7 @@
   @Test
   public void testNumNulls() {
     IntStatistics stats = new IntStatistics();
+    assertTrue(stats.isNumNullsSet());
[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449697#comment-16449697 ]

ASF GitHub Bot commented on PARQUET-968:

BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-383903265

Hello @lukasnalezenec, have you had time to have a look? Thanks
[jira] [Commented] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin
[ https://issues.apache.org/jira/browse/PARQUET-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449692#comment-16449692 ]

Lukas Nalezenec commented on PARQUET-1280:

Good idea, we planned to use some maven protobuf plugin.

> [parquet-protobuf] Use maven protoc plugin
> ------------------------------------------
>
> Key: PARQUET-1280
> URL: https://issues.apache.org/jira/browse/PARQUET-1280
> Project: Parquet
> Issue Type: Improvement
> Reporter: Qinghui Xu
> Priority: Minor
>
> Currently the build of parquet-protobuf requires protoc to be installed in
> your environment. By using a maven protoc plugin, we can have a build that is
> independent of the environment (no need to install protoc), and it becomes
> easier to change the version of protoc.
[jira] [Created] (PARQUET-1281) Jackson dependency
Qinghui Xu created PARQUET-1281:

Summary: Jackson dependency
Key: PARQUET-1281
URL: https://issues.apache.org/jira/browse/PARQUET-1281
Project: Parquet
Issue Type: Improvement
Reporter: Qinghui Xu

Currently we shade jackson in the parquet-jackson module (org.codehaus.jackson --> shaded.parquet.org.codehaus.jackson), but in fact we do not use the shaded jackson in the parquet-hadoop code. Is that a mistake? (see https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java#L26)
[jira] [Updated] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin
[ https://issues.apache.org/jira/browse/PARQUET-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qinghui Xu updated PARQUET-1280:
Issue Type: Improvement (was: New Feature)
[jira] [Created] (PARQUET-1280) [parquet-protobuf] Use maven protoc plugin
Qinghui Xu created PARQUET-1280:

Summary: [parquet-protobuf] Use maven protoc plugin
Key: PARQUET-1280
URL: https://issues.apache.org/jira/browse/PARQUET-1280
Project: Parquet
Issue Type: New Feature
Reporter: Qinghui Xu

Currently the build of parquet-protobuf requires protoc to be installed in your environment. By using a maven protoc plugin, we can have a build that is independent of the environment (no need to install protoc), and it becomes easier to change the version of protoc.
[jira] [Commented] (PARQUET-1231) Not able to load the LocalFileSystem class
[ https://issues.apache.org/jira/browse/PARQUET-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449660#comment-16449660 ]

Qinghui Xu commented on PARQUET-1231:

It sounds like a classpath/packaging problem: org.apache.hadoop.fs.LocalFileSystem is not on your runtime classpath. Try adding hadoop-common to your runtime classpath; this might solve your problem. This is not a problem related to parquet.

> Not able to load the LocalFileSystem class
> ------------------------------------------
>
> Key: PARQUET-1231
> URL: https://issues.apache.org/jira/browse/PARQUET-1231
> Project: Parquet
> Issue Type: Bug
> Reporter: Persistent NGP
> Priority: Blocker
>
> When we run the code for converting a parquet file to csv locally in
> Eclipse, it runs successfully and converts the parquet file to csv, but when
> we run it in our UI environment it fails with:
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
> Class org.apache.hadoop.fs.LocalFileSystem not found.
>
> Please help us with it.
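If the failing application is built with Maven, the suggested fix can be sketched as declaring hadoop-common (the artifact that provides org.apache.hadoop.fs.LocalFileSystem) with runtime scope; the version shown here is illustrative, not taken from the reporter's project:

```xml
<!-- Illustrative dependency declaration; pick the Hadoop version your
     cluster or application actually uses. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
  <scope>runtime</scope>
</dependency>
```

The symptom pattern (works in the IDE, fails when deployed) typically means the IDE classpath included hadoop-common transitively while the deployed artifact did not package or provide it.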