[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682895#comment-17682895
 ] 

ASF GitHub Bot commented on PARQUET-2159:
-

wgtmac commented on PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#issuecomment-1411584288

   > > > Sorry for the delay. I have left some comments and the implementation 
is overall looking good. Thanks @jiangjiguang for your effort!
   > > > My main concern is the extensibility to support other instruction 
sets. In addition, it seems to me that the Java Vector API is still incubating. 
As I am not a Java expert, do we run the risk of an unstable API?
   > > 
   > > 
   > > @wgtmac Jatin is a Java expert. @jatin-bhateja, can you help give an 
answer? Thanks.
   > 
   > Hi @wgtmac, our patch vectorizes the unpacking algorithm for various 
decode bit sizes. The entire new functionality is exposed through a plugin 
interface, **ParquetReadRouter**. To prevent any performance regressions on 
other targets, we have enabled the new functionality only for x86 targets with 
the required features; this limitation can be removed over time.
   > 
   > The Vector API made its appearance in JDK 16 and has been maturing with 
each successive release. I do not have a firm timeline for you at this point 
on when it will exit incubation and be exposed as a preview feature. The 
intent here is to let parquet-mr community developers use the plugin in the 
parquet reader and provide us with early feedback; we are also in the process 
of vectorizing the packer algorithm.
   > 
   > As this is a large project, we plan to do this incrementally, and we seek 
your guidance on pushing this patch through either on master or on a separate 
development branch.
   
   Thanks for your explanation @jatin-bhateja! 
   
   So when the Vector API is finalized in a future Java release, we may need 
to change the VM options to enable it accordingly.
   
   BTW, I may not be able to verify the generated code line by line. Please 
advise on the best practice to test it, based on your experience. Thanks 
@jatin-bhateja
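   A common practice for validating generated or vectorized kernels (a hedged 
sketch of my own, not something prescribed in this thread) is differential 
testing: run two independent implementations on random inputs and assert 
bit-for-bit equality. In parquet-mr the two sides would be the scalar 
`BytePacker` and the Vector API one; the self-contained stand-ins below are 
hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical harness: cross-checks two independent little-endian
// bit-unpack implementations on random inputs.
public class UnpackCrossCheck {

    // Reference: extract each value bit by bit (LSB-first within each byte).
    static void unpackBitwise(byte[] in, int width, int[] out) {
        for (int j = 0; j < out.length; j++) {
            int v = 0;
            for (int b = 0; b < width; b++) {
                int bit = j * width + b;
                if (((in[bit >>> 3] >>> (bit & 7)) & 1) != 0) v |= 1 << b;
            }
            out[j] = v;
        }
    }

    // Alternative: shift input bytes through a 64-bit accumulator.
    static void unpackAccumulator(byte[] in, int width, int[] out) {
        long acc = 0, mask = (1L << width) - 1;
        int bits = 0, idx = 0;
        for (int j = 0; j < out.length; j++) {
            while (bits < width) { acc |= (in[idx++] & 0xFFL) << bits; bits += 8; }
            out[j] = (int) (acc & mask);
            acc >>>= width;
            bits -= width;
        }
    }

    // Returns true if both implementations agree on `rounds` random inputs.
    public static boolean crossCheck(int width, int rounds) {
        Random rnd = new Random(42);
        for (int r = 0; r < rounds; r++) {
            byte[] in = new byte[width]; // `width` bytes hold 8 values of `width` bits
            rnd.nextBytes(in);
            int[] a = new int[8], b = new int[8];
            unpackBitwise(in, width, a);
            unpackAccumulator(in, width, b);
            if (!Arrays.equals(a, b)) return false;
        }
        return true;
    }
}
```

   Randomized cross-checking against the scalar reference sidesteps having to 
read the generated vector code line by line.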




> Parquet bit-packing de/encode optimization
> --
>
> Key: PARQUET-2159
> URL: https://issues.apache.org/jira/browse/PARQUET-2159
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Fang-Xie
>Assignee: Fang-Xie
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: image-2022-06-15-22-56-08-396.png, 
> image-2022-06-15-22-57-15-964.png, image-2022-06-15-22-58-01-442.png, 
> image-2022-06-15-22-58-40-704.png
>
>
> Spark currently uses parquet-mr as its Parquet reader/writer library, but 
> the built-in bit-packing en/decoding is not efficient enough.
> Our optimization of Parquet bit-packing en/decoding with jdk.incubator.vector 
> in OpenJDK 18 brings a prominent performance improvement.
> Because the Vector API has been included in OpenJDK since JDK 16, this 
> optimization requires JDK 16 or higher.
> *Below are our test results*
> The functional test is based on the open-source parquet-mr bit-pack decoding 
> function *_public final void unpack8Values(final byte[] in, final int inPos, 
> final int[] out, final int outPos)_*,
> compared with our Vector API implementation *_public final void 
> unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final 
> int outPos)_*.
> We tested 10 pairs of decode functions (open-source parquet bit unpacking 
> vs. our optimized vectorized SIMD implementation) with bit 
> width=\{1,2,3,4,5,6,7,8,9,10}; the test results are below:
> !image-2022-06-15-22-56-08-396.png|width=437,height=223!
> We integrated our bit-packing decode implementation into parquet-mr and 
> tested the batch-reading capability of Spark's VectorizedParquetRecordReader, 
> which fetches Parquet column data in batches. We constructed Parquet files 
> with different row and column counts: the column data type is Int32, the 
> maximum int value is 127 (which satisfies bit-pack encoding with bit 
> width=7), the row count ranges from 10k to 100 million, and the column count 
> from 1 to 4.
> !image-2022-06-15-22-57-15-964.png|width=453,height=229!
> !image-2022-06-15-22-58-01-442.png|width=439,height=217!
> !image-2022-06-15-22-58-40-704.png|width=415,height=208!
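For intuition, here is a minimal, self-contained sketch of little-endian 
bit-packing (my own illustration, not parquet-mr's generated kernels, which 
are unrolled per bit width): with bit width 7, eight values in 0..127 pack 
into exactly 7 bytes, which is why the test data above caps values at 127.

```java
// Minimal little-endian bit-pack/unpack sketch (illustrative only; the real
// parquet-mr unpack8Values kernels are generated and heavily unrolled).
public class BitPackSketch {

    // Pack values of `width` bits each, LSB-first, into ceil(n*width/8) bytes.
    public static byte[] pack(int[] values, int width) {
        byte[] out = new byte[(values.length * width + 7) / 8];
        int bit = 0;
        for (int v : values) {
            for (int b = 0; b < width; b++, bit++) {
                if (((v >>> b) & 1) != 0) out[bit >>> 3] |= 1 << (bit & 7);
            }
        }
        return out;
    }

    // Inverse of pack(): read n values of `width` bits back out.
    public static int[] unpack(byte[] in, int width, int n) {
        int[] out = new int[n];
        for (int j = 0; j < n; j++) {
            for (int b = 0; b < width; b++) {
                int bit = j * width + b;
                if (((in[bit >>> 3] >>> (bit & 7)) & 1) != 0) out[j] |= 1 << b;
            }
        }
        return out;
    }
}
```

The SIMD versions benchmarked above compute the same mapping, just many 
values per instruction.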



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682892#comment-17682892
 ] 

ASF GitHub Bot commented on PARQUET-2159:
-

wgtmac commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1092845969


##
parquet-generator/src/main/resources/ByteBitPacking512VectorLE:
##
@@ -0,0 +1,3095 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.column.values.bitpacking;
+
+import jdk.incubator.vector.ByteVector;
+import jdk.incubator.vector.IntVector;
+import jdk.incubator.vector.LongVector;
+import jdk.incubator.vector.ShortVector;
+import jdk.incubator.vector.Vector;
+import jdk.incubator.vector.VectorMask;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorShuffle;
+import jdk.incubator.vector.VectorSpecies;
+
+import java.nio.ByteBuffer;
+
+/**
+ * This is an auto-generated source file and should not be edited directly.
+ */
+public abstract class ByteBitPacking512VectorLE {

Review Comment:
   Do you have any script to generate the code here? If so, it would be great 
to commit it as well.



##
parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/ByteBitPackingVectorBenchmarks.java:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.benchmarks;
+
+import org.apache.parquet.column.values.bitpacking.BytePacker;
+import org.apache.parquet.column.values.bitpacking.Packer;
+import org.openjdk.jmh.annotations.*;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * This class uses the Java 17 Vector API; add the VM option
+ * --add-modules=jdk.incubator.vector
+ */
+
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.AverageTime)
+@Warmup(iterations = 1, batchSize = 10)
+@Measurement(iterations = 1, batchSize = 10)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+public class ByteBitPackingVectorBenchmarks {
+
+  /**
+   * The range of bitWidth is 1 ~ 32; change it directly to test other bitWidths.
+   */
+  private static final int bitWidth = 7;
+  private static final int outputValues = 1024;
+  private final byte[] input = new byte[outputValues * bitWidth / 8];
+  private final int[] output = new int[outputValues];
+  private final int[] outputVector = new int[outputValues];
+
+  @Setup(Level.Trial)
+  public void getInputBytes() {
+    for (int i = 0; i < input.length; i++) {
+      input[i] = (byte) i;
+    }
+  }
+
+  @Benchmark
+  public void testUnpack() {
+    BytePacker bytePacker = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth);
+    for (int i = 0, j = 0; i < input.length; i += bitWidth, j += 8) {
+      bytePacker.unpack8Values(input, i, output, j);
+    }
+  }
+
+  @Benchmark
+  public void testUnpackVector() {
+    BytePacker bytePacker = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth);
+    BytePacker bytePackerVector = Packer.LITTLE_ENDIAN.newBytePackerVector(bitWidth);

Review Comment:
   Could you elaborate more? @jatin-bhateja 








[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682807#comment-17682807
 ] 

ASF GitHub Bot commented on PARQUET-2149:
-

wgtmac commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1411338630

   > @kazuyukitanimura @steveloughran @kbendick @ggershinsky @wgtmac 
@theosib-amazon Do you still have comments?
   
   It looks good to me. Please feel free to merge as you see fit. @shangxinli 




> Implement async IO for Parquet file reader
> --
>
> Key: PARQUET-2149
> URL: https://issues.apache.org/jira/browse/PARQUET-2149
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Parth Chandra
>Priority: Major
>
> ParquetFileReader's implementation has the following flow (simplified):
>       - For every column -> read from storage in 8MB blocks -> read all 
> uncompressed pages into an output queue
>       - From the output queues -> (downstream) decompression + decoding
> This flow is serialized, which means that downstream threads are blocked 
> until the data has been read. Because a large part of the time is spent 
> waiting for data from storage, threads are idle and CPU utilization is 
> really low.
> There is no reason why this cannot be made asynchronous _and_ parallel. So, 
> for column _i_ -> read chunks from storage until the end -> intermediate 
> output queue -> read uncompressed pages until the end -> output queue -> 
> (downstream) decompression + decoding
> Note that this can be made completely self-contained in ParquetFileReader, 
> and downstream implementations like Iceberg and Spark will automatically be 
> able to take advantage without code changes, as long as the ParquetFileReader 
> APIs are not changed.
> In past work with async IO ([Drill - async page reader 
> |https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java]), 
> I have seen a 2x-3x improvement in reading speed for Parquet files.
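The proposed flow amounts to a producer/consumer pipeline: an I/O thread fills 
a queue while the caller decodes concurrently. Below is only an illustrative 
skeleton under that reading of the description (names, sentinel, and structure 
are mine, not the PR's):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative async-read skeleton: an I/O thread fills a queue while the
// caller "decodes" (here: sums) concurrently, so decode no longer waits for
// the whole read to finish.
public class AsyncReadSketch {
    private static final int EOF = -1; // end-of-stream sentinel

    public static int readAndDecode(int pages) throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        ExecutorService ioPool = Executors.newSingleThreadExecutor();
        // Producer: stands in for "read uncompressed pages from storage".
        Future<?> reader = ioPool.submit(() -> {
            for (int p = 0; p < pages; p++) queue.add(p);
            queue.add(EOF);
        });
        // Consumer: stands in for downstream decompression + decoding.
        int total = 0;
        for (int page; (page = queue.take()) != EOF; ) total += page;
        reader.get(); // surface any producer failure
        ioPool.shutdown();
        return total;
    }
}
```

Because the queue decouples the two stages, decode CPU work overlaps with 
storage latency, which is where the claimed utilization gain comes from.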







[jira] [Commented] (PARQUET-2173) Fix parquet build against hadoop 3.3.3+

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682805#comment-17682805
 ] 

ASF GitHub Bot commented on PARQUET-2173:
-

wgtmac commented on PR #985:
URL: https://github.com/apache/parquet-mr/pull/985#issuecomment-1411336266

   It looks good to me but I don't have the privilege to merge. 
   
   May I request your help? @ggershinsky @shangxinli @gszadovszky 




> Fix parquet build against hadoop 3.3.3+
> ---
>
> Key: PARQUET-2173
> URL: https://issues.apache.org/jira/browse/PARQUET-2173
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Affects Versions: 1.13.0
>Reporter: Steve Loughran
>Priority: Major
>
> parquet won't build against hadoop 3.3.3+ because Hadoop swapped out log4j 
> 1.2.17 for reload4j, and this creates maven dependency problems in 
> parquet-cli:
> {code}
> [INFO] --- maven-dependency-plugin:3.1.1:analyze-only (default) @ parquet-cli 
> ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]    ch.qos.reload4j:reload4j:jar:1.2.22:provided
> {code}
> the hadoop-common dependencies need to exclude this jar and any changed 
> slf4j ones.
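The exclusion described would look roughly like the POM fragment below; this 
is a sketch of the general shape, not the PR's actual change, and the version 
property is a placeholder:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- reload4j replaced log4j 1.x in Hadoop 3.3.3+; exclude it here -->
    <exclusion>
      <groupId>ch.qos.reload4j</groupId>
      <artifactId>reload4j</artifactId>
    </exclusion>
    <!-- and the matching slf4j binding -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-reload4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```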







[jira] [Commented] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682674#comment-17682674
 ] 

ASF GitHub Bot commented on PARQUET-758:


pitrou commented on code in PR #184:
URL: https://github.com/apache/parquet-format/pull/184#discussion_r1092267094


##
src/main/thrift/parquet.thrift:
##
@@ -232,6 +232,7 @@ struct MapType {} // see LogicalTypes.md
 struct ListType {}// see LogicalTypes.md
 struct EnumType {}// allowed for BINARY, must be encoded with UTF-8
 struct DateType {}// allowed for INT32
+struct Float16Type {} // allowed for FIXED[2], must be encoded as raw FLOAT16 bytes

Review Comment:
   Well, I guess it wouldn't cost much to allow it (implementations would not 
support it at the start anyway).
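   For context, FLOAT16 here means the two bytes of an IEEE 754 binary16 value 
stored in a FIXED_LEN_BYTE_ARRAY(2). A hedged sketch of the float-to-half 
conversion (my own, handling normal numbers only: subnormals flush to signed 
zero, the mantissa is truncated rather than rounded, and NaN collapses to 
infinity; a real implementation must handle all of these properly):

```java
// Minimal float -> IEEE 754 binary16 conversion (illustrative only).
public class Float16Sketch {

    public static short floatToHalf(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = (bits >>> 16) & 0x8000;            // move sign to bit 15
        int exp  = ((bits >>> 23) & 0xFF) - 127 + 15; // rebias exponent 8 -> 5 bits
        int mant = (bits >>> 13) & 0x3FF;             // keep top 10 mantissa bits
        if (exp <= 0)  return (short) sign;             // underflow -> signed zero
        if (exp >= 31) return (short) (sign | 0x7C00);  // overflow/NaN -> infinity
        return (short) (sign | (exp << 10) | mant);
    }

    // Two-byte little-endian encoding, as a FIXED[2] value might store it
    // (byte order here is an assumption of this sketch).
    public static byte[] encode(float f) {
        short h = floatToHalf(f);
        return new byte[] { (byte) h, (byte) (h >>> 8) };
    }
}
```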





> [Format] HALF precision FLOAT Logical type
> --
>
> Key: PARQUET-758
> URL: https://issues.apache.org/jira/browse/PARQUET-758
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Julien Le Dem
>Priority: Minor
>








[jira] [Commented] (PARQUET-2173) Fix parquet build against hadoop 3.3.3+

2023-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682474#comment-17682474
 ] 

ASF GitHub Bot commented on PARQUET-2173:
-

steveloughran commented on PR #985:
URL: https://github.com/apache/parquet-mr/pull/985#issuecomment-1410083313

   any plans to merge now?




> Fix parquet build against hadoop 3.3.3+
> ---
>
> Key: PARQUET-2173
> URL: https://issues.apache.org/jira/browse/PARQUET-2173
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cli
>Affects Versions: 1.13.0
>Reporter: Steve Loughran
>Priority: Major
>
> parquet won't build against hadoop 3.3.3+ because it swapped out log4j 1.17 
> for reload4j, and this creates maven dependency problems in parquet cli
> {code}
> [INFO] --- maven-dependency-plugin:3.1.1:analyze-only (default) @ parquet-cli 
> ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]ch.qos.reload4j:reload4j:jar:1.2.22:provided
> {code}
> the hadoop common dependencies need to exclude this jar and any changed slf4j 
> ones.




