[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1232420353 > It is not that trivial. For the half-precision floating point numbers we do not have native support for either cpp or java so we can define the total ordering as we want. But

[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
gszadovszky commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231323733 > > It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type. > > While not effortless, it

[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
pitrou commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231300374 > It would not be too easy to implement the half-precision floating point comparison logic since java does not have such a primitive type. While not effortless, it should be
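
The comparison question in this thread comes down to ordering half-precision values in a language with no native type for them. One possible approach in Java, sketched below under the assumption that each value arrives as its raw IEEE 754 binary16 bit pattern in a `short` (byte order already resolved), is to widen it to `float` and reuse `Float.compare`. The class name `Float16Sketch` is hypothetical, and `Float.compare`'s ordering (-0.0 before +0.0, NaN after everything) is only one candidate for the total order being debated, not necessarily the one the format adopts.

```java
// Hypothetical helper: widens IEEE 754 binary16 bit patterns to float so they
// can be compared with the JDK's existing float ordering.
final class Float16Sketch {
    private Float16Sketch() {}

    /** Converts a binary16 bit pattern (1 sign, 5 exponent, 10 mantissa bits) to float. */
    static float toFloat(short bits) {
        int h = bits & 0xFFFF;
        int sign = h >>> 15;
        int exponent = (h >>> 10) & 0x1F;
        int mantissa = h & 0x3FF;
        float magnitude;
        if (exponent == 0) {
            // Zero or subnormal: mantissa * 2^-24.
            magnitude = mantissa * 0x1p-24f;
        } else if (exponent == 0x1F) {
            // All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
            magnitude = (mantissa == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal: (1 + mantissa / 1024) * 2^(exponent - 15), exact in float.
            magnitude = (1.0f + mantissa / 1024.0f) * (float) Math.pow(2, exponent - 15);
        }
        return (sign == 1) ? -magnitude : magnitude;
    }

    /** Orders two binary16 values using Float.compare's total order. */
    static int compare(short a, short b) {
        return Float.compare(toFloat(a), toFloat(b));
    }
}
```

For example, `Float16Sketch.compare((short) 0x3C00, (short) 0x4000)` compares 1.0 with 2.0 and returns a negative number. Widening is lossless because every binary16 value is exactly representable as a `float`.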

[GitHub] [parquet-format] gszadovszky commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
gszadovszky commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231284535 > It isn't clear to me if this should be a logical type or a physical type. We would need to understand if there is different handling for forward compatibility purposes (what do

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
emkornfield commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r958042831 ## src/main/thrift/parquet.thrift: ## @@ -232,6 +232,7 @@ struct MapType {} // see LogicalTypes.md struct ListType {}// see LogicalTypes.md struct

[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231187345 It isn't clear to me if this should be a logical type or a physical type. We would need to understand if there is different handling for forward compatibility purposes (what do we

[GitHub] [parquet-format] emkornfield commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-30 Thread GitBox
emkornfield commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1231185256 We should probably specify that using the [Byte Split Encodings](https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9) can be
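
The suggestion above is to allow BYTE_STREAM_SPLIT for half-floats. A minimal sketch of that transform for fixed-width values, assuming the same scatter the spec describes for FLOAT and DOUBLE (byte k of every value goes to stream k, and the streams are concatenated) applied with a width of 2; `ByteStreamSplitSketch` is a hypothetical name, not an existing parquet-mr class.

```java
// Hypothetical sketch of the BYTE_STREAM_SPLIT scatter/gather for values of
// `width` bytes each (width = 2 for half-floats).
final class ByteStreamSplitSketch {
    private ByteStreamSplitSketch() {}

    static byte[] encode(byte[] values, int width) {
        int n = checkedCount(values, width);
        byte[] out = new byte[values.length];
        for (int v = 0; v < n; v++) {
            for (int k = 0; k < width; k++) {
                // Byte k of value v lands at position v of stream k.
                out[k * n + v] = values[v * width + k];
            }
        }
        return out;
    }

    static byte[] decode(byte[] encoded, int width) {
        int n = checkedCount(encoded, width);
        byte[] out = new byte[encoded.length];
        for (int v = 0; v < n; v++) {
            for (int k = 0; k < width; k++) {
                // Inverse gather: position v of stream k becomes byte k of value v.
                out[v * width + k] = encoded[k * n + v];
            }
        }
        return out;
    }

    private static int checkedCount(byte[] data, int width) {
        if (width <= 0 || data.length % width != 0) {
            throw new IllegalArgumentException("length must be a positive multiple of width");
        }
        return data.length / width;
    }
}
```

For three 2-byte values `{1, 2, 3, 4, 5, 6}`, `encode(values, 2)` yields `{1, 3, 5, 2, 4, 6}`. The transform itself saves no space; the point is that grouping each value's high byte (sign, exponent and top mantissa bits) into one stream tends to give a downstream compression codec more repetitive input.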

[GitHub] [parquet-format] anjakefala commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
anjakefala commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957822642 ## src/main/thrift/parquet.thrift: ## @@ -342,6 +343,7 @@ union LogicalType { 12: JsonType JSON // use ConvertedType JSON 13: BsonType BSON

[GitHub] [parquet-format] pitrou commented on pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on PR #184: URL: https://github.com/apache/parquet-format/pull/184#issuecomment-1229983463 @anjakefala You need to add to the `LogicalType` union, not to the `Type` enum (which is for physical types). Also cc @emkornfield -- This is an automated message from the

[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957065065 ## src/main/thrift/parquet.thrift: ## @@ -889,6 +891,7 @@ union ColumnOrder { * INT32 - signed comparison * INT64 - signed comparison * INT96

[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957064862 ## src/main/thrift/parquet.thrift: ## @@ -416,6 +417,7 @@ enum Encoding { * BOOLEAN - 1 bit per value. 0 is false; 1 is true. * INT32 - 4 bytes per value.

[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957064370 ## src/main/thrift/parquet.thrift: ## @@ -34,6 +34,7 @@ enum Type { INT32 = 1; INT64 = 2; INT96 = 3; // deprecated, only used by legacy implementations. +

[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957063147 ## LogicalTypes.md: ## @@ -245,6 +245,18 @@ comparison. To support compatibility with older readers, implementations of parquet-format should write `DecimalType`

[GitHub] [parquet-format] pitrou commented on a diff in pull request #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-29 Thread GitBox
pitrou commented on code in PR #184: URL: https://github.com/apache/parquet-format/pull/184#discussion_r957060962 ## LogicalTypes.md: ## @@ -245,6 +245,18 @@ comparison. To support compatibility with older readers, implementations of parquet-format should write `DecimalType`

[GitHub] [parquet-mr] matthieun commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-26 Thread GitBox
matthieun commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1229042610 @shangxinli Let me know if this is good to merge!

[GitHub] [parquet-format] anjakefala opened a new pull request, #184: PARQUET-758: Add Float16/Half-float logical type

2022-08-26 Thread GitBox
anjakefala opened a new pull request, #184: URL: https://github.com/apache/parquet-format/pull/184 Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Parquet Jira 1](https://issues.apache.org/jira/browse/PARQUET-758) and

[GitHub] [parquet-mr] sekikn opened a new pull request, #991: PARQUET-2177: Fix parquet-cli not to fail showing descriptions

2022-08-25 Thread GitBox
sekikn opened a new pull request, #991: URL: https://github.com/apache/parquet-mr/pull/991

[GitHub] [parquet-mr] sekikn opened a new pull request, #990: PARQUET-2142: Update the parquet-cli document to avoid NoSuchMethodError

2022-08-25 Thread GitBox
sekikn opened a new pull request, #990: URL: https://github.com/apache/parquet-mr/pull/990

[GitHub] [parquet-mr] NickCrews commented on pull request #433: PARQUET-1115: Warn users when misusing parquet-tools merge

2022-08-24 Thread GitBox
NickCrews commented on PR #433: URL: https://github.com/apache/parquet-mr/pull/433#issuecomment-1226667307 It might be nice if we actually suggested an alternative instead of just saying "don't do this." You can see my solution at

[GitHub] [parquet-mr] zhongyujiang commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-24 Thread GitBox
zhongyujiang commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r953866331 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,17 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] patchwork01 opened a new pull request, #989: PARQUET-2176: Column index/statistics truncation in ParquetWriter

2022-08-24 Thread GitBox
patchwork01 opened a new pull request, #989: URL: https://github.com/apache/parquet-mr/pull/989

[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-23 Thread GitBox
matthieun commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952835674 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] shangxinli merged pull request #986: Prevent IntelliJ from making unsolicited whitespace changes

2022-08-23 Thread GitBox
shangxinli merged PR #986: URL: https://github.com/apache/parquet-mr/pull/986

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-23 Thread GitBox
shangxinli commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952778132 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-23 Thread GitBox
shangxinli commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952764513 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-22 Thread GitBox
matthieun commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951648555 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-22 Thread GitBox
matthieun commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951644453 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #985: PARQUET-2173. Fix parquet build against hadoop 3.3.3+

2022-08-22 Thread GitBox
steveloughran commented on code in PR #985: URL: https://github.com/apache/parquet-mr/pull/985#discussion_r951589888 ## pom.xml: ## @@ -160,7 +160,11 @@ org.slf4j -slf4j-log4j12 +* Review Comment: it means that

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950908609 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +136,21 @@ public long skip(long n) { return bytesToSkip;

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950908296 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -379,4 +427,120 @@ public void remove() { second.remove(); }

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950908215 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) { }

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950908127 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -138,6 +134,18 @@ public int read(byte[] b, int off, int len) throws

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950907839 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) { }

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950906824 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -138,6 +134,18 @@ public int read(byte[] b, int off, int len) throws

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950903406 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -157,4 +165,80 @@ public void reset() throws IOException { public

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-21 Thread GitBox
shangxinli commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r950903342 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -157,4 +165,80 @@ public void reset() throws IOException { public
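
PR #960 folds LittleEndianDataInputStream's functionality into ByteBufferInputStream. The hunks are truncated in this digest, but the general contrast is between assembling each little-endian value from individual bytes and letting a `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN` decode it directly. A minimal sketch of both styles follows; `LittleEndianSketch` is hypothetical and is not the code from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of two ways to decode little-endian values;
// not the code from PR #960.
final class LittleEndianSketch {
    private LittleEndianSketch() {}

    /** Byte-at-a-time decoding, roughly what a stream-based reader has to do. */
    static int readIntBytewise(byte[] b, int off) {
        return (b[off] & 0xFF)
            | ((b[off + 1] & 0xFF) << 8)
            | ((b[off + 2] & 0xFF) << 16)
            | ((b[off + 3] & 0xFF) << 24);
    }

    /** Lets the buffer decode the value. Note: order() changes the buffer's byte order for later reads too. */
    static int readIntFromBuffer(ByteBuffer buf) {
        return buf.order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    static long readLongFromBuffer(ByteBuffer buf) {
        return buf.order(ByteOrder.LITTLE_ENDIAN).getLong();
    }
}
```

Both `readIntBytewise(new byte[] {0x78, 0x56, 0x34, 0x12}, 0)` and the buffer variant on the same bytes return `0x12345678`; the buffer version also advances the buffer's position by four bytes.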

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-21 Thread GitBox
shangxinli commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r950888122 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-21 Thread GitBox
shangxinli commented on code in PR #988: URL: https://github.com/apache/parquet-mr/pull/988#discussion_r950887905 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) { }

[GitHub] [parquet-mr] shangxinli commented on pull request #986: Prevent IntelliJ from making unsolicited whitespace changes

2022-08-21 Thread GitBox
shangxinli commented on PR #986: URL: https://github.com/apache/parquet-mr/pull/986#issuecomment-1221602126 LGTM

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #985: PARQUET-2173. Fix parquet build against hadoop 3.3.3+

2022-08-21 Thread GitBox
shangxinli commented on code in PR #985: URL: https://github.com/apache/parquet-mr/pull/985#discussion_r950886634 ## pom.xml: ## @@ -160,7 +160,11 @@ org.slf4j -slf4j-log4j12 +* Review Comment: '*' might be too

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-21 Thread GitBox
shangxinli commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r950883464 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,17 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] ggershinsky commented on pull request #987: Parquet-MR Encryption - Modify to true to encrypt

2022-08-18 Thread GitBox
ggershinsky commented on PR #987: URL: https://github.com/apache/parquet-mr/pull/987#issuecomment-1220261015 This breaks the parquet columnar encryption mode. We use the parquet "uniform" encryption mode instead for file encryption in Iceberg. Please have a look at

[GitHub] [parquet-mr] matthieun opened a new pull request, #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-18 Thread GitBox
matthieun opened a new pull request, #988: URL: https://github.com/apache/parquet-mr/pull/988 In case some proto definitions have circular dependencies, the proto schema converter breaks those cycles and logs a warning instead of throwing a `StackOverflowError`. ### Jira - [x] My PR
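
The idea described in PR #988, dropping a recursive field with a warning instead of recursing until the JVM throws `StackOverflowError`, can be sketched as a depth-first walk that tracks the message types on the current path. `CycleBreakingSketch` and its `Message` class are hypothetical stand-ins, not protobuf's `Descriptors` API or ProtoSchemaConverter, and the PR's actual logic may differ (for example in what it logs or whether it permits some recursion depth).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; `Message` stands in for a protobuf message descriptor.
final class CycleBreakingSketch {
    private CycleBreakingSketch() {}

    static final class Message {
        final String name;
        // Field name -> message type of that field, or null for a primitive field.
        final Map<String, Message> fields = new LinkedHashMap<>();
        Message(String name) { this.name = name; }
    }

    /** Renders a schema string, skipping fields whose type is already on the current path. */
    static String convert(Message root, List<String> warnings) {
        return convert(root, new ArrayDeque<>(), warnings);
    }

    private static String convert(Message msg, Deque<String> path, List<String> warnings) {
        path.push(msg.name);
        StringJoiner schema = new StringJoiner(", ", msg.name + "{", "}");
        for (Map.Entry<String, Message> field : msg.fields.entrySet()) {
            Message type = field.getValue();
            if (type == null) {
                schema.add(field.getKey());
            } else if (path.contains(type.name)) {
                // Recursing here would never terminate: drop the field and record why.
                warnings.add("dropping recursive field " + msg.name + "." + field.getKey());
            } else {
                schema.add(field.getKey() + ":" + convert(type, path, warnings));
            }
        }
        path.pop();
        return schema.toString();
    }
}
```

A self-referencing `Node` with fields `value` (primitive) and `next` (of type `Node`) converts to `Node{value}` with one warning. Tracking only the current path, rather than every type seen so far, means a type that is reused in two unrelated branches is still expanded in both.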

[GitHub] [parquet-mr] renshangtao commented on pull request #987: Modify to true to encrypt

2022-08-18 Thread GitBox
renshangtao commented on PR #987: URL: https://github.com/apache/parquet-mr/pull/987#issuecomment-1219439769 @ggershinsky please review it. Thank you.

[GitHub] [parquet-mr] renshangtao opened a new pull request, #987: Modify to true to encrypt

2022-08-18 Thread GitBox
renshangtao opened a new pull request, #987: URL: https://github.com/apache/parquet-mr/pull/987 I use Iceberg to encrypt a Parquet file, then I find it returns "No encryption setup found for column [c1]". Looking at the code, I can see that this parameter is fixed to false, which

[GitHub] [parquet-mr] gszadovszky merged pull request #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-08-18 Thread GitBox
gszadovszky merged PR #981: URL: https://github.com/apache/parquet-mr/pull/981

[GitHub] [parquet-mr] parthchandra commented on pull request #986: Prevent IntelliJ from making unsolicited whitespace changes

2022-08-17 Thread GitBox
parthchandra commented on PR #986: URL: https://github.com/apache/parquet-mr/pull/986#issuecomment-1218501682 I think one needs this also - https://www.jetbrains.com/help/idea/reformat-and-rearrange-code.html#keep_existing_formatting

[GitHub] [parquet-mr] theosib-amazon commented on pull request #986: Prevent IntelliJ from making unsolicited whitespace changes

2022-08-17 Thread GitBox
theosib-amazon commented on PR #986: URL: https://github.com/apache/parquet-mr/pull/986#issuecomment-1218488123 @parthchandra This might be the cause of the problem we were encountering with the whitespace changes.

[GitHub] [parquet-mr] theosib-amazon opened a new pull request, #986: Prevent IntelliJ from making unsolicited whitespace changes

2022-08-17 Thread GitBox
theosib-amazon opened a new pull request, #986: URL: https://github.com/apache/parquet-mr/pull/986 Every time I make a PR on this project, I get a whole bunch of complaints about superfluous whitespace changes that I have to manually revert. Those changes are caused by a flag in

[GitHub] [parquet-mr] iemejia commented on pull request #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-08-17 Thread GitBox
iemejia commented on PR #981: URL: https://github.com/apache/parquet-mr/pull/981#issuecomment-1218433182 Ah, oops, sorry for the confusion @sunchao :) @gszadovszky maybe?

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-17 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r948043918 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -38,6 +39,34 @@ class SingleBufferInputStream extends

[GitHub] [parquet-mr] steveloughran commented on pull request #985: PARQUET-2173. Fix parquet build against hadoop 3.3.3+

2022-08-16 Thread GitBox
steveloughran commented on PR #985: URL: https://github.com/apache/parquet-mr/pull/985#issuecomment-1217104595 I've also built against the next release of hadoop, and of 3.4.0-SNAPSHOT. The parquet build fails there as jackson 1 is purged from the hadoop classpath, breaking the

[GitHub] [parquet-mr] steveloughran opened a new pull request, #985: PARQUET-2173. Fix parquet build against hadoop 3.3.3+

2022-08-16 Thread GitBox
steveloughran opened a new pull request, #985: URL: https://github.com/apache/parquet-mr/pull/985 Hadoop 3.3.3 moved to reload4j for logging to stop shipping a version of log4j with known (albeit unused) CVEs. This bypasses the existing exclusion code used to keep

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-08-16 Thread GitBox
ggershinsky commented on code in PR #968: URL: https://github.com/apache/parquet-mr/pull/968#discussion_r946662874 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -126,6 +127,42 @@ public class ParquetFileReader implements Closeable {

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #983: WiP: parquet to use openfile api and some other performance enhancements

2022-08-15 Thread GitBox
steveloughran commented on code in PR #983: URL: https://github.com/apache/parquet-mr/pull/983#discussion_r945594211 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java: ## @@ -66,7 +68,13 @@ public long getLength() { @Override public

[GitHub] [parquet-mr] zhongyujiang commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-14 Thread GitBox
zhongyujiang commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r945428783 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,12 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] sunchao commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-12 Thread GitBox
sunchao commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r944925723 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,12 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] sunchao commented on pull request #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-08-11 Thread GitBox
sunchao commented on PR #981: URL: https://github.com/apache/parquet-mr/pull/981#issuecomment-1212528245 I'm not a committer. I think @nandorKollar can do it since he gave +1.

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #984: Bump hadoop-common from 3.2.3 to 3.2.4

2022-08-11 Thread GitBox
dependabot[bot] opened a new pull request, #984: URL: https://github.com/apache/parquet-mr/pull/984 Bumps hadoop-common from 3.2.3 to 3.2.4.

[GitHub] [parquet-mr] iemejia commented on pull request #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-08-11 Thread GitBox
iemejia commented on PR #981: URL: https://github.com/apache/parquet-mr/pull/981#issuecomment-1212462489 @sunchao maybe?

[GitHub] [parquet-mr] iemejia commented on pull request #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-08-10 Thread GitBox
iemejia commented on PR #981: URL: https://github.com/apache/parquet-mr/pull/981#issuecomment-1210998275 Can somebody please merge this one?

[GitHub] [parquet-mr] sunchao commented on a diff in pull request #983: WiP: parquet to use openfile api and some other performance enhancements

2022-08-09 Thread GitBox
sunchao commented on code in PR #983: URL: https://github.com/apache/parquet-mr/pull/983#discussion_r941644415 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java: ## @@ -66,7 +68,13 @@ public long getLength() { @Override public

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #983: WiP: parquet to use openfile api and some other performance enhancements

2022-08-09 Thread GitBox
steveloughran commented on code in PR #983: URL: https://github.com/apache/parquet-mr/pull/983#discussion_r941640533 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java: ## @@ -66,7 +68,13 @@ public long getLength() { @Override public

[GitHub] [parquet-mr] sunchao commented on a diff in pull request #983: WiP: parquet to use openfile api and some other performance enhancements

2022-08-08 Thread GitBox
sunchao commented on code in PR #983: URL: https://github.com/apache/parquet-mr/pull/983#discussion_r940452465 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java: ## @@ -66,7 +68,13 @@ public long getLength() { @Override public

[GitHub] [parquet-mr] steveloughran opened a new pull request, #983: WiP: parquet to use openfile api and some other performance enhancements

2022-08-08 Thread GitBox
steveloughran opened a new pull request, #983: URL: https://github.com/apache/parquet-mr/pull/983 This is me looking at what minimal changes could be made to boost IO performance working with the cloud stores. Compiles against hadoop 3.3.3; will need hadoop 3.3.5 for some of the

[GitHub] [parquet-mr] zhongyujiang commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-07 Thread GitBox
zhongyujiang commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r939789492 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,12 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] zhongyujiang commented on a diff in pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-07 Thread GitBox
zhongyujiang commented on code in PR #982: URL: https://github.com/apache/parquet-mr/pull/982#discussion_r939787073 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -109,7 +110,12 @@ public BytesInput decompress(BytesInput bytes, int

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

2022-08-07 Thread GitBox
zhongyujiang opened a new pull request, #982: URL: https://github.com/apache/parquet-mr/pull/982 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET-2160) issues and references
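
The point of PR #982 is that a decompressing stream holding off-heap buffers should be closed as soon as its output has been read, rather than left for the garbage collector. The sketch below shows that pattern with the JDK's `GZIPInputStream` as a stand-in, since its `Inflater` likewise holds native memory that `close()` releases. It is not the zstd code path or the PR's `CodecFactory` change, and `EagerDecompressSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP stands in for Zstd to show the close-early pattern.
final class EagerDecompressSketch {
    private EagerDecompressSketch() {}

    /** Reads exactly uncompressedSize bytes, then closes the stream to free native memory. */
    static byte[] decompress(byte[] compressed, int uncompressedSize) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] out = new byte[uncompressedSize];
            int off = 0;
            while (off < uncompressedSize) {
                int n = in.read(out, off, uncompressedSize - off);
                if (n < 0) {
                    throw new EOFException("stream ended after " + off + " bytes");
                }
                off += n;
            }
            return out;
        } // try-with-resources closes the stream here, ending its Inflater immediately
    }

    /** Helper for trying the sketch out: gzip-compresses raw bytes. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        }
        return bytes.toByteArray();
    }
}
```

Returning a fully materialized `byte[]` is what makes the early close safe: nothing reads from the stream after the method returns, so there is no lazily consumed input left pointing at a closed decompressor.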

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-08-05 Thread GitBox
parthchandra commented on code in PR #968: URL: https://github.com/apache/parquet-mr/pull/968#discussion_r927128065 ## parquet-hadoop/src/main/java/org/apache/parquet/HadoopReadOptions.java: ## @@ -61,9 +65,10 @@ private HadoopReadOptions(boolean useSignedStringMinMax,

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934673092 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +136,21 @@ public long skip(long n) { return

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934670047 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -38,6 +39,34 @@ class SingleBufferInputStream extends

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934669363 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -379,4 +427,120 @@ public void remove() { second.remove();

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934668939 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -89,6 +91,15 @@ public long skip(long n) { return bytesSkipped;

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934649460 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -174,4 +254,64 @@ public boolean markSupported() { public int

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934648806 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +136,21 @@ public long skip(long n) { return

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934644315 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -70,9 +105,22 @@ public int read(byte[] bytes, int offset, int

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934641756 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -379,4 +427,120 @@ public void remove() { second.remove();

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934629584 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) {

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934627078 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -38,6 +39,34 @@ class SingleBufferInputStream extends

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934623419 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -38,6 +39,34 @@ class SingleBufferInputStream extends

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-08-01 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r934619931 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +136,21 @@ public long skip(long n) { return

[GitHub] [parquet-mr] iemejia opened a new pull request, #981: PARQUET-2169: Upgrade Avro to version 1.11.1

2022-07-31 Thread GitBox
iemejia opened a new pull request, #981: URL: https://github.com/apache/parquet-mr/pull/981 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET-2169) issues and references them

[GitHub] [parquet-mr] steveloughran commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-07-29 Thread GitBox
steveloughran commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1199891512 you might want to look at WeakReferences...we've been using them recently to implement threadlocal-like storage where GCs will trigger cleanup of instances which aren't being used
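The comment suggests "threadlocal-like" storage built on weak references. Here is a rough sketch of that idea, assuming a map from thread id to a `WeakReference` holding the per-thread instance. It is not Hadoop's or parquet-mr's implementation, and `WeakPerThreadCache` is a hypothetical name. Because the cache holds only weak references, an instance survives only while some caller still holds it strongly. Once nothing does, the GC may clear it, and the next `get()` on that thread builds a fresh one.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-thread cache whose values can be reclaimed by the GC
// once no caller holds a strong reference to them.
public class WeakPerThreadCache<V> {
    private final ConcurrentHashMap<Long, WeakReference<V>> refs = new ConcurrentHashMap<>();
    private final Supplier<V> factory;

    public WeakPerThreadCache(Supplier<V> factory) {
        this.factory = factory;
    }

    // Return this thread's instance, creating a new one if none exists
    // or the previous one has already been collected.
    public V get() {
        long id = Thread.currentThread().getId();
        WeakReference<V> ref = refs.get(id);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = factory.get();
            refs.put(id, new WeakReference<>(value));
        }
        return value;
    }

    // Drop map entries whose referents the GC has already cleared.
    public void prune() {
        refs.values().removeIf(r -> r.get() == null);
    }

    public int size() {
        return refs.size();
    }
}
```

One design caveat: cleared entries stay in the map until `prune()` runs. A production version would likely pair the references with a `ReferenceQueue` instead.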

[GitHub] [parquet-mr] sunchao commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-28 Thread GitBox
sunchao commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r932674870 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +136,21 @@ public long skip(long n) { return bytesToSkip; }

[GitHub] [parquet-mr] theosib-amazon commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-07-26 Thread GitBox
theosib-amazon commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1195676003 I just thought of something that makes me nervous about this PR that requires further investigation. Consider the following scenario: - Thread A allocates a codec - Thread A

[GitHub] [parquet-mr] theosib-amazon commented on pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-26 Thread GitBox
theosib-amazon commented on PR #960: URL: https://github.com/apache/parquet-mr/pull/960#issuecomment-1195568967 > Is this mostly a refactoring PR? I also don't see `LittleEndianDataInputStream` being marked as deprecated. I initially marked `LittleEndianDataInputStream` as
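PR #960 folds the little-endian reads into `ByteBufferInputStream` itself rather than layering a separate `LittleEndianDataInputStream` on top. The sketch below illustrates why that can be cheap; it is not the PR's code. A `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN` decodes a 4-byte little-endian int in one call. That is the byte order Parquet's PLAIN encoding uses for INT32. The result matches the byte-at-a-time decode that a stream-wrapping adapter performs. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianReads {
    // Decode directly from the buffer. Note that order() changes the buffer's
    // byte order in place, and getInt() advances its position by 4.
    static int readIntLE(ByteBuffer buf) {
        return buf.order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Equivalent manual decode, as a byte-at-a-time DataInput wrapper would do it.
    static int readIntLEManual(byte[] b, int off) {
        return (b[off] & 0xFF)
                | (b[off + 1] & 0xFF) << 8
                | (b[off + 2] & 0xFF) << 16
                | (b[off + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        System.out.println(Integer.toHexString(readIntLE(ByteBuffer.wrap(bytes))));
        System.out.println(Integer.toHexString(readIntLEManual(bytes, 0)));
    }
}
```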

[GitHub] [parquet-mr] ggershinsky commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

2022-07-26 Thread GitBox
ggershinsky commented on PR #978: URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1195083014 cc @shangxinli -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [parquet-mr] shangxinli merged pull request #980: PARQUET-2167: Fix CLI serializing footer with date fields

2022-07-25 Thread GitBox
shangxinli merged PR #980: URL: https://github.com/apache/parquet-mr/pull/980

[GitHub] [parquet-mr] sunchao commented on pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
sunchao commented on PR #960: URL: https://github.com/apache/parquet-mr/pull/960#issuecomment-1194739358 Is this mostly a refactoring PR? I also don't see `LittleEndianDataInputStream` being marked as deprecated.

[GitHub] [parquet-mr] theosib-amazon commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-07-25 Thread GitBox
theosib-amazon commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1194259084 I did some poking around. It looks like if you call release() on a codec, it (a) resets the codec (freeing resources, I think) and (b) returns it to a pool of codecs without
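The comment describes `release()` doing two things: resetting the codec and returning it to a pool. The sketch below shows that borrow/reset/return cycle, assuming a simple synchronized free list. It is not parquet-mr's or Hadoop's pool, and `ResettingPool` and `Resettable` are hypothetical names. Resetting before the instance goes back on the free list means the next borrower, possibly on another thread, never sees leftover per-use state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical contract for pooled objects that carry per-use state.
interface Resettable {
    void reset();
}

// Hypothetical pool: release() resets the object, then makes it available again.
public class ResettingPool<T extends Resettable> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    public ResettingPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse an idle instance if one exists, otherwise create a new one.
    public synchronized T borrow() {
        T item = idle.pollFirst();
        return (item != null) ? item : factory.get();
    }

    // Clear per-use state before another caller can borrow this instance.
    public synchronized void release(T item) {
        item.reset();
        idle.addFirst(item);
    }

    public synchronized int idleCount() {
        return idle.size();
    }
}
```

The pool's locking only protects the free list. It does not make a borrowed instance safe to share, so each codec must still be used by one thread at a time between `borrow()` and `release()`.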

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929020852 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -379,4 +427,120 @@ public void remove() { second.remove();

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929018213 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) {

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929011710 ## parquet-common/src/main/java/org/apache/parquet/bytes/MultiBufferInputStream.java: ## @@ -238,8 +257,31 @@ public int read(byte[] bytes, int off, int len) {

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929010259 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -157,4 +165,80 @@ public void reset() throws IOException { public

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929008981 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -157,4 +165,80 @@ public void reset() throws IOException { public

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #960: URL: https://github.com/apache/parquet-mr/pull/960#discussion_r929007342 ## parquet-common/src/main/java/org/apache/parquet/bytes/ByteBufferInputStream.java: ## @@ -157,4 +165,80 @@ public void reset() throws IOException { public

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #957: PARQUET-2069: Allow list and array record types to be compatible.

2022-07-25 Thread GitBox
theosib-amazon commented on code in PR #957: URL: https://github.com/apache/parquet-mr/pull/957#discussion_r928999801 ## parquet-avro/src/main/java/org/apache/parquet/avro/AvroReadSupport.java: ## @@ -136,10 +137,22 @@ public RecordMaterializer prepareForRead(
