[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132367 ## parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java: ## @@ -448,4 +449,16 @@ private void appendHex(byte[] array, int offset,

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132272 ## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ## @@ -0,0 +1,192 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335132034 ## parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131998 ## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ## @@ -0,0 +1,192 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131904 ## parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java: ## @@ -990,6 +990,30 @@ private void

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131781 ## parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveComparator.java: ## @@ -276,4 +279,24 @@ public String toString() { return

[GitHub] [parquet-mr] zhangjiashen commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-24 Thread via GitHub
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1335131705 ## parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #1152: Bump com.h2database:h2 from 2.2.220 to 2.2.224

2023-09-23 Thread via GitHub
dependabot[bot] opened a new pull request, #1152: URL: https://github.com/apache/parquet-mr/pull/1152 Bumps [com.h2database:h2](https://github.com/h2database/h2database) from 2.2.220 to 2.2.224. Release notes Sourced from

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #1151: Bump io.airlift:aircompressor from 0.21 to 0.25

2023-09-23 Thread via GitHub
dependabot[bot] opened a new pull request, #1151: URL: https://github.com/apache/parquet-mr/pull/1151 Bumps [io.airlift:aircompressor](https://github.com/airlift/aircompressor) from 0.21 to 0.25. Commits

[GitHub] [parquet-mr] MaheshGPai commented on pull request #1121: PARQUET-1381: Support merging of rowgroups during file rewrite

2023-09-23 Thread via GitHub
MaheshGPai commented on PR #1121: URL: https://github.com/apache/parquet-mr/pull/1121#issuecomment-1732238952 > This is a great initiative. Do you still have plan to address the feedback @MaheshGPai ? @shangxinli I do plan to work on it. But I have not had time to get to this.

[GitHub] [parquet-mr] zhangjiashen commented on pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-22 Thread via GitHub
zhangjiashen commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1732203105 > CI failures are likely due to the fact that the addition of the logical type to parquet-format is unmerged, so the specific [PR

[GitHub] [parquet-mr] Fokko merged pull request #1150: Bump com.squareup.okhttp3:okhttp from 4.6.0 to 4.11.0

2023-09-22 Thread via GitHub
Fokko merged PR #1150: URL: https://github.com/apache/parquet-mr/pull/1150 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #1150: Bump com.squareup.okhttp3:okhttp from 4.6.0 to 4.11.0

2023-09-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1150: URL: https://github.com/apache/parquet-mr/pull/1150 Bumps [com.squareup.okhttp3:okhttp](https://github.com/square/okhttp) from 4.6.0 to 4.11.0. Commits

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #1149: Bump com.twitter.elephantbird:elephant-bird-core from 4.4 to 4.17

2023-09-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1149: URL: https://github.com/apache/parquet-mr/pull/1149 Bumps [com.twitter.elephantbird:elephant-bird-core](https://github.com/twitter/elephant-bird) from 4.4 to 4.17. Release notes Sourced from

[GitHub] [parquet-mr] Fokko merged pull request #1146: MINOR: Modest refactor of ParquetFileWriter

2023-09-22 Thread via GitHub
Fokko merged PR #1146: URL: https://github.com/apache/parquet-mr/pull/1146

[GitHub] [parquet-mr] Fokko merged pull request #1148: Remove Dependabot PR limit

2023-09-22 Thread via GitHub
Fokko merged PR #1148: URL: https://github.com/apache/parquet-mr/pull/1148

[GitHub] [parquet-mr] Fokko merged pull request #1145: Use try-with-resources

2023-09-22 Thread via GitHub
Fokko merged PR #1145: URL: https://github.com/apache/parquet-mr/pull/1145

[GitHub] [parquet-format] findepi commented on a diff in pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-22 Thread via GitHub
findepi commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r1333952085 ## src/main/thrift/parquet.thrift: ## @@ -216,7 +216,12 @@ struct Statistics { /** count of distinct values occurring */ 4: optional i64 distinct_count;

[GitHub] [parquet-mr] shangxinli commented on pull request #1121: PARQUET-1381: Support merging of rowgroups during file rewrite

2023-09-21 Thread via GitHub
shangxinli commented on PR #1121: URL: https://github.com/apache/parquet-mr/pull/1121#issuecomment-1730734276 This is a great initiative. Do you still have plan to address the feedback @MaheshGPai ?

[GitHub] [parquet-format] mapleFU commented on a diff in pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-21 Thread via GitHub
mapleFU commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r1333823691 ## src/main/thrift/parquet.thrift: ## @@ -216,7 +216,12 @@ struct Statistics { /** count of distinct values occurring */ 4: optional i64 distinct_count;

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-21 Thread via GitHub
wgtmac commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r1333778393 ## src/main/thrift/parquet.thrift: ## @@ -216,7 +216,12 @@ struct Statistics { /** count of distinct values occurring */ 4: optional i64 distinct_count;

[GitHub] [parquet-format] findepi commented on pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-21 Thread via GitHub
findepi commented on PR #216: URL: https://github.com/apache/parquet-format/pull/216#issuecomment-1730285274 Thanks @raunaqmorarka, this lgtm

[GitHub] [parquet-format] raunaqmorarka commented on a diff in pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-21 Thread via GitHub
raunaqmorarka commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r128796 ## src/main/thrift/parquet.thrift: ## @@ -216,7 +216,12 @@ struct Statistics { /** count of distinct values occurring */ 4: optional i64

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-21 Thread via GitHub
wgtmac commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r1333286445 ## src/main/thrift/parquet.thrift: ## @@ -216,7 +216,12 @@ struct Statistics { /** count of distinct values occurring */ 4: optional i64 distinct_count;

[GitHub] [parquet-mr] Fokko merged pull request #1147: Bump jmh.version from 1.21 to 1.37

2023-09-21 Thread via GitHub
Fokko merged PR #1147: URL: https://github.com/apache/parquet-mr/pull/1147

[GitHub] [parquet-format] Fokko merged pull request #203: PARQUET-2313: Bump actions/setup-java from 1 to 3

2023-09-21 Thread via GitHub
Fokko merged PR #203: URL: https://github.com/apache/parquet-format/pull/203

[GitHub] [parquet-mr] Fokko closed pull request #1054: PARQUET-2268: Bump Thrift to 0.18.1

2023-09-21 Thread via GitHub
Fokko closed pull request #1054: PARQUET-2268: Bump Thrift to 0.18.1 URL: https://github.com/apache/parquet-mr/pull/1054

[GitHub] [parquet-mr] Fokko opened a new pull request, #1148: Remove PR limit

2023-09-21 Thread via GitHub
Fokko opened a new pull request, #1148: URL: https://github.com/apache/parquet-mr/pull/1148 Dependabot is effectively not working because 5 PRs have been open for ages. Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet

[GitHub] [parquet-mr] dependabot[bot] commented on pull request #1110: Bump scalatest_2.12 from 3.0.1 to 3.3.0-SNAP4

2023-09-21 Thread via GitHub
dependabot[bot] commented on PR #1110: URL: https://github.com/apache/parquet-mr/pull/1110#issuecomment-1729435128 OK, I won't notify you about version 3.3.x again, unless you re-open this PR.

[GitHub] [parquet-mr] dependabot[bot] closed pull request #1110: Bump scalatest_2.12 from 3.0.1 to 3.3.0-SNAP4

2023-09-21 Thread via GitHub
dependabot[bot] closed pull request #1110: Bump scalatest_2.12 from 3.0.1 to 3.3.0-SNAP4 URL: https://github.com/apache/parquet-mr/pull/1110

[GitHub] [parquet-mr] Fokko commented on pull request #1110: Bump scalatest_2.12 from 3.0.1 to 3.3.0-SNAP4

2023-09-21 Thread via GitHub
Fokko commented on PR #1110: URL: https://github.com/apache/parquet-mr/pull/1110#issuecomment-1729435056 @dependabot ignore this minor version

[GitHub] [parquet-mr] Fokko commented on pull request #1068: Bump elephant-bird.version from 4.4 to 4.17

2023-09-21 Thread via GitHub
Fokko commented on PR #1068: URL: https://github.com/apache/parquet-mr/pull/1068#issuecomment-1729434254 @lukasnalezenec what do you think?

[GitHub] [parquet-mr] dependabot[bot] closed pull request #1062: Bump jmh.version from 1.21 to 1.36

2023-09-21 Thread via GitHub
dependabot[bot] closed pull request #1062: Bump jmh.version from 1.21 to 1.36 URL: https://github.com/apache/parquet-mr/pull/1062

[GitHub] [parquet-mr] dependabot[bot] commented on pull request #1062: Bump jmh.version from 1.21 to 1.36

2023-09-21 Thread via GitHub
dependabot[bot] commented on PR #1062: URL: https://github.com/apache/parquet-mr/pull/1062#issuecomment-1729427261 Superseded by #1147.

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #1147: Bump jmh.version from 1.21 to 1.37

2023-09-21 Thread via GitHub
dependabot[bot] opened a new pull request, #1147: URL: https://github.com/apache/parquet-mr/pull/1147 Bumps `jmh.version` from 1.21 to 1.37. Updates `org.openjdk.jmh:jmh-core` from 1.21 to 1.37 Commits

[GitHub] [parquet-mr] Fokko commented on pull request #1062: Bump jmh.version from 1.21 to 1.36

2023-09-21 Thread via GitHub
Fokko commented on PR #1062: URL: https://github.com/apache/parquet-mr/pull/1062#issuecomment-1729426235 @dependabot rebase

[GitHub] [parquet-mr] Fokko opened a new pull request, #1146: Modest refactor of ParquetFileWriter

2023-09-21 Thread via GitHub
Fokko opened a new pull request, #1146: URL: https://github.com/apache/parquet-mr/pull/1146 IDEA was lighting up as a christmas tree :) Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet

[GitHub] [parquet-mr] Fokko opened a new pull request, #1145: Use try-with-resources

2023-09-21 Thread via GitHub
Fokko opened a new pull request, #1145: URL: https://github.com/apache/parquet-mr/pull/1145 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[GitHub] [parquet-mr] Fokko opened a new pull request, #1144: Remove old setting for cascading

2023-09-21 Thread via GitHub
Fokko opened a new pull request, #1144: URL: https://github.com/apache/parquet-mr/pull/1144 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[GitHub] [parquet-format] raunaqmorarka opened a new pull request, #216: PARQUET-2352: Allow truncation of row group min_values/max_value statistics

2023-09-20 Thread via GitHub
raunaqmorarka opened a new pull request, #216: URL: https://github.com/apache/parquet-format/pull/216 ### Jira - https://issues.apache.org/jira/browse/PARQUET-2352 This updates the spec to allow truncation of row group min_values/max_value statistics so that readers can take

[GitHub] [parquet-mr] benibus commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-20 Thread via GitHub
benibus commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1331975632 ## parquet-common/src/main/java/org/apache/parquet/util/Float16.java: ## @@ -0,0 +1,192 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [parquet-mr] clairemcginty commented on a diff in pull request #1140: allow read old parquet file which is maked by old api with old avro version which allow wrong default value in schema

2023-09-20 Thread via GitHub
clairemcginty commented on code in PR #1140: URL: https://github.com/apache/parquet-mr/pull/1140#discussion_r1331817509 ## parquet-avro/src/main/java/org/apache/parquet/avro/AvroReadSupport.java: ## @@ -129,10 +129,10 @@ public RecordMaterializer prepareForRead(

[GitHub] [parquet-mr] wgtmac commented on pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-19 Thread via GitHub
wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1726797426 > CI failures are likely due to the fact that the addition of the logical type to parquet-format is unmerged, so the specific [PR

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-19 Thread via GitHub
wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1330863760 ## parquet-common/src/test/java/org/apache/parquet/util/TestFloat16.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [parquet-mr] wgtmac commented on pull request #1141: PARQUET-2347: Add interface layer between Parquet and Hadoop Configuration

2023-09-19 Thread via GitHub
wgtmac commented on PR #1141: URL: https://github.com/apache/parquet-mr/pull/1141#issuecomment-1726754834 Thanks for working on this! I will take a look by the end of this week.

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1140: allow read old parquet file which is maked by old api with old avro version which allow wrong default value in schema

2023-09-19 Thread via GitHub
wgtmac commented on code in PR #1140: URL: https://github.com/apache/parquet-mr/pull/1140#discussion_r1330857433 ## parquet-avro/src/main/java/org/apache/parquet/avro/AvroReadSupport.java: ## @@ -129,10 +129,10 @@ public RecordMaterializer prepareForRead( avroSchema =

[GitHub] [parquet-mr] wgtmac commented on pull request #1140: allow read old parquet file which is maked by old api with old avro version which allow wrong default value in schema

2023-09-19 Thread via GitHub
wgtmac commented on PR #1140: URL: https://github.com/apache/parquet-mr/pull/1140#issuecomment-1726750224 Thanks @wwang-talend for opening the PR! Could you create a JIRA issue for this?

[GitHub] [parquet-mr] benibus commented on a diff in pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-19 Thread via GitHub
benibus commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1330570985 ## parquet-hadoop/src/test/java/org/apache/parquet/format/converter/TestParquetMetadataConverter.java: ## @@ -990,6 +990,30 @@ private void

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1143: PARQUET-2348: Recompression/Re-encrypt should rewrite bloomfilter

2023-09-19 Thread via GitHub
wgtmac commented on code in PR #1143: URL: https://github.com/apache/parquet-mr/pull/1143#discussion_r1330272179 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -366,6 +366,10 @@ private void processChunk(ColumnChunkMetaData chunk,

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
gszadovszky commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329973062 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetFileWriter.java: ## @@ -89,10 +89,13 @@ import

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
gszadovszky commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329967395 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/vectorio/BindingUtils.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] steveloughran commented on pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
steveloughran commented on PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1725306371 @gszadovszky thanks for your comments, will update the PR

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
steveloughran commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329958167 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetFileWriter.java: ## @@ -89,10 +89,13 @@ import

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
steveloughran commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329955534 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/vectorio/BindingUtils.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
steveloughran commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329955142 ## parquet-hadoop/pom.xml: ## @@ -79,6 +79,11 @@ ${hadoop.version} provided + Review Comment: no, will cut. that was how I migrated

[GitHub] [parquet-mr] steveloughran commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
steveloughran commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329954899 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/vectorio/BindingUtils.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-19 Thread via GitHub
gszadovszky commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1329695018 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/vectorio/BindingUtils.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] Fokko merged pull request #1134: PARQUET-2336: Add caching key to CodecFactory

2023-09-18 Thread via GitHub
Fokko merged PR #1134: URL: https://github.com/apache/parquet-mr/pull/1134

[GitHub] [parquet-mr] benibus commented on pull request #1142: PARQUET-1647: Add logical type FLOAT16

2023-09-18 Thread via GitHub
benibus commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1724083429 CI failures are likely due to the fact that the addition of the logical type to parquet-format is unmerged, so the specific [PR

[GitHub] [parquet-mr] steveloughran commented on pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-18 Thread via GitHub
steveloughran commented on PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1724053099 @danielcweeks that's a good point about pluggability. 1. an interface/implementation split in parquet would line you up later to choose the back end, maybe? 2. I've

[GitHub] [parquet-mr] steveloughran commented on pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-18 Thread via GitHub
steveloughran commented on PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1724037037 @shangxinli looking forward to your comments -anything you can do to test will be wonderful too!

[GitHub] [parquet-mr] ConeyLiu commented on pull request #1143: PARQUET-2348: Recompression/Re-encrypt should rewrite bloomfilter

2023-09-18 Thread via GitHub
ConeyLiu commented on PR #1143: URL: https://github.com/apache/parquet-mr/pull/1143#issuecomment-1722961986 Hi @wgtmac, could you help to review this? Thanks in advance.

[GitHub] [parquet-mr] ConeyLiu opened a new pull request, #1143: PARQUET-2348: Recompression/Re-encrypt should rewrite bloomfilter

2023-09-18 Thread via GitHub
ConeyLiu opened a new pull request, #1143: URL: https://github.com/apache/parquet-mr/pull/1143 The bloomfilter data is lost after rewriting with recompression or re-encrypt. We should rewrite the bloomfilter data as well. Make sure you have checked _all_ steps below. ### Jira

[GitHub] [parquet-mr] shangxinli commented on pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-17 Thread via GitHub
shangxinli commented on PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1722528243 @steveloughran Thanks a lot for creating this PR! This is an important feature that we improve the reading performance of Parquet. I just took a brief look and they look great! I

[GitHub] [parquet-mr] zhangjiashen opened a new pull request, #1142: [Parquet-1647] Add logical type FLOAT16

2023-09-17 Thread via GitHub
zhangjiashen opened a new pull request, #1142: URL: https://github.com/apache/parquet-mr/pull/1142 ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My

[GitHub] [parquet-mr] amousavigourabi opened a new pull request, #1141: PARQUET-2347: Add interface layer between Parquet and Hadoop Configuration

2023-09-16 Thread via GitHub
amousavigourabi opened a new pull request, #1141: URL: https://github.com/apache/parquet-mr/pull/1141 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references

[GitHub] [parquet-mr] danielcweeks commented on a diff in pull request #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-14 Thread via GitHub
danielcweeks commented on code in PR #1139: URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1326633841 ## parquet-common/src/main/java/org/apache/parquet/io/SeekableInputStream.java: ## @@ -105,4 +107,21 @@ public abstract class SeekableInputStream extends

[GitHub] [parquet-mr] wwang-talend opened a new pull request, #1140: allow read old parquet file which is maked by old api with old avro version which allow wrong default value in schema

2023-09-14 Thread via GitHub
wwang-talend opened a new pull request, #1140: URL: https://github.com/apache/parquet-mr/pull/1140 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references

[GitHub] [parquet-format] dependabot[bot] commented on pull request #215: PARQUET-2346: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread via GitHub
dependabot[bot] commented on PR #215: URL: https://github.com/apache/parquet-format/pull/215#issuecomment-1718760575 OK, I won't notify you about org.slf4j:slf4j-api again, unless you re-open this PR.

[GitHub] [parquet-format] dependabot[bot] closed pull request #215: PARQUET-2346: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread via GitHub
dependabot[bot] closed pull request #215: PARQUET-2346: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9 URL: https://github.com/apache/parquet-format/pull/215

[GitHub] [parquet-format] wgtmac commented on pull request #215: PARQUET-2346: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Thread via GitHub
wgtmac commented on PR #215: URL: https://github.com/apache/parquet-format/pull/215#issuecomment-1718760542 @dependabot ignore this dependency

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-13 Thread via GitHub
wgtmac commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1324650964 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-mr] steveloughran opened a new pull request, #1139: PARQUET-2171: Support Hadoop vectored IO

2023-09-13 Thread via GitHub
steveloughran opened a new pull request, #1139: URL: https://github.com/apache/parquet-mr/pull/1139 Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323524900 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323506069 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323505565 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + +

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323069846 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323069846 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323059489 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323028211 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323028211 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-11 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1322319582 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] dependabot[bot] closed pull request #204: Bump slf4j-api from 1.7.12 to 2.0.7

2023-09-10 Thread via GitHub
dependabot[bot] closed pull request #204: Bump slf4j-api from 1.7.12 to 2.0.7 URL: https://github.com/apache/parquet-format/pull/204

[GitHub] [parquet-format] dependabot[bot] commented on pull request #204: Bump slf4j-api from 1.7.12 to 2.0.7

2023-09-10 Thread via GitHub
dependabot[bot] commented on PR #204: URL: https://github.com/apache/parquet-format/pull/204#issuecomment-1712817820 Superseded by #215.

[GitHub] [parquet-format] dependabot[bot] opened a new pull request, #215: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-10 Thread via GitHub
dependabot[bot] opened a new pull request, #215: URL: https://github.com/apache/parquet-format/pull/215 Bumps org.slf4j:slf4j-api from 1.7.12 to 2.0.9.

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320256768 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320231941 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320192142 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320143220 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320122860 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] gszadovszky commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
gszadovszky commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319479960 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] pitrou commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
pitrou commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319468027 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,73 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition

[GitHub] [parquet-format] pitrou commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
pitrou commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319465274 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319461836 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319461836 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319461836 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] pitrou commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
pitrou commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319445217 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] emkornfield commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
emkornfield commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319366325 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
wgtmac commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319252134 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5:
