[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610358#comment-17610358 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #988: URL:

[GitHub] [parquet-mr] jinyius commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-09-27 Thread GitBox
jinyius commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1260420022 > @matthieun and @jinyius Would it be possible for you both to sync to come up with one solution? You can put the other one as co-author. imho, i believe #995 is a superset of

[jira] [Commented] (PARQUET-2184) Improve SnappyCompressor buffer expansion performance

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610269#comment-17610269 ] ASF GitHub Bot commented on PARQUET-2184: - shangxinli commented on PR #993: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #993: PARQUET-2184: Improve the allocation behavior of SnappyCompressor

2022-09-27 Thread GitBox
shangxinli commented on PR #993: URL: https://github.com/apache/parquet-mr/pull/993#issuecomment-1260143659 I wonder how much benefit we can gain from this fix? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[jira] [Commented] (PARQUET-2184) Improve SnappyCompressor buffer expansion performance

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610265#comment-17610265 ] ASF GitHub Bot commented on PARQUET-2184: - shangxinli commented on code in PR #993: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #993: PARQUET-2184: Improve the allocation behavior of SnappyCompressor

2022-09-27 Thread GitBox
shangxinli commented on code in PR #993: URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981781086 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java: ## @@ -96,21 +100,40 @@ public synchronized void setInput(byte[] buffer, int

[jira] [Commented] (PARQUET-2184) Improve SnappyCompressor buffer expansion performance

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610260#comment-17610260 ] ASF GitHub Bot commented on PARQUET-2184: - shangxinli commented on code in PR #993: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #993: PARQUET-2184: Improve the allocation behavior of SnappyCompressor

2022-09-27 Thread GitBox
shangxinli commented on code in PR #993: URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981762161 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java: ## @@ -32,6 +32,10 @@ * entire input in setInput and compresses it as one

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610211#comment-17610211 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1259976087 Nice implementation! For the test, can you add more for interop with lz4? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981649538 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedDecompressor.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610210#comment-17610210 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610209#comment-17610209 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981648516 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedDecompressor.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software

Vectored IO in Parquet (https://issues.apache.org/jira/browse/PARQUET-2171)

2022-09-27 Thread Mukund Madhav Thakur
Hi Team, We in the Hadoop project recently added a new feature in Hadoop, Vectored IO, which will be released in the upcoming Hadoop 3.3.5 release. This is a high-performance scatter/gather extension of the PositionedReadable API, optimized for reading columnar data in cloud storage.
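To make the scatter/gather idea concrete, here is a minimal sketch using only the JDK. It is not the Hadoop Vectored IO API: the class `VectoredReadSketch`, the `Range` type, and the `readRanges` method are hypothetical names chosen for illustration. The sketch assumes the caller already knows which byte ranges it needs (as a columnar reader would for column chunks), issues all positional reads in parallel, and gathers the results in request order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class VectoredReadSketch {

  /** Hypothetical range descriptor: a byte offset and a length within one file. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Issues one positional read per range in parallel and returns buffers in request order. */
  static List<ByteBuffer> readRanges(Path file, List<Range> ranges) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      List<CompletableFuture<ByteBuffer>> pending = new ArrayList<>();
      for (Range range : ranges) {
        // Positional reads on a FileChannel do not move a shared file position,
        // so several of them can run concurrently on the same channel.
        pending.add(CompletableFuture.supplyAsync(() -> readOne(channel, range)));
      }
      List<ByteBuffer> results = new ArrayList<>();
      for (CompletableFuture<ByteBuffer> future : pending) {
        results.add(future.join()); // wait for each read before the channel is closed
      }
      return results;
    }
  }

  /** Reads exactly range.length bytes starting at range.offset. */
  private static ByteBuffer readOne(FileChannel channel, Range range) {
    try {
      ByteBuffer buffer = ByteBuffer.allocate(range.length);
      long position = range.offset;
      while (buffer.hasRemaining()) {
        int n = channel.read(buffer, position);
        if (n < 0) {
          throw new IOException("End of file reached before the range was filled");
        }
        position += n;
      }
      buffer.flip();
      return buffer;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("vectored-read", ".bin");
    try {
      Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
      List<ByteBuffer> buffers = readRanges(file, List.of(new Range(2, 3), new Range(10, 4)));
      for (ByteBuffer buffer : buffers) {
        // Prints "234" and then "abcd"
        System.out.println(new String(buffer.array(), StandardCharsets.US_ASCII));
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

A production implementation would typically do more than this sketch, for example merging nearby ranges into a single request when the storage layer charges per call; that behavior is not shown here.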

Parquet community sync meeting notes - 9/27/2022

2022-09-27 Thread Xinli shang
9/27/2022
Attendees (Gidon Gershinsky, Xinli Shang, Tim Miller, Jiasheng Zhang)
1. Parquet Cell-level encryption
   1. Will open PRs after delivering it internally
2. PARQUET-2069: Fix some Avro schema issues, in

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610117#comment-17610117 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981380734 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawDecompressor.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610116#comment-17610116 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981379233 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawDecompressor.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610112#comment-17610112 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981373160 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCompressor.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610111#comment-17610111 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981370983 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610109#comment-17610109 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981369466 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610104#comment-17610104 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
shangxinli commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1259621617 Thanks, Gang, for contributing! Are there any benchmarking numbers? Any comparison with ZSTD? These are non-blocking questions for review and merging. -- This is an automated message

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610028#comment-17610028 ] ASF GitHub Bot commented on PARQUET-2196: - wgtmac commented on PR #1000: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
wgtmac commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1259431976 > @wgtmac Did you try to read an actual file produced by Parquet C++? > > Note you can find such files in https://github.com/apache/parquet-testing/ Yes, I have tried that.

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609962#comment-17609962 ] ASF GitHub Bot commented on PARQUET-2196: - pitrou commented on code in PR #1000: URL:

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r981055817 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestLz4RawCodec.java: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609945#comment-17609945 ] ASF GitHub Bot commented on PARQUET-2196: - pitrou commented on PR #1000: URL:

[GitHub] [parquet-mr] pitrou commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
pitrou commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1259279363 cc @lidavidm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609929#comment-17609929 ] ASF GitHub Bot commented on PARQUET-2196: - wgtmac commented on PR #1000: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
wgtmac commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1259233215 @pitrou @shangxinli Can you please take a look? Thanks in advance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609928#comment-17609928 ] ASF GitHub Bot commented on PARQUET-2196: - wgtmac opened a new pull request, #1000: URL:

[GitHub] [parquet-mr] wgtmac opened a new pull request, #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-27 Thread GitBox
wgtmac opened a new pull request, #1000: URL: https://github.com/apache/parquet-mr/pull/1000 This PR implements the LZ4_RAW codec, which was introduced by Parquet format v2.9.0. Since there is a lot of common logic between the LZ4_RAW and SNAPPY codecs, this patch moves them into

[jira] [Updated] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2196: - Description: There is a long history about the LZ4 interoperability of parquet files between

[jira] [Created] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Thread Gang Wu (Jira)
Gang Wu created PARQUET-2196: Summary: Support LZ4_RAW codec Key: PARQUET-2196 URL: https://issues.apache.org/jira/browse/PARQUET-2196 Project: Parquet Issue Type: Improvement