[jira] [Commented] (PARQUET-2212) Add ByteBuffer api for decryptors to allow direct memory to be decrypted
[ https://issues.apache.org/jira/browse/PARQUET-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711641#comment-17711641 ]

ASF GitHub Bot commented on PARQUET-2212:
-----------------------------------------

wgtmac commented on PR #1008: URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506231596

> Sure. What does one need to do? I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?).

Let's try rebasing and force-pushing to see whether the CI turns green.

> Add ByteBuffer api for decryptors to allow direct memory to be decrypted
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-2212
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2212
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.3
>            Reporter: Parth Chandra
>            Priority: Major
>
> The decrypt API in BlockCipher.Decryptor currently only provides a method that takes a byte array:
> {code:java}
> byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);{code}
> A Parquet reader that uses the DirectByteBufferAllocator has to incur the cost of copying the data into a byte array (and sometimes back into a DirectByteBuffer) to decrypt it.
> This proposes adding a new API that accepts a ByteBuffer as input and avoids the copy:
> {code:java}
> ByteBuffer decrypt(ByteBuffer from, byte[] AAD);{code}
> The decryption in ColumnChunkPageReadStore can also be updated to use the ByteBuffer-based API when the buffer is a DirectByteBuffer. If the buffer is a HeapByteBuffer, we can continue to use the byte-array API, since that incurs no copy when the underlying byte array is accessed.
> Also, some investigation has shown that decryption with ByteBuffers cannot use hardware acceleration in JVMs before JDK 17. In those cases, overall decryption is faster with byte arrays, even after the overhead of making a copy.
> The proposal, then, is to enable the ByteBuffer API for DirectByteBuffers only, and only if the JDK is 17 or higher or the user explicitly configures it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
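The gating the proposal describes (ByteBuffer path only for direct buffers, and only on JDK 17+ or with an explicit opt-in) can be sketched as a small helper. This is a hypothetical illustration — the class, method, and parameter names below are invented for the example and are not part of parquet-mr:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed gating logic; names are invented
// for illustration and do not exist in parquet-mr.
public class DecryptPathChooser {

    /**
     * Decide whether to take the ByteBuffer decrypt path.
     *
     * @param buf          the input buffer holding length + ciphertext
     * @param jdkMajor     the running JDK's major version
     * @param forceEnabled true if the user explicitly enabled the ByteBuffer path
     */
    static boolean useByteBufferDecrypt(ByteBuffer buf, int jdkMajor, boolean forceEnabled) {
        if (!buf.isDirect()) {
            // Heap buffers: the byte[] API is already copy-free via buf.array().
            return false;
        }
        // Direct buffers: ByteBuffer decryption only benefits from hardware
        // acceleration on JDK 17+, so gate on the version or an explicit opt-in.
        return jdkMajor >= 17 || forceEnabled;
    }
}
```

With this shape, a heap buffer always stays on the byte-array path, and a direct buffer uses the new API only when it is expected to be faster or when the user has asked for it.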
[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted
wgtmac commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506231596

> Sure. What does one need to do? I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?).

Let's try rebasing and force-pushing to see whether the CI turns green.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (PARQUET-2266) Fix support for files without ColumnIndexes
[ https://issues.apache.org/jira/browse/PARQUET-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711640#comment-17711640 ]

ASF GitHub Bot commented on PARQUET-2266:
-----------------------------------------

wgtmac commented on PR #1048: URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1506230575

@richardkerr Could you please provide your email address and the user name you would like to use? I can create a JIRA account for you and assign the JIRA to you.

> Fix support for files without ColumnIndexes
> -------------------------------------------
>
>                 Key: PARQUET-2266
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2266
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.12.3
>            Reporter: Gang Wu
>            Priority: Major
>             Fix For: 1.12.4, 1.14.0, 1.13.1
>
> Fix a failure when writing ColumnChunks that do not have a ColumnIndex populated. This was introduced by PARQUET-2081.
[GitHub] [parquet-mr] wgtmac commented on pull request #1048: PARQUET-2266: Fix support for files without ColumnIndexes
wgtmac commented on PR #1048:
URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1506230575

@richardkerr Could you please provide your email address and the user name you would like to use? I can create a JIRA account for you and assign the JIRA to you.
[jira] [Commented] (PARQUET-2266) Fix support for files without ColumnIndexes
[ https://issues.apache.org/jira/browse/PARQUET-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711638#comment-17711638 ]

ASF GitHub Bot commented on PARQUET-2266:
-----------------------------------------

wgtmac commented on PR #1048: URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1506227667

@richardkerr Sorry for the bad experience! Unfortunately, I do not have permission to receive the email for JIRA account requests. Can you help, @gszadovszky @shangxinli @ggershinsky? I have created a new JIRA and updated the title. I will backport it into the 1.12.4 and 1.13.1 branches as well.
[GitHub] [parquet-mr] wgtmac commented on pull request #1048: PARQUET-2266: Fix support for files without ColumnIndexes
wgtmac commented on PR #1048:
URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1506227667

@richardkerr Sorry for the bad experience! Unfortunately, I do not have permission to receive the email for JIRA account requests. Can you help, @gszadovszky @shangxinli @ggershinsky? I have created a new JIRA and updated the title. I will backport it into the 1.12.4 and 1.13.1 branches as well.
[jira] [Created] (PARQUET-2266) Fix support for files without ColumnIndexes
Gang Wu created PARQUET-2266:
--------------------------------

             Summary: Fix support for files without ColumnIndexes
                 Key: PARQUET-2266
                 URL: https://issues.apache.org/jira/browse/PARQUET-2266
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.12.3
            Reporter: Gang Wu
             Fix For: 1.12.4, 1.14.0, 1.13.1

Fix a failure when writing ColumnChunks that do not have a ColumnIndex populated. This was introduced by PARQUET-2081.
[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711588#comment-17711588 ]

ASF GitHub Bot commented on PARQUET-2149:
-----------------------------------------

parthchandra commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1506058505

FWIW, the Hadoop 3.3.5 vector IO changes might make this PR redundant.

> Implement async IO for Parquet file reader
> ------------------------------------------
>
>                 Key: PARQUET-2149
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2149
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Parth Chandra
>            Priority: Major
>
> ParquetFileReader's implementation has the following flow (simplified):
> - For every column -> read from storage in 8 MB blocks -> read all uncompressed pages into an output queue
> - From the output queues -> (downstream) decompression + decoding
> This flow is serialized, which means that downstream threads are blocked until the data has been read. Because a large part of the time is spent waiting for data from storage, threads are idle and CPU utilization is very low.
> There is no reason why this cannot be made asynchronous _and_ parallel. So for column _i_: read one chunk at a time from storage until the end -> intermediate output queue -> read one uncompressed page at a time until the end -> output queue -> (downstream) decompression + decoding.
> Note that this can be made completely self-contained in ParquetFileReader, and downstream implementations like Iceberg and Spark will automatically be able to take advantage of it without code changes, as long as the ParquetFileReader APIs are not changed.
> In past work with async IO ([Drill - async page reader|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java]), I have seen 2x-3x improvement in reading speed for Parquet files.
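The asynchronous flow the issue describes — an I/O thread filling a queue while a downstream thread drains it for decompression and decoding — can be sketched with a BlockingQueue. This is a simplified, hypothetical illustration using only stdlib types; it is not the actual ParquetFileReader code, and the in-memory byte array stands in for storage:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: an I/O thread reads fixed-size chunks into a queue
// while the consumer (standing in for decompression + decoding) drains it
// concurrently, so downstream work overlaps with reading.
public class AsyncChunkPipeline {

    static List<byte[]> readChunks(byte[] column, int chunkSize) {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        ExecutorService io = Executors.newSingleThreadExecutor();
        int nChunks = (column.length + chunkSize - 1) / chunkSize;

        // Producer: simulate reading the column chunk from storage in blocks.
        io.submit(() -> {
            for (int off = 0; off < column.length; off += chunkSize) {
                int len = Math.min(chunkSize, column.length - off);
                queue.add(Arrays.copyOfRange(column, off, off + len));
            }
        });

        // Consumer: downstream decompression/decoding would happen per chunk
        // here, instead of waiting for the whole column to be read first.
        List<byte[]> pages = new ArrayList<>();
        try {
            for (int i = 0; i < nChunks; i++) {
                pages.add(queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            io.shutdown();
        }
        return pages;
    }
}
```

The point of the design is that the consumer starts work on the first chunk while later chunks are still in flight, which is where the reported CPU-utilization win comes from.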
[GitHub] [parquet-mr] parthchandra commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader
parthchandra commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1506058505

FWIW, the Hadoop 3.3.5 vector IO changes might make this PR redundant.
[jira] [Commented] (PARQUET-2212) Add ByteBuffer api for decryptors to allow direct memory to be decrypted
[ https://issues.apache.org/jira/browse/PARQUET-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711587#comment-17711587 ]

ASF GitHub Bot commented on PARQUET-2212:
-----------------------------------------

parthchandra commented on PR #1008: URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506055435

Sure. What does one need to do? I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?).
[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506055435

Sure. What does one need to do? I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?).
RE: [C++] Parquet and Arrow overlap
On 2023/02/01 19:27:22 Will Jones wrote:
> Hello,
>
> A while back, the Parquet C++ implementation was merged into the Apache Arrow monorepo [1]. As I understand it, this helped the development process immensely. However, I am noticing some governance issues because of it.
>
> First, it's not obvious where issues are supposed to be opened: in the Parquet Jira or in Arrow GitHub issues. Looking back at some of the original discussion, it looks like the intention was:
>
> * use PARQUET-XXX for issues relating to Parquet core
> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet core (e.g. changes that are in parquet/arrow right now)
>
> The README of the old parquet-cpp repo [3] states instead in its migration note:
>
> > JIRA issues should continue to be opened in the PARQUET JIRA project.
>
> Either way, it doesn't seem like this process is obvious to people. Perhaps we could clarify this and add notices to Arrow's GitHub issue templates?
>
> Second, committer status is a little unclear. I am a committer on Arrow, but not on Parquet right now. Does that mean I should only merge Parquet C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge Parquet changes at all?
>
> Also, are the contributions to Arrow C++ Parquet being actively reviewed for potential new committers?
>
> Best,
> Will Jones
>
> [1] https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw
> [2] https://lists.apache.org/thread/dkh6vjomcfyjlvoy83qdk9j5jgxk7n4j
> [3] https://github.com/apache/parquet-cpp

Personally, I think the Parquet Jira is de facto only for the Parquet format and Java Parquet. The C++/Rust Parquet implementations are discussed in their own repos now.

Best,
Xuwei Fu
[GitHub] [parquet-site] charlesmahler commented on pull request #30: add InfluxDB blog post link about Parquet catalog
charlesmahler commented on PR #30:
URL: https://github.com/apache/parquet-site/pull/30#issuecomment-1505370275

@wgtmac yes please, no worries about the delay
[jira] [Commented] (PARQUET-2081) Encryption translation tool - Parquet-hadoop
[ https://issues.apache.org/jira/browse/PARQUET-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711417#comment-17711417 ]

ASF GitHub Bot commented on PARQUET-2081:
-----------------------------------------

richardkerr commented on PR #1048: URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1505354950

I started committing against PARQUET-2081 because it was still open at the time, and since this issue was introduced by those changes, it seemed reasonable to me to attribute the fix to that ticket. Otherwise, I put in a request for a JIRA account almost a week ago with no feedback since, so I am unable to create anything. Can you create the ticket for me in the meantime? Or approve the JIRA access request?

> Encryption translation tool - Parquet-hadoop
> --------------------------------------------
>
>                 Key: PARQUET-2081
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2081
>             Project: Parquet
>          Issue Type: Task
>          Components: parquet-mr
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.3
>
> This implements the core part of the encryption translation tool in parquet-hadoop. After this, we will have another Jira/PR for parquet-cli to integrate with key tools for encryption properties.
[GitHub] [parquet-mr] richardkerr commented on pull request #1048: PARQUET-2081: Fix support for files without ColumnIndexes
richardkerr commented on PR #1048:
URL: https://github.com/apache/parquet-mr/pull/1048#issuecomment-1505354950

I started committing against PARQUET-2081 because it was still open at the time, and since this issue was introduced by those changes, it seemed reasonable to me to attribute the fix to that ticket. Otherwise, I put in a request for a JIRA account almost a week ago with no feedback since, so I am unable to create anything. Can you create the ticket for me in the meantime? Or approve the JIRA access request?
[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711388#comment-17711388 ]

ASF GitHub Bot commented on PARQUET-2149:
-----------------------------------------

steveloughran commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1505269164

@hazelnutsgz Hadoop 3.3.5 supports vector IO on an S3 stream: async, parallel fetch of blocks, which also works on the local filesystem (with GCS and ABFS as TODO items). We see significant performance increases there. There is a PR for it, though as it is 3.3.5+ only, it will not be merged into the ASF Parquet branches unless they move to that release, or until we finish a shim library offering (serialized) support for the API on older releases.
[GitHub] [parquet-mr] steveloughran commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader
steveloughran commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1505269164

@hazelnutsgz Hadoop 3.3.5 supports vector IO on an S3 stream: async, parallel fetch of blocks, which also works on the local filesystem (with GCS and ABFS as TODO items). We see significant performance increases there. There is a PR for it, though as it is 3.3.5+ only, it will not be merged into the ASF Parquet branches unless they move to that release, or until we finish a shim library offering (serialized) support for the API on older releases.
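The vector IO idea described above — hand the filesystem a list of (offset, length) ranges and let it fetch them asynchronously in parallel — can be illustrated with stdlib types. This is a hypothetical sketch over an in-memory "file"; the real Hadoop 3.3.5 API (vectored reads on its input streams) additionally coalesces nearby ranges and issues the reads against the actual store:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of vectored reads: each requested range is fetched on
// its own async task, so a columnar reader can request every column-chunk
// range it needs at once instead of issuing serial seek+read calls.
public class VectoredReadSketch {

    static class FileRange {
        final long offset;
        final int length;
        final CompletableFuture<byte[]> data = new CompletableFuture<>();
        FileRange(long offset, int length) { this.offset = offset; this.length = length; }
    }

    // "Read" the ranges from an in-memory file; a real implementation would
    // read from object storage and could coalesce adjacent ranges first.
    static void readVectored(byte[] file, List<FileRange> ranges) {
        for (FileRange r : ranges) {
            CompletableFuture.runAsync(() ->
                r.data.complete(Arrays.copyOfRange(file, (int) r.offset, (int) r.offset + r.length)));
        }
    }
}
```

The caller then joins each range's future as it needs the bytes, which is why this overlaps well with downstream decompression and decoding.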
[jira] [Commented] (PARQUET-1989) Deep verification of encrypted files
[ https://issues.apache.org/jira/browse/PARQUET-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711367#comment-17711367 ]

Steve Loughran commented on PARQUET-1989:
-----------------------------------------

You might want a design that can run the scan on a Spark RDD, where the RDD is simply the deep listFiles(path) scan of the directory tree. This would give the best scale for a massive dataset, compared to even a parallelised scan in a single process. I do have an RDD that can do line-by-line work, with locality determined per file, which lets you schedule the work on the HDFS nodes holding the data; unfortunately it needs to be in the o.a.spark package to build: https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala ...that could maybe be added to Spark itself.

> Deep verification of encrypted files
> ------------------------------------
>
>                 Key: PARQUET-1989
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1989
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cli
>            Reporter: Gidon Gershinsky
>            Assignee: Maya Anderson
>            Priority: Major
>             Fix For: 1.14.0
>
> A tool that verifies the encryption of Parquet files in a given folder. It analyzes the footer, and then every module (page headers, pages, column indexes, bloom filters), making sure they are encrypted (in the relevant columns), and potentially checking the encryption keys.
> We'll start with a design doc, open for discussion.
[jira] [Commented] (PARQUET-2198) Vulnerabilities in jackson-databind
[ https://issues.apache.org/jira/browse/PARQUET-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711234#comment-17711234 ]

ASF GitHub Bot commented on PARQUET-2198:
-----------------------------------------

nikhilenr commented on PR #1005: URL: https://github.com/apache/parquet-mr/pull/1005#issuecomment-1504802025

Hi all, the new parquet-jackson version is released, and the reported CVEs are resolved with v1.13.0: https://mvnrepository.com/artifact/org.apache.parquet/parquet-jackson/1.13.0 Thanks @shangxinli for fixing.

> Vulnerabilities in jackson-databind
> -----------------------------------
>
>                 Key: PARQUET-2198
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2198
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.12.3
>            Reporter: Łukasz Dziedziul
>            Priority: Major
>              Labels: jackson-databind, security, vulnerabilities
>             Fix For: 1.13.0
>
> Update jackson-databind to mitigate CVEs:
> * [CVE-2022-42003|https://github.com/advisories/GHSA-jjjh-jjxp-wpff] - https://nvd.nist.gov/vuln/detail/CVE-2022-42003
> * [CVE-2022-42004|https://github.com/advisories/GHSA-rgv9-q543-rqg4] - [https://nvd.nist.gov/vuln/detail/CVE-2022-42004 (fixed in 2.13.4)|https://nvd.nist.gov/vuln/detail/CVE-2022-42004]
[GitHub] [parquet-mr] nikhilenr commented on pull request #1005: PARQUET-2198 : Updating jackson data bind version to fix CVEs
nikhilenr commented on PR #1005:
URL: https://github.com/apache/parquet-mr/pull/1005#issuecomment-1504802025

Hi all, the new parquet-jackson version is released, and the reported CVEs are resolved with v1.13.0: https://mvnrepository.com/artifact/org.apache.parquet/parquet-jackson/1.13.0 Thanks @shangxinli for fixing.