[jira] [Updated] (PARQUET-1877) [C++] Reconcile container size with string size for memory issues

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated PARQUET-1877:

Labels: pull-request-available  (was: )

> [C++] Reconcile container size with string size for memory issues
> -
>
> Key: PARQUET-1877
> URL: https://issues.apache.org/jira/browse/PARQUET-1877
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now the size can cause allocations an order of magnitude larger than 
> string size limits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (PARQUET-1877) [C++] Reconcile container size with string size for memory issues

2020-06-16 Thread Micah Kornfield (Jira)
Micah Kornfield created PARQUET-1877:


 Summary: [C++] Reconcile container size with string size for 
memory issues
 Key: PARQUET-1877
 URL: https://issues.apache.org/jira/browse/PARQUET-1877
 Project: Parquet
  Issue Type: Bug
  Components: parquet-cpp
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Right now the size can cause allocations an order of magnitude larger than 
string size limits.





[jira] [Commented] (PARQUET-1872) Add TransCompression command

2020-06-16 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137967#comment-17137967
 ] 

Xinli Shang commented on PARQUET-1872:
--

[~gszadovszky] Thanks for the reply! I just manually linked the PR. 

For the subtask, I was planning to get the parquet-tools change reviewed and 
revised first, and then add it to parquet-cli, instead of changing both at the 
same time. But I am also fine with making both changes in the same PR, so I 
have just added it to parquet-cli in the newest PR. 

For ColumnIndex and OffsetIndex, they are taken care of in my PR. I also added 
tests for both ColumnIndex and OffsetIndex validation. 

For bloom filters, I will work on the subtask once this PR is done. That will 
require copying the existing bloom filters over to the new files.

Xinli 

> Add TransCompression command 
> -
>
> Key: PARQUET-1872
> URL: https://issues.apache.org/jira/browse/PARQUET-1872
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.12.0
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
>
> As ZSTD becomes more popular, there is a need to convert existing data to 
> ZSTD compression, which can achieve a higher compression ratio. It would be 
> useful to have a tool that converts a Parquet file directly by just 
> decompressing and recompressing each page, without decoding/encoding the 
> values or assembling records, because this is much faster. Initial results 
> show it is ~5 times faster. 
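
The page-level idea in the description above can be illustrated with a minimal
sketch. This is not the parquet-mr implementation: the Page class, the CODECS
table, and transcompress() are hypothetical names, and lzma from the Python
standard library stands in for ZSTD. The point is only that each page payload
is decompressed and recompressed as opaque bytes, with no value decoding or
record assembly.

```python
import lzma
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a Parquet page: codec name plus opaque payload."""
    codec: str
    payload: bytes


# Standard-library codecs only; "lzma" is an assumed stand-in for ZSTD.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def transcompress(pages, target_codec):
    """Recompress each page payload without interpreting the encoded values."""
    compress, _ = CODECS[target_codec]
    result = []
    for page in pages:
        _, decompress = CODECS[page.codec]
        raw = decompress(page.payload)  # still-encoded page bytes, never decoded
        result.append(Page(target_codec, compress(raw)))
    return result


# Usage: two fake "pages" compressed with zlib, converted to lzma.
raw_pages = [b"a" * 1000, bytes(range(256)) * 4]
source = [Page("zlib", zlib.compress(p)) for p in raw_pages]
converted = transcompress(source, "lzma")
```

A real tool would presumably also have to rewrite the compressed sizes and
offsets recorded in the file metadata, which this sketch leaves out.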





[jira] [Assigned] (PARQUET-1874) Add to parquet-cli

2020-06-16 Thread Xinli Shang (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang reassigned PARQUET-1874:


Assignee: Xinli Shang

> Add to parquet-cli
> --
>
> Key: PARQUET-1874
> URL: https://issues.apache.org/jira/browse/PARQUET-1874
> Project: Parquet
>  Issue Type: Sub-task
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
>






[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136555#comment-17136555
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

gszadovszky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-644698810


   Thanks a lot for the explanation. Agreed that protecting sensitive data in 
memory cannot be handled by Parquet.
   
   I'll wait for your comment before the next review round.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Encryption key management tools 
> 
>
> Key: PARQUET-1373
> URL: https://issues.apache.org/jira/browse/PARQUET-1373
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>
> Parquet Modular Encryption 
> ([PARQUET-1178|https://issues.apache.org/jira/browse/PARQUET-1178]) provides 
> an API that accepts keys, arbitrary key metadata and key retrieval callbacks, 
> which allows implementing basically any key management policy on top of it. 
> This Jira will add tools that implement a set of best practice elements for 
> key management. This is not an end-to-end key management solution, but rather 
> a set of components that might simplify the design and development of an 
> end-to-end solution.
> This tool set is one of many possible. There is no goal to create a single or 
> “standard” toolkit for Parquet encryption keys. Parquet has a Crypto Factory 
> interface ([PARQUET-1817|https://issues.apache.org/jira/browse/PARQUET-1817]) 
> that allows plugging in different implementations of encryption key management.





[GitHub] [parquet-mr] gszadovszky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-16 Thread GitBox


gszadovszky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-644698810


   Thanks a lot for the explanation. Agreed that protecting sensitive data in 
memory cannot be handled by Parquet.
   
   I'll wait for your comment before the next review round.


