[jira] [Commented] (PARQUET-2223) Parquet Data Masking for Column Encryption

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677533#comment-17677533
 ] 

ASF GitHub Bot commented on PARQUET-2223:
-

zhangjiashen commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1384639397

   > I found the doc. Could you provide me with a "comment" access, so we'll 
discuss the goals and design there? Thanks.
   
   @ggershinsky thanks for looking at this, I have added permission for you, 
feel free to add questions/comments!




> Parquet Data Masking for Column Encryption
> --
>
> Key: PARQUET-2223
> URL: https://issues.apache.org/jira/browse/PARQUET-2223
> Project: Parquet
>  Issue Type: Task
>Reporter: Jiashen Zhang
>Priority: Minor
>
> h1. Background
> h2. What is Data Masking?
> Data masking is the process of obfuscating sensitive data. Instead of 
> revealing PII data, masking allows us to return NULLs, hashes or redacted 
> data in its place. With data masking, users who are in the correct permission 
> groups can retrieve the original data and users without permissions will 
> receive masked data.
> h2. Why do we need it?
>  * Fine-Grained Access Control
> h2. Why do we want to enhance data masking?
>  
> Users might not have permission to read every column, and the existing code 
> does not let a reader skip the columns it is not permitted to access. This 
> enhancement adds that support, so users can choose to skip such columns and 
> avoid decryption errors.
> h1. Design Requirements
>  # Users can skip some columns with a configuration
> h1. Proposed solution
> The key idea is to modify the requested schema by removing the skipped 
> columns from it.
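
A minimal sketch of the schema-pruning idea described above, in Python rather than parquet-mr's Java. It is not the implementation from PR #1016: the schema is modelled as a flat list of column names, and `prune_schema` and `skipped_columns` are hypothetical names chosen only for illustration.

```python
def prune_schema(requested_columns, skipped_columns):
    """Return the requested columns minus those the reader asked to skip."""
    skipped = set(skipped_columns)
    # Preserve the original column order; only drop the skipped names.
    return [name for name in requested_columns if name not in skipped]


# Hypothetical example: "ssn" is encrypted with a key this user cannot access,
# so it is removed from the request before any decryption is attempted.
requested = ["id", "name", "ssn", "country"]
pruned = prune_schema(requested, skipped_columns=["ssn"])
print(pruned)
```

Because the skipped column never appears in the pruned request, the reader would have no reason to fetch or decrypt it, which is how the decryption error is avoided in this model.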



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [parquet-mr] zhangjiashen commented on pull request #1016: PARQUET-2223: Parquet Data Masking Enhancement for Column Encryption

2023-01-16 Thread GitBox


zhangjiashen commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1384639397

   > I found the doc. Could you provide me with a "comment" access, so we'll 
discuss the goals and design there? Thanks.
   
   @ggershinsky thanks for looking at this, I have added permission for you, 
feel free to add questions/comments!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Feedback needed on PR #184 apache/parquet-format

2023-01-16 Thread Anja Boskovic

Hello!

I have a PR open proposing the addition of a float-16/half-float logical
type to parquet-format:
https://github.com/apache/parquet-format/pull/184


I am looking for feedback on what the next step is. Does the PR need an
additional round of reviews before I send a poll to the mailing list? If
it does, do you have advice on who I could ask for a review? Does an
implementation need to occur before the mailing list poll?

Thanks

~* Anja


[jira] [Commented] (PARQUET-2223) Parquet Data Masking for Column Encryption

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677379#comment-17677379
 ] 

ASF GitHub Bot commented on PARQUET-2223:
-

ggershinsky commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1384054816

   I found the doc. Could you provide me with a "comment" access, so we'll 
discuss the goals and design there? Thanks.






[GitHub] [parquet-mr] ggershinsky commented on pull request #1016: PARQUET-2223: Parquet Data Masking Enhancement for Column Encryption

2023-01-16 Thread GitBox


ggershinsky commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1384054816

   I found the doc. Could you provide me with a "comment" access, so we'll 
discuss the goals and design there? Thanks.





[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677313#comment-17677313
 ] 

ASF GitHub Bot commented on PARQUET-2231:
-

pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383840418

   Also cc @rok




> [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY
> -
>
> Key: PARQUET-2231
> URL: https://issues.apache.org/jira/browse/PARQUET-2231
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
> Fix For: format-2.10.0
>
>
> The spec says that DELTA_BYTE_ARRAY is only supported for BYTE_ARRAY, but in 
> parquet-mr it has been allowed for FIXED_LEN_BYTE_ARRAY as well since 2015.
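
To illustrate why the encoding applies equally well to fixed-length values, here is a simplified Python sketch of the prefix/suffix split that DELTA_BYTE_ARRAY is built on. It is an illustration under assumptions, not the format's wire layout: the function names are hypothetical, and the real encoding additionally delta-packs the prefix lengths and length-prefixes the suffixes, which this sketch omits.

```python
def delta_byte_array_split(values):
    """Split each value into (shared-prefix length, remaining suffix)."""
    prefix_lengths, suffixes = [], []
    previous = b""
    for value in values:
        # Length of the longest common prefix with the previous value.
        n = 0
        limit = min(len(previous), len(value))
        while n < limit and previous[n] == value[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(value[n:])
        previous = value
    return prefix_lengths, suffixes


def delta_byte_array_join(prefix_lengths, suffixes):
    """Rebuild the original values from prefix lengths and suffixes."""
    values, previous = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        value = previous[:n] + suffix
        values.append(value)
        previous = value
    return values


# Four fixed-length (4-byte) values, as a FIXED_LEN_BYTE_ARRAY column might hold.
fixed_values = [b"abcd", b"abce", b"abzz", b"abzz"]
lengths, tails = delta_byte_array_split(fixed_values)
print(lengths, tails)
```

Nothing in the split or the join depends on the values having different lengths, so the same scheme round-trips fixed-length data unchanged.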





[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

2023-01-16 Thread GitBox


pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383840418

   Also cc @rok





[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677307#comment-17677307
 ] 

ASF GitHub Bot commented on PARQUET-2231:
-

pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383831257

   @emkornfield @gszadovszky @rdblue 






[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

2023-01-16 Thread GitBox


pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383831257

   @emkornfield @gszadovszky @rdblue 





[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677306#comment-17677306
 ] 

ASF GitHub Bot commented on PARQUET-2231:
-

pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383830870

   @wjones127 Could you help review the wording?






[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677305#comment-17677305
 ] 

ASF GitHub Bot commented on PARQUET-2231:
-

pitrou opened a new pull request, #189:
URL: https://github.com/apache/parquet-format/pull/189

   DELTA_BYTE_ARRAY has been supported for FIXED_LEN_BYTE_ARRAY by parquet-mr 
since 2015 (see PARQUET-152). Update the spec accordingly.
   
   Also improve the wording and markup, and add an example.
   
   






[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

2023-01-16 Thread GitBox


pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383830870

   @wjones127 Could you help review the wording?





[GitHub] [parquet-format] pitrou opened a new pull request, #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

2023-01-16 Thread GitBox


pitrou opened a new pull request, #189:
URL: https://github.com/apache/parquet-format/pull/189

   DELTA_BYTE_ARRAY has been supported for FIXED_LEN_BYTE_ARRAY by parquet-mr 
since 2015 (see PARQUET-152). Update the spec accordingly.
   
   Also improve the wording and markup, and add an example.
   
   





[jira] [Updated] (PARQUET-152) Encoding issue with fixed length byte arrays

2023-01-16 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated PARQUET-152:
---
Component/s: parquet-mr

> Encoding issue with fixed length byte arrays
> 
>
> Key: PARQUET-152
> URL: https://issues.apache.org/jira/browse/PARQUET-152
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Nezih Yigitbasi
>Assignee: Sergio Peña
>Priority: Minor
> Fix For: 1.8.0
>
>
> While running some tests against the master branch I hit an encoding issue 
> that seemed like a bug to me.
> I noticed that when writing a fixed length byte array whose size is greater 
> than dictionaryPageSize (in my test it was 512), the encoding falls back to 
> DELTA_BYTE_ARRAY as seen below:
> {noformat}
> Dec 17, 2014 3:41:10 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
> written 12,125B for [flba_field] FIXED_LEN_BYTE_ARRAY: 5,000 values, 1,710B 
> raw, 1,710B comp, 5 pages, encodings: [DELTA_BYTE_ARRAY]
> {noformat}
> But then read fails with the following exception:
> {noformat}
> Caused by: parquet.io.ParquetDecodingException: Encoding DELTA_BYTE_ARRAY is only supported for type BINARY
>   at parquet.column.Encoding$7.getValuesReader(Encoding.java:193)
>   at parquet.column.impl.ColumnReaderImpl.initDataReader(ColumnReaderImpl.java:534)
>   at parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:574)
>   at parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:54)
>   at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:518)
>   at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:510)
>   at parquet.column.page.DataPageV2.accept(DataPageV2.java:123)
>   at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:510)
>   at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:502)
>   at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:604)
>   at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:348)
>   at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
>   at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
>   at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:267)
>   at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
>   at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
>   at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
>   at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
>   at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:129)
>   at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:198)
>   ... 16 more
> {noformat}
> When the array's size is < dictionaryPageSize, RLE_DICTIONARY encoding is 
> used and read works fine:
> {noformat}
> Dec 17, 2014 3:39:50 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: 
> written 50B for [flba_field] FIXED_LEN_BYTE_ARRAY: 5,000 values, 3B raw, 3B 
> comp, 1 pages, encodings: [RLE_DICTIONARY, PLAIN], dic { 1 entries, 8B raw, 
> 1B comp}
> {noformat}





[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677300#comment-17677300
 ] 

Antoine Pitrou commented on PARQUET-2231:
-

[~rok] [~shanhuang] [~muthunagappan] [~jinshang] FYI



[jira] [Created] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2231:
---

 Summary: [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY
 Key: PARQUET-2231
 URL: https://issues.apache.org/jira/browse/PARQUET-2231
 Project: Parquet
  Issue Type: Bug
  Components: parquet-format
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: format-2.10.0


The spec says that DELTA_BYTE_ARRAY is only supported for BYTE_ARRAY, but in 
parquet-mr it has been allowed for FIXED_LEN_BYTE_ARRAY as well since 2015.





[jira] [Commented] (PARQUET-152) Encoding issue with fixed length byte arrays

2023-01-16 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677297#comment-17677297
 ] 

Antoine Pitrou commented on PARQUET-152:


It would be nice if the encodings spec had been updated as well, because for 
now it mentions that DELTA_BYTE_ARRAY is only supported for BYTE_ARRAY columns, 
not FIXED_LEN_BYTE_ARRAY. See PARQUET-2231.



[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677290#comment-17677290
 ] 

ASF GitHub Bot commented on PARQUET-2226:
-

gszadovszky commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383783806

   Sure. :)
   Please double-check the jira if I assigned it to the correct one.




> Support merge Bloom Filter
> --
>
> Key: PARQUET-2226
> URL: https://issues.apache.org/jira/browse/PARQUET-2226
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Mars
>Assignee: miracle
>Priority: Major
>
> We need to collect Parquet's bloom filter of multiple files, and then 
> synthesize a more comprehensive bloom filter for common use. 
> Guava supports similar api operations
> https://guava.dev/releases/31.0.1-jre/api/docs/src-html/com/google/common/hash/BloomFilter.html#line.252
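
As a rough illustration of what merging Bloom filters involves, the Python sketch below combines two filters with a bitwise OR. It is a generic textbook filter under stated assumptions, not the change in PR #1020 and not Parquet's split-block layout: `SimpleBloomFilter`, its SHA-256 based hashing and its default sizes are all hypothetical choices made for the example.

```python
import hashlib


class SimpleBloomFilter:
    """Generic Bloom filter backed by a Python int used as a bitset."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Merging is only meaningful when both filters map items to the
        # same bit positions, i.e. same size and same hash configuration.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash configuration")
        self.bits |= other.bits


file_a, file_b = SimpleBloomFilter(), SimpleBloomFilter()
file_a.add(b"alice")
file_b.add(b"bob")
file_a.merge(file_b)
print(file_a.might_contain(b"alice"), file_a.might_contain(b"bob"))
```

The merged filter answers "might contain" for every item added to either input, at the cost of a higher false-positive rate than either filter had alone.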





[jira] [Resolved] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky resolved PARQUET-2226.
---
Resolution: Fixed



[GitHub] [parquet-mr] gszadovszky commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-16 Thread GitBox


gszadovszky commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383783806

   Sure. :)
   Please double-check the jira if I assigned it to the correct one.





[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky reassigned PARQUET-2226:
-

Assignee: miracle



[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky reassigned PARQUET-2226:
-

Assignee: (was: miracle)



[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky reassigned PARQUET-2226:
-

Assignee: miracle



[GitHub] [parquet-mr] yabola commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-16 Thread GitBox


yabola commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383715909

   @wgtmac Thank you for your detailed review and @gszadovszky help.
   My jira id is miracle





[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677264#comment-17677264
 ] 

ASF GitHub Bot commented on PARQUET-2226:
-

yabola commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383715909

   @wgtmac Thank you for your detailed review and @gszadovszky help.
   My jira id is miracle






[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677251#comment-17677251
 ] 

ASF GitHub Bot commented on PARQUET-2226:
-

gszadovszky commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383689874

   @yabola, what is your jira account? I'd like to assign the jira to you 
before closing.






[GitHub] [parquet-mr] gszadovszky commented on pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-16 Thread GitBox


gszadovszky commented on PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#issuecomment-1383689874

   @yabola, what is your jira account? I'd like to assign the jira to you 
before closing.





[jira] [Commented] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677249#comment-17677249
 ] 

ASF GitHub Bot commented on PARQUET-2226:
-

gszadovszky merged PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020






[GitHub] [parquet-mr] gszadovszky merged pull request #1020: PARQUET-2226 Support merge bloom filters

2023-01-16 Thread GitBox


gszadovszky merged PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020

