[jira] [Commented] (PARQUET-2117) Add rowPosition API in parquet record readers

2022-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509355#comment-17509355 ] ASF GitHub Bot commented on PARQUET-2117: - shangxinli merged pull request #945:

[GitHub] [parquet-mr] shangxinli merged pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader

2022-03-19 Thread GitBox
shangxinli merged pull request #945: URL: https://github.com/apache/parquet-mr/pull/945 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubs

[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509336#comment-17509336 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on pull request

[GitHub] [parquet-mr] shangxinli commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-19 Thread GitBox
shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073074540 Can you squash all the commits?

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509332#comment-17509332 ] ASF GitHub Bot commented on PARQUET-2006: - shangxinli commented on pull request

[GitHub] [parquet-mr] shangxinli commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-19 Thread GitBox
shangxinli commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073074030 Hi. @huaxingao Thanks for working on it. I just had a first-round review and left some comments. After we address them, I will have another look.

Re: Multiple pages with indexes vs multiple row groups with one data page per chunk

2022-03-19 Thread Jacques Nadeau
I can take your comment two ways: what is the downside to large pages, or what is the downside to small row groups? One of the key considerations I've dealt with is that the page is the unit of compression and, if I recall correctly, Parquet uses block rather than stream compression. This means you typi
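
To make the "page is the unit of compression" point concrete, here is a minimal sketch, not taken from the thread. It assumes zlib as a stand-in for whatever codec a Parquet writer would use, and the column data, page sizes, and helper names (`split_pages`, `compress_pages`) are hypothetical. Each page is compressed as an independent block, so a reader can decompress one page alone, but no page can reuse redundancy seen in another.

```python
import struct
import zlib


def split_pages(chunk: bytes, page_size: int) -> list[bytes]:
    """Cut a column chunk into fixed-size pages (hypothetical helper)."""
    return [chunk[i:i + page_size] for i in range(0, len(chunk), page_size)]


def compress_pages(chunk: bytes, page_size: int) -> list[bytes]:
    """Compress every page as its own block, independent of its neighbours."""
    return [zlib.compress(page) for page in split_pages(chunk, page_size)]


# Hypothetical column chunk: 100,000 int32 values with a short repeating pattern.
column_chunk = b"".join(struct.pack("<i", i % 100) for i in range(100_000))

small_pages = compress_pages(column_chunk, 1_000)    # 400 small pages
large_pages = compress_pages(column_chunk, 100_000)  # 4 large pages

small_total = sum(len(p) for p in small_pages)
large_total = sum(len(p) for p in large_pages)
print(f"small pages: {len(small_pages)} pages, {small_total} compressed bytes")
print(f"large pages: {len(large_pages)} pages, {large_total} compressed bytes")

# Any single page can be decompressed without touching the others.
third_page = zlib.decompress(small_pages[2])
```

On this synthetic data the small pages add up to more compressed bytes than the large ones, because every block has to restate the pattern from scratch. Real columns and real codecs will behave differently, so treat the numbers as illustrative only.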

Multiple pages with indexes vs multiple row groups with one data page per chunk

2022-03-19 Thread Jorge Cardoso Leitão
Hi, I am trying to understand the benefits of using multiple data pages and indexes vs multiple row groups. Some basics first: row groups ensure that a sequence of rows is "aligned" at the group boundary independently of how the rows are divided into pages: row group 1: c1: |--p11--|--p12--|---p13-
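
A small sketch of what this alignment means for a reader, under stated assumptions: the page boundaries below are made up, and `page_for_row` is a hypothetical helper, not a parquet-mr API. It assumes each column chunk exposes the first row index of every page, in the spirit of Parquet's offset index. Because pages in different columns need not break at the same rows, one row inside the row group maps to a different page and offset in each column.

```python
from bisect import bisect_right


def page_for_row(first_row_indexes: list[int], row: int) -> tuple[int, int]:
    """Return (page number, offset inside that page) for a row of the row group.

    first_row_indexes holds the first row of each page in ascending order,
    starting at 0. This is a hypothetical helper for illustration.
    """
    page = bisect_right(first_row_indexes, row) - 1
    return page, row - first_row_indexes[page]


# Hypothetical row group: both columns cover the same rows, paged differently.
c1_first_rows = [0, 300, 600]  # c1 has three pages
c2_first_rows = [0, 500]       # c2 has two pages

row = 650
print(page_for_row(c1_first_rows, row))  # (2, 50)
print(page_for_row(c2_first_rows, row))  # (1, 150)
```

With one data page per chunk the lookup is trivial, since every row lives in page 0 of its column; with several pages per chunk this kind of per-column index lookup is what lets a reader skip pages it does not need.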