[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503301#comment-17503301 ]

ASF GitHub Bot commented on PARQUET-2134:

7c00 commented on pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1062551863

Related issue: https://github.com/prestodb/presto/pull/17435

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Incorrect type checking in HadoopStreams.wrap
>
> Key: PARQUET-2134
> URL: https://issues.apache.org/jira/browse/PARQUET-2134
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
> Reporter: Todd Gao
> Priority: Minor
>
> The method
> [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
> wraps an FSDataInputStream in a SeekableInputStream.
> It checks whether the underlying stream of the passed FSDataInputStream
> implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an
> H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
> In some cases, we may add another wrapper over FSDataInputStream. For
> example,
> {code:java}
> import org.apache.hadoop.fs.FSDataInputStream;
>
> class CustomDataInputStream extends FSDataInputStream {
>   public CustomDataInputStream(FSDataInputStream original) {
>     super(original);
>   }
> }
> {code}
> Suppose we create an FSDataInputStream whose underlying stream does not
> implement ByteBufferReadable, and then create a CustomDataInputStream from
> it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get
> an error like
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported
> by input stream{quote}
> We can fix this by recursively checking the underlying stream of
> FSDataInputStream.

-- 
This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [parquet-mr] 7c00 commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap
7c00 commented on pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1062551863

Related issue: https://github.com/prestodb/presto/pull/17435
[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503284#comment-17503284 ]

ASF GitHub Bot commented on PARQUET-2134:

7c00 opened a new pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951

Make sure you have checked _all_ steps below.

### Jira

- [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
  - https://issues.apache.org/jira/browse/PARQUET-2134
  - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

### Tests

- [x] does not need testing for this extremely good reason:

### Commits

- [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain Javadoc that explains what they do
[GitHub] [parquet-mr] 7c00 opened a new pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap
7c00 opened a new pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951
[jira] [Updated] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Gao updated PARQUET-2134:
Affects Version/s: 1.12.2
                   1.11.2
                   1.10.1
                   1.8.3
[jira] [Updated] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Gao updated PARQUET-2134:
Description:
The method [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51] wraps an FSDataInputStream in a SeekableInputStream.
It checks whether the underlying stream of the passed FSDataInputStream implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
In some cases, we may add another wrapper over FSDataInputStream. For example,
{code:java}
import org.apache.hadoop.fs.FSDataInputStream;

class CustomDataInputStream extends FSDataInputStream {
  public CustomDataInputStream(FSDataInputStream original) {
    super(original);
  }
}
{code}
Suppose we create an FSDataInputStream whose underlying stream does not implement ByteBufferReadable, and then create a CustomDataInputStream from it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an error like
{quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream{quote}
We can fix this by recursively checking the underlying stream of FSDataInputStream.

was:
The method [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51] wraps an FSDataInputStream in a SeekableInputStream.
It checks whether the underlying stream of the passed FSDataInputStream implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
In some cases, we may add another wrapper over FSDataInputStream. For example,
{code:java}
import org.apache.hadoop.fs.FSDataInputStream;

class CustomDataInputStream extends FSDataInputStream {
  public CustomDataInputStream(FSDataInputStream original) {
    super(original);
  }
}
{code}
Suppose we create an FSDataInputStream whose underlying stream does not implement ByteBufferReadable, and then create a CustomDataInputStream from it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an error like {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream{quote}.
We can fix this by recursively checking the underlying stream of FSDataInputStream.
[jira] [Created] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap
Todd Gao created PARQUET-2134:

Summary: Incorrect type checking in HadoopStreams.wrap
Key: PARQUET-2134
URL: https://issues.apache.org/jira/browse/PARQUET-2134
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Todd Gao

The method [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51] wraps an FSDataInputStream in a SeekableInputStream.
It checks whether the underlying stream of the passed FSDataInputStream implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
In some cases, we may add another wrapper over FSDataInputStream. For example,
{code:java}
import org.apache.hadoop.fs.FSDataInputStream;

class CustomDataInputStream extends FSDataInputStream {
  public CustomDataInputStream(FSDataInputStream original) {
    super(original);
  }
}
{code}
Suppose we create an FSDataInputStream whose underlying stream does not implement ByteBufferReadable, and then create a CustomDataInputStream from it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an error like {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream{quote}.
We can fix this by recursively checking the underlying stream of FSDataInputStream.
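Why the one-level check is fooled: FSDataInputStream itself implements ByteBufferReadable, so when the stream passed to HadoopStreams.wrap wraps another FSDataInputStream, the check on the wrapped stream passes even though the innermost stream cannot read into a ByteBuffer; the failure only surfaces later, on the first byte-buffer read. The sketch below models this in plain Java so it runs without Hadoop on the classpath. `ByteBufferReadable`, `DataStream` (standing in for FSDataInputStream), `CustomDataStream`, and `ReadableSource` are hypothetical, simplified stand-ins, and the unwrapping loop is one possible way to do the recursive check the issue proposes, not the code from PR #951.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical model of the PARQUET-2134 type check; not Hadoop or parquet-mr code.
final class StreamUnwrapSketch {

  /** Simplified stand-in for org.apache.hadoop.fs.ByteBufferReadable. */
  interface ByteBufferReadable {
    int read(ByteBuffer dst) throws IOException;
  }

  /** Stand-in for FSDataInputStream: always claims ByteBufferReadable, delegates to the wrapped stream. */
  static class DataStream extends FilterInputStream implements ByteBufferReadable {
    DataStream(InputStream in) {
      super(in);
    }

    InputStream getWrappedStream() {
      return in;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
      if (in instanceof ByteBufferReadable) {
        return ((ByteBufferReadable) in).read(dst);
      }
      throw new UnsupportedOperationException("Byte-buffer read unsupported by input stream");
    }
  }

  /** Mirrors CustomDataInputStream from the issue: one extra layer of wrapping. */
  static class CustomDataStream extends DataStream {
    CustomDataStream(DataStream original) {
      super(original);
    }
  }

  /** A raw source that genuinely supports byte-buffer reads. */
  static class ReadableSource extends ByteArrayInputStream implements ByteBufferReadable {
    ReadableSource(byte[] data) {
      super(data);
    }

    @Override
    public synchronized int read(ByteBuffer dst) {
      int n = Math.min(dst.remaining(), count - pos);
      if (n <= 0) {
        return count == pos ? -1 : 0; // -1 at end of data, 0 if dst has no room
      }
      dst.put(buf, pos, n);
      pos += n;
      return n;
    }
  }

  /** One-level check: passes whenever the wrapped stream is itself a DataStream. */
  static boolean shallowCheck(DataStream stream) {
    return stream.getWrappedStream() instanceof ByteBufferReadable;
  }

  /** Recursive check, written as a loop: unwrap nested DataStreams, then test the innermost stream. */
  static boolean recursiveCheck(DataStream stream) {
    InputStream wrapped = stream.getWrappedStream();
    while (wrapped instanceof DataStream) {
      wrapped = ((DataStream) wrapped).getWrappedStream();
    }
    return wrapped instanceof ByteBufferReadable;
  }
}
```

With Hadoop available, the same loop would keep calling `FSDataInputStream.getWrappedStream()` while the result is still an `FSDataInputStream`, and only then test the innermost stream against `ByteBufferReadable`.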
[jira] [Resolved] (PARQUET-2129) Add uncompressedSize to "meta" output
[ https://issues.apache.org/jira/browse/PARQUET-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoo Ganesh resolved PARQUET-2129.
Resolution: Fixed

https://github.com/apache/parquet-mr/pull/949

> Add uncompressedSize to "meta" output
>
> Key: PARQUET-2129
> URL: https://issues.apache.org/jira/browse/PARQUET-2129
> Project: Parquet
> Issue Type: Improvement
> Reporter: Vinoo Ganesh
> Assignee: Vinoo Ganesh
> Priority: Minor
>
> The `uncompressedSize` is currently not printed in the output of the parquet
> meta command. This PR adds uncompressedSize to the output.
> This was also reported by Deepak Gangwar.
[jira] [Commented] (PARQUET-2129) Add uncompressedSize to "meta" output
[ https://issues.apache.org/jira/browse/PARQUET-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503206#comment-17503206 ]

Vinoo Ganesh commented on PARQUET-2129:

Fixed in: https://github.com/apache/parquet-mr/pull/949
[jira] [Resolved] (PARQUET-2128) Bump Thrift to 0.16.0
[ https://issues.apache.org/jira/browse/PARQUET-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoo Ganesh resolved PARQUET-2128.
Resolution: Fixed

Fixed in https://github.com/apache/parquet-mr/pull/948

> Bump Thrift to 0.16.0
>
> Key: PARQUET-2128
> URL: https://issues.apache.org/jira/browse/PARQUET-2128
> Project: Parquet
> Issue Type: Improvement
> Reporter: Vinoo Ganesh
> Assignee: Vinoo Ganesh
> Priority: Minor
>
> Thrift 0.16.0 has been released:
> https://github.com/apache/thrift/releases/tag/v0.16.0
[GitHub] [parquet-site] shangxinli merged pull request #12: Docsy Submodule Added
shangxinli merged pull request #12:
URL: https://github.com/apache/parquet-site/pull/12
[GitHub] [parquet-site] vinooganesh opened a new pull request #12: Docsy Submodule Added
vinooganesh opened a new pull request #12:
URL: https://github.com/apache/parquet-site/pull/12

Final part of the new site: the Docsy submodule.
[GitHub] [parquet-site] shangxinli commented on pull request #6: Fix small typos in latest documentation
shangxinli commented on pull request #6:
URL: https://github.com/apache/parquet-site/pull/6#issuecomment-1062210397

Thanks for working on it @vegarsti! Do you want to rebase your change, given that @vinooganesh just redesigned the webpage?
[GitHub] [parquet-site] shangxinli merged pull request #11: Final hidden files to kick off automation
shangxinli merged pull request #11:
URL: https://github.com/apache/parquet-site/pull/11
[GitHub] [parquet-site] vinooganesh opened a new pull request #11: Final hidden files to kick off automation
vinooganesh opened a new pull request #11:
URL: https://github.com/apache/parquet-site/pull/11

Adding the final files to kick off the publication of the site.
[jira] [Commented] (PARQUET-2006) Column resolution by ID
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503093#comment-17503093 ]

ASF GitHub Bot commented on PARQUET-2006:

huaxingao commented on pull request #950:
URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1062056376

cc @ggershinsky @sunchao Could you please take a look? Thanks!

> Column resolution by ID
>
> Key: PARQUET-2006
> URL: https://issues.apache.org/jira/browse/PARQUET-2006
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Major
>
> Parquet resolves columns by name. In many use cases, e.g. schema resolution,
> this can be a problem. Iceberg uses IDs together with stored ID/name mappings.
> This Jira is to add support for resolving columns by ID.
[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID
huaxingao commented on pull request #950:
URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1062056376

cc @ggershinsky @sunchao Could you please take a look? Thanks!
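One concrete case where name-based resolution breaks is a column rename: a reader that asks for the column under its new name finds nothing in files written under the old name, while a lookup by a stable field ID still finds it. The minimal sketch below illustrates only that idea, using a hypothetical `Column` type (a field ID plus the name recorded at write time) rather than parquet-mr's schema classes; it does not reflect what PR #950 implements. For context, the Parquet format's `SchemaElement` has an optional `field_id` that can carry such IDs.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of name-based vs ID-based column resolution.
final class ColumnResolutionSketch {

  /** Made-up column descriptor: a stable field ID plus the name written to the file. */
  static final class Column {
    final int id;
    final String name;

    Column(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  /** Returns the index of the file column whose name matches, or -1 if none does. */
  static int resolveByName(List<Column> fileSchema, String requestedName) {
    for (int i = 0; i < fileSchema.size(); i++) {
      if (Objects.equals(fileSchema.get(i).name, requestedName)) {
        return i;
      }
    }
    return -1;
  }

  /** Returns the index of the file column whose field ID matches, or -1 if none does. */
  static int resolveById(List<Column> fileSchema, int requestedId) {
    for (int i = 0; i < fileSchema.size(); i++) {
      if (fileSchema.get(i).id == requestedId) {
        return i;
      }
    }
    return -1;
  }
}
```

A real resolver would also have to recurse into nested groups and decide how to treat columns that carry no ID; both are left out of this sketch.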
[GitHub] [parquet-site] shangxinli merged pull request #10: New Hugo-based website
shangxinli merged pull request #10:
URL: https://github.com/apache/parquet-site/pull/10
[GitHub] [parquet-site] shangxinli merged pull request #9: Clearing out all contents of parquet staging
shangxinli merged pull request #9:
URL: https://github.com/apache/parquet-site/pull/9
[GitHub] [parquet-site] vinooganesh opened a new pull request #10: New Hugo-based website
vinooganesh opened a new pull request #10:
URL: https://github.com/apache/parquet-site/pull/10

This commit updates the current Parquet website to the new Hugo-based version (a live demo can be found here: https://parquet.vinoo.io/). https://github.com/apache/parquet-site/pull/9 must be merged before this PR.

This PR also kicks off the new workflow for Parquet website development, which works as follows. We will open PRs against either the master branch (for production) or the staging branch (for staging). Upon merge to that branch, automation (a GitHub Actions workflow) builds the site and publishes the artifacts to either the asf-site branch (production) or the asf-staging branch (staging). More information can be found in the README.

Once this PR merges, `parquet.staged.apache.org` should be updated with the new website.
[jira] [Commented] (PARQUET-2117) Add rowPosition API in parquet record readers
[ https://issues.apache.org/jira/browse/PARQUET-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502820#comment-17502820 ]

ASF GitHub Bot commented on PARQUET-2117:

prakharjain09 commented on pull request #945:
URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1061584498

@shangxinli Thanks for taking another look. I have addressed all comments except [one](https://github.com/apache/parquet-mr/pull/945#discussion_r820928524). Please advise on that one. Thanks!

> Add rowPosition API in parquet record readers
>
> Key: PARQUET-2117
> URL: https://issues.apache.org/jira/browse/PARQUET-2117
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Prakhar Jain
> Priority: Major
> Fix For: 1.13.0
>
> Currently the parquet-mr RecordReader/ParquetFileReader exposes APIs to read
> Parquet files either column by column or record by record.
> It would be great to extend them to also support a rowPosition API that
> returns the position of the current record in the Parquet file.
> The rowPosition can be used as a unique row identifier to mark a row. This
> can be useful for building an index (e.g. a B+ tree) over a Parquet file or
> Parquet table (e.g. in Spark/Hive).
> Multiple projects in the Parquet ecosystem can benefit from such
> functionality:
> # Apache Iceberg needs this functionality. It already has its own
> implementation, since it relies on low-level Parquet APIs:
> [Link1|https://github.com/apache/iceberg/blob/apache-iceberg-0.12.1/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java#L171],
> [Link2|https://github.com/apache/iceberg/blob/d4052a73f14b63e1f519aaa722971dc74f8c9796/core/src/main/java/org/apache/iceberg/MetadataColumns.java#L37]
> # Apache Spark can use this functionality: SPARK-37980
[GitHub] [parquet-mr] prakharjain09 commented on pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader
prakharjain09 commented on pull request #945:
URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1061584498

@shangxinli Thanks for taking another look. I have addressed all comments except [one](https://github.com/apache/parquet-mr/pull/945#discussion_r820928524). Please advise on that one. Thanks!
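The idea behind a file-level row position can be illustrated without parquet-mr. If the per-row-group row counts from the footer are known, a record's position is the number of rows in all earlier row groups plus its index within its own row group. The helper below is a hypothetical sketch, not the API proposed in PR #945. It computes offsets over every row group in the file, so positions stay stable even if a filter later skips some row groups.

```java
// Hypothetical sketch: derive a file-level row position from per-row-group row counts.
final class RowPositionSketch {
  private final long[] firstRowOfGroup;

  /** rowCounts[i] is assumed to be the row count of row group i, in footer order. */
  RowPositionSketch(long[] rowCounts) {
    firstRowOfGroup = new long[rowCounts.length];
    long next = 0;
    for (int i = 0; i < rowCounts.length; i++) {
      firstRowOfGroup[i] = next; // rows in all earlier row groups
      next += rowCounts[i];
    }
  }

  /** Position within the whole file of row rowInGroup (0-based) of row group groupIndex. */
  long rowPosition(int groupIndex, long rowInGroup) {
    return firstRowOfGroup[groupIndex] + rowInGroup;
  }
}
```

For example, with row groups of 100, 50, and 25 rows, row 10 of the second row group has position 100 + 10 = 110.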