[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503301#comment-17503301
 ] 

ASF GitHub Bot commented on PARQUET-2134:
-

7c00 commented on pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1062551863


   Related issue: https://github.com/prestodb/presto/pull/17435


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect type checking in HadoopStreams.wrap
> -
>
> Key: PARQUET-2134
> URL: https://issues.apache.org/jira/browse/PARQUET-2134
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
>Reporter: Todd Gao
>Priority: Minor
>
> The method 
> [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
>  wraps an FSDataInputStream in a SeekableInputStream. 
> It checks whether the underlying stream of the passed FSDataInputStream 
> implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
> H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
> In some cases, we may add another wrapper over FSDataInputStream. For 
> example, 
> {code:java}
> class CustomDataInputStream extends FSDataInputStream {
>     public CustomDataInputStream(FSDataInputStream original) {
>         super(original);
>     }
> }
> {code}
> Suppose we create an FSDataInputStream whose underlying stream does not 
> implement ByteBufferReadable, and then create a CustomDataInputStream from 
> it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get 
> an error like 
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported 
> by input stream{quote}
> We can fix this by recursively checking the underlying stream of the 
> FSDataInputStream.
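
To illustrate why a one-level check can pick the wrong wrapper: FSDataInputStream itself implements ByteBufferReadable, so when a CustomDataInputStream wraps another FSDataInputStream, a check on the immediate underlying stream succeeds regardless of what the innermost stream supports. The following is a minimal, self-contained sketch of the recursive check described above. It uses hypothetical stand-in types (ByteBufferReadable, DataStream, BufferCapableStream) rather than the real Hadoop classes, and it is not the code from pull request #951.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class WrapCheckSketch {
    // Hypothetical stand-in for Hadoop's ByteBufferReadable interface.
    interface ByteBufferReadable {}

    // Hypothetical stand-in for FSDataInputStream: it implements
    // ByteBufferReadable itself, whatever stream it wraps.
    static class DataStream extends FilterInputStream implements ByteBufferReadable {
        DataStream(InputStream in) { super(in); }
        InputStream getWrappedStream() { return in; }
    }

    // Hypothetical raw stream that does support byte-buffer reads.
    static class BufferCapableStream extends ByteArrayInputStream implements ByteBufferReadable {
        BufferCapableStream() { super(new byte[0]); }
    }

    // One-level check: inspects only the immediate underlying stream.
    static boolean shallowCheck(DataStream stream) {
        return stream.getWrappedStream() instanceof ByteBufferReadable;
    }

    // Recursive check: unwraps nested DataStreams until it reaches the
    // innermost stream, then inspects that one.
    static boolean recursiveCheck(DataStream stream) {
        InputStream inner = stream.getWrappedStream();
        while (inner instanceof DataStream) {
            inner = ((DataStream) inner).getWrappedStream();
        }
        return inner instanceof ByteBufferReadable;
    }

    public static void main(String[] args) {
        // Innermost stream has no byte-buffer support, but it is wrapped twice.
        InputStream plain = new ByteArrayInputStream(new byte[0]);
        DataStream nested = new DataStream(new DataStream(plain));
        System.out.println(shallowCheck(nested));   // prints "true" (misleading)
        System.out.println(recursiveCheck(nested)); // prints "false"
    }
}
```

In this sketch the one-level check would select the byte-buffer path for the doubly wrapped stream, which corresponds to the situation that produces the UnsupportedOperationException quoted above, while the recursive check falls back correctly.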



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [parquet-mr] 7c00 commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-03-08 Thread GitBox


7c00 commented on pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1062551863


   Related issue: https://github.com/prestodb/presto/pull/17435






[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503284#comment-17503284
 ] 

ASF GitHub Bot commented on PARQUET-2134:
-

7c00 opened a new pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-2134
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [x] does not need testing for this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   






[GitHub] [parquet-mr] 7c00 opened a new pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-03-08 Thread GitBox


7c00 opened a new pull request #951:
URL: https://github.com/apache/parquet-mr/pull/951






[jira] [Updated] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-03-08 Thread Todd Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Gao updated PARQUET-2134:
--
Affects Version/s: 1.12.2
   1.11.2
   1.10.1
   1.8.3



[jira] [Updated] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-03-08 Thread Todd Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Gao updated PARQUET-2134:
--
Description: 
The method 
[HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
 wraps an FSDataInputStream in a SeekableInputStream. 

It checks whether the underlying stream of the passed FSDataInputStream 
implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.

In some cases, we may add another wrapper over FSDataInputStream. For example, 

{code:java}
class CustomDataInputStream extends FSDataInputStream {
    public CustomDataInputStream(FSDataInputStream original) {
        super(original);
    }
}
{code}

Suppose we create an FSDataInputStream whose underlying stream does not 
implement ByteBufferReadable, and then create a CustomDataInputStream from 
it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an 
error like 
{quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream{quote}

We can fix this by recursively checking the underlying stream of the 
FSDataInputStream.


  was:
The method 
[HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
 wraps an FSDataInputStream in a SeekableInputStream. 

It checks whether the underlying stream of the passed FSDataInputStream 
implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.

In some cases, we may add another wrapper over FSDataInputStream. For example, 

{code:java}
class CustomDataInputStream extends FSDataInputStream {
    public CustomDataInputStream(FSDataInputStream original) {
        super(original);
    }
}
{code}

Suppose we create an FSDataInputStream whose underlying stream does not 
implement ByteBufferReadable, and then create a CustomDataInputStream from 
it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an 
error like 
{quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream{quote}.

We can fix this by recursively checking the underlying stream of the 
FSDataInputStream.





[jira] [Created] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-03-08 Thread Todd Gao (Jira)
Todd Gao created PARQUET-2134:
-

 Summary: Incorrect type checking in HadoopStreams.wrap
 Key: PARQUET-2134
 URL: https://issues.apache.org/jira/browse/PARQUET-2134
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Reporter: Todd Gao


The method 
[HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
 wraps an FSDataInputStream in a SeekableInputStream. 

It checks whether the underlying stream of the passed FSDataInputStream 
implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.

In some cases, we may add another wrapper over FSDataInputStream. For example, 

{code:java}
class CustomDataInputStream extends FSDataInputStream {
    public CustomDataInputStream(FSDataInputStream original) {
        super(original);
    }
}
{code}

Suppose we create an FSDataInputStream whose underlying stream does not 
implement ByteBufferReadable, and then create a CustomDataInputStream from 
it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an 
error like 
{quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream{quote}.

We can fix this by recursively checking the underlying stream of the 
FSDataInputStream.






[jira] [Resolved] (PARQUET-2129) Add uncompressedSize to "meta" output

2022-03-08 Thread Vinoo Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoo Ganesh resolved PARQUET-2129.
---
Resolution: Fixed

https://github.com/apache/parquet-mr/pull/949

> Add uncompressedSize to "meta" output
> -
>
> Key: PARQUET-2129
> URL: https://issues.apache.org/jira/browse/PARQUET-2129
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Vinoo Ganesh
>Assignee: Vinoo Ganesh
>Priority: Minor
>
> The `uncompressedSize` is currently not printed in the output of the parquet 
> meta command. This PR adds the uncompressedSize to the output. 
> This was also reported by Deepak Gangwar. 





[jira] [Commented] (PARQUET-2129) Add uncompressedSize to "meta" output

2022-03-08 Thread Vinoo Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503206#comment-17503206
 ] 

Vinoo Ganesh commented on PARQUET-2129:
---

Fixed in: https://github.com/apache/parquet-mr/pull/949



[jira] [Resolved] (PARQUET-2128) Bump Thrift to 0.16.0

2022-03-08 Thread Vinoo Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoo Ganesh resolved PARQUET-2128.
---
Resolution: Fixed

Fixed in https://github.com/apache/parquet-mr/pull/948

> Bump Thrift to 0.16.0
> -
>
> Key: PARQUET-2128
> URL: https://issues.apache.org/jira/browse/PARQUET-2128
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Vinoo Ganesh
>Assignee: Vinoo Ganesh
>Priority: Minor
>
> Thrift 0.16.0 has been released 
> https://github.com/apache/thrift/releases/tag/v0.16.0





[GitHub] [parquet-site] shangxinli merged pull request #12: Docsy Submodule Added

2022-03-08 Thread GitBox


shangxinli merged pull request #12:
URL: https://github.com/apache/parquet-site/pull/12


   






[GitHub] [parquet-site] vinooganesh opened a new pull request #12: Docsy Submodule Added

2022-03-08 Thread GitBox


vinooganesh opened a new pull request #12:
URL: https://github.com/apache/parquet-site/pull/12


   Final part of new site - the docsy submodule 






[GitHub] [parquet-site] shangxinli commented on pull request #6: Fix small typos in latest documentation

2022-03-08 Thread GitBox


shangxinli commented on pull request #6:
URL: https://github.com/apache/parquet-site/pull/6#issuecomment-1062210397


   Thanks for working on it @vegarsti! Do you want to rebase your change, given 
that @vinooganesh just redesigned the webpage?






[GitHub] [parquet-site] shangxinli merged pull request #11: Final hidden files to kick off automation

2022-03-08 Thread GitBox


shangxinli merged pull request #11:
URL: https://github.com/apache/parquet-site/pull/11


   






[GitHub] [parquet-site] vinooganesh opened a new pull request #11: Final hidden files to kick off automation

2022-03-08 Thread GitBox


vinooganesh opened a new pull request #11:
URL: https://github.com/apache/parquet-site/pull/11


   Adding the final files to kick off the publication of the site 






[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503093#comment-17503093
 ] 

ASF GitHub Bot commented on PARQUET-2006:
-

huaxingao commented on pull request #950:
URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1062056376


   cc @ggershinsky @sunchao 
   Could you please take a look? Thanks!




> Column resolution by ID
> ---
>
> Key: PARQUET-2006
> URL: https://issues.apache.org/jira/browse/PARQUET-2006
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Xinli Shang
>Assignee: Xinli Shang
>Priority: Major
>
> Parquet relies on the column name. In many use cases, e.g. schema resolution, 
> this is a problem. Iceberg uses IDs and stores ID/name mappings. 
> This Jira is to add support for column resolution by ID. 





[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-08 Thread GitBox


huaxingao commented on pull request #950:
URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1062056376


   cc @ggershinsky @sunchao 
   Could you please take a look? Thanks!






[GitHub] [parquet-site] shangxinli merged pull request #10: New Hugo-based website

2022-03-08 Thread GitBox


shangxinli merged pull request #10:
URL: https://github.com/apache/parquet-site/pull/10


   






[GitHub] [parquet-site] shangxinli merged pull request #9: Clearning out all contents of parquet staging

2022-03-08 Thread GitBox


shangxinli merged pull request #9:
URL: https://github.com/apache/parquet-site/pull/9


   






[GitHub] [parquet-site] vinooganesh opened a new pull request #10: New Hugo-based website

2022-03-08 Thread GitBox


vinooganesh opened a new pull request #10:
URL: https://github.com/apache/parquet-site/pull/10


   This commit updates the current Parquet website to the new Hugo-based 
version of the website (a live demo can be found here: 
https://parquet.vinoo.io/).
   
   https://github.com/apache/parquet-site/pull/9 needs to be merged as a 
prerequisite for this PR.
   
   This PR also kicks off the new workflow for Parquet website development. 
We will make PRs against either the master branch (for production) or the 
staging branch (for staging). A merge to either branch kicks off automation 
(through a GitHub Actions workflow) that builds the site and publishes the 
artifacts to the asf-site branch for production or the asf-staging branch 
for staging. More information can be found in the readme.
   
   Once this PR merges, `parquet.staged.apache.org` should be updated with the 
new website.






[jira] [Commented] (PARQUET-2117) Add rowPosition API in parquet record readers

2022-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502820#comment-17502820
 ] 

ASF GitHub Bot commented on PARQUET-2117:
-

prakharjain09 commented on pull request #945:
URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1061584498


   @shangxinli Thanks for taking another look. I have addressed all comments 
other [than 
one](https://github.com/apache/parquet-mr/pull/945#discussion_r820928524). 
Please advise on that one. Thanks!




> Add rowPosition API in parquet record readers
> -
>
> Key: PARQUET-2117
> URL: https://issues.apache.org/jira/browse/PARQUET-2117
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Prakhar Jain
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently the parquet-mr RecordReader/ParquetFileReader exposes APIs to read 
> a Parquet file in columnar fashion or record by record.
> It would be great to extend them to also support a rowPosition API that 
> reports the position of the current record in the Parquet file.
> The rowPosition can be used as a unique row identifier to mark a row. This 
> can be useful for creating an index (e.g. a B+ tree) over a Parquet file or 
> Parquet table (e.g. Spark/Hive).
> Multiple projects in the Parquet ecosystem can benefit from such 
> functionality: 
>  # Apache Iceberg needs this functionality. It already has an implementation 
> because it relies on low-level Parquet APIs - 
> [Link1|https://github.com/apache/iceberg/blob/apache-iceberg-0.12.1/parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java#L171],
>  
> [Link2|https://github.com/apache/iceberg/blob/d4052a73f14b63e1f519aaa722971dc74f8c9796/core/src/main/java/org/apache/iceberg/MetadataColumns.java#L37]
>  # Apache Spark can use this functionality - SPARK-37980
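
As a rough illustration of what a row position means here, the sketch below computes file-level positions for the records a reader would return when some row groups are skipped. Everything in it is hypothetical (the class name, the method, and the inputs); it is not the parquet-mr API and not the code from pull request #945. It only shows that a record's position in the file cannot be recovered by counting the records read so far once row groups are filtered out.

```java
import java.util.ArrayList;
import java.util.List;

public class RowPositionSketch {
    // Hypothetical helper: given the row count of each row group and a flag
    // saying whether that row group is read, return the file-level position
    // of every record that is read.
    static List<Long> rowPositions(long[] rowCounts, boolean[] selected) {
        List<Long> positions = new ArrayList<>();
        long rowGroupStart = 0; // position of the first row of the current row group
        for (int g = 0; g < rowCounts.length; g++) {
            if (selected[g]) {
                for (long i = 0; i < rowCounts[g]; i++) {
                    positions.add(rowGroupStart + i);
                }
            }
            // Skipped row groups still advance the position in the file.
            rowGroupStart += rowCounts[g];
        }
        return positions;
    }

    public static void main(String[] args) {
        long[] rowCounts = {3, 2, 2};
        boolean[] selected = {true, false, true};
        System.out.println(rowPositions(rowCounts, selected)); // prints [0, 1, 2, 5, 6]
    }
}
```

With the middle row group skipped, the fourth record returned sits at position 5 in the file rather than 3, which is the information a rowPosition API would expose.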





[GitHub] [parquet-mr] prakharjain09 commented on pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader

2022-03-08 Thread GitBox


prakharjain09 commented on pull request #945:
URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1061584498


   @shangxinli Thanks for taking another look. I have addressed all comments 
other [than 
one](https://github.com/apache/parquet-mr/pull/945#discussion_r820928524). 
Please advise on that one. Thanks!

