[jira] [Updated] (PARQUET-2151) Drop Hadoop 1 input stream support from parquet-hadoop

2022-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated PARQUET-2151:

Description: 
Parquet uses reflection to load a hadoop2 input stream, falling back to a 
hadoop-1 compatible client if not found.

All hadoop 2.0.2+ releases work with H2SeekableInputStream, so 
H1SeekableInputStream can be cut and the binding to H2SeekableInputStream 
reworked to avoid needing reflection. This would make it a lot easier to probe 
for/use the bytebuffer input, and line the code up for more recent hadoop 
releases.
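
For illustration, the reflective bind-with-fallback pattern described above, 
and the direct binding that could replace it, might look roughly like the 
sketch below. This is not Parquet's code: Reader, ModernReader, LegacyReader, 
bindReflectively and bindDirectly are hypothetical names standing in for the 
H2/H1 stream wrappers and their loader.

```java
class ReflectiveBindingSketch {

    /** Hypothetical common interface of the two stream wrappers. */
    interface Reader {
        String name();
    }

    /** Hypothetical stand-in for the wrapper that is always available. */
    static final class LegacyReader implements Reader {
        public String name() { return "legacy"; }
    }

    /** Hypothetical stand-in for the wrapper that may be missing at runtime. */
    static final class ModernReader implements Reader {
        public String name() { return "modern"; }
    }

    /** Reflective binding: look the class up by name, fall back if that fails. */
    static Reader bindReflectively(String className) {
        try {
            Class<? extends Reader> cls =
                Class.forName(className).asSubclass(Reader.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Class absent or unusable: use the legacy implementation.
            return new LegacyReader();
        }
    }

    /** Direct binding: possible once the modern class is always on the classpath. */
    static Reader bindDirectly() {
        return new ModernReader();
    }
}
```

With the direct form the compiler checks the binding, and probing for 
optional capabilities becomes an ordinary instanceof test or method call.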




  was:
Parquet uses reflection to load a hadoop2 input stream, falling back to a 
hadoop-1 compatible client if not found.

All hadoop 2.0.2+ releases work with H2SeekableInputStream, so 
H1SeekableInputStream can be cut and the binding to H2SeekableInputStream 
reworked to avoid needing reflection. This would make it a lot easier to probe 
for/use the bytebuffer input, and line the code up for more recent hadoop 
releases.

One thing H1SeekableInputStream does do is read into a temp array if the 
FSDataInputStream doesn't support ByteBuffer reads, that is, doesn't implement 
ByteBufferReadable. However, FSDataInputStream simply forwards that call to the 
inner stream, and only if the inner stream implements ByteBufferReadable too. 
Streams from filesystems which don't (the cloud stores) can't be read through 
H2SeekableInputStream.read(ByteBuffer). If this is desired, 
H2SeekableInputStream will need to dynamically downgrade to 
DelegatingSeekableInputStream's base methods if a call to 
FSDataInputStream.read(ByteBuffer) fails.
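
A minimal sketch of that dynamic downgrade, under stated assumptions: it does 
not use the Hadoop or Parquet classes. BufferReadable and WrapperStream are 
hypothetical stand-ins for ByteBufferReadable and FSDataInputStream, 
DowngradingReader stands in for the reworked H2SeekableInputStream, and the 
sketch assumes the wrapper signals a missing ByteBuffer read by throwing 
UnsupportedOperationException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class FallbackReadSketch {

    /** Hypothetical stand-in for Hadoop's ByteBufferReadable. */
    interface BufferReadable {
        int read(ByteBuffer buf) throws IOException;
    }

    /** Hypothetical stand-in for FSDataInputStream: forwards to the inner stream. */
    static final class WrapperStream extends InputStream implements BufferReadable {
        private final InputStream inner;

        WrapperStream(InputStream inner) { this.inner = inner; }

        @Override public int read() throws IOException { return inner.read(); }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            return inner.read(b, off, len);
        }

        @Override public int read(ByteBuffer buf) throws IOException {
            if (inner instanceof BufferReadable) {
                return ((BufferReadable) inner).read(buf);
            }
            // Assumption: an unsupported ByteBuffer read is reported this way.
            throw new UnsupportedOperationException("byte-buffer read unsupported");
        }
    }

    /** Tries the ByteBuffer read once; on failure, uses a temp array from then on. */
    static final class DowngradingReader {
        private final WrapperStream in;
        private final byte[] temp = new byte[4096];
        private boolean bufferReads = true;

        DowngradingReader(WrapperStream in) { this.in = in; }

        int read(ByteBuffer buf) throws IOException {
            if (bufferReads) {
                try {
                    return in.read(buf);
                } catch (UnsupportedOperationException e) {
                    bufferReads = false; // downgrade for all later calls
                }
            }
            int n = in.read(temp, 0, Math.min(temp.length, buf.remaining()));
            if (n > 0) {
                buf.put(temp, 0, n);
            }
            return n;
        }
    }

    /** Reads up to len bytes from raw through the downgrading reader. */
    static byte[] readAll(InputStream raw, int len) {
        DowngradingReader reader = new DowngradingReader(new WrapperStream(raw));
        ByteBuffer buf = ByteBuffer.allocate(len);
        try {
            while (buf.hasRemaining() && reader.read(buf) >= 0) {
                // keep reading until the buffer is full or the stream ends
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.array();
    }
}
```

Passing a plain java.io.ByteArrayInputStream to readAll exercises the 
downgrade path, since that class does not implement the stand-in interface.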




> Drop Hadoop 1 input stream support from parquet-hadoop 
> ---
>
> Key: PARQUET-2151
> URL: https://issues.apache.org/jira/browse/PARQUET-2151
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Steve Loughran
> Priority: Minor
>
> Parquet uses reflection to load a hadoop2 input stream, falling back to a 
> hadoop-1 compatible client if not found.
> All hadoop 2.0.2+ releases work with H2SeekableInputStream, so 
> H1SeekableInputStream can be cut and the binding to H2SeekableInputStream 
> reworked to avoid needing reflection. This would make it a lot easier to 
> probe for/use the bytebuffer input, and line the code up for more recent 
> hadoop releases.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (PARQUET-2151) Drop Hadoop 1 input stream support from parquet-hadoop

2022-06-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated PARQUET-2151:

Summary: Drop Hadoop 1 input stream support from parquet-hadoop   (was: 
parquet-hadoop to drop Hadoop 1 input stream support)

> Drop Hadoop 1 input stream support from parquet-hadoop 
> ---
>
> Key: PARQUET-2151
> URL: https://issues.apache.org/jira/browse/PARQUET-2151
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Steve Loughran
> Priority: Minor
>
> Parquet uses reflection to load a hadoop2 input stream, falling back to a 
> hadoop-1 compatible client if not found.
> All hadoop 2.0.2+ releases work with H2SeekableInputStream, so 
> H1SeekableInputStream can be cut and the binding to H2SeekableInputStream 
> reworked to avoid needing reflection. This would make it a lot easier to 
> probe for/use the bytebuffer input, and line the code up for more recent 
> hadoop releases.
> One thing H1SeekableInputStream does do is read into a temp array if the 
> FSDataInputStream doesn't support ByteBuffer reads, that is, doesn't 
> implement ByteBufferReadable. However, FSDataInputStream simply forwards that 
> call to the inner stream, and only if the inner stream implements 
> ByteBufferReadable too. Streams from filesystems which don't (the cloud 
> stores) can't be read through H2SeekableInputStream.read(ByteBuffer). If this 
> is desired, H2SeekableInputStream will need to dynamically downgrade to 
> DelegatingSeekableInputStream's base methods if a call to 
> FSDataInputStream.read(ByteBuffer) fails.


