[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909374#comment-16909374
 ] 

Brendan Hogan commented on ARROW-6278:
--

Sure, feel free to reinterpret this request as adding HDFS support.  That would 
be interesting to try out.

I will add that the real value of this Parquet access will only be unlocked 
once Arrow properly supports nested fields, i.e. ARROW-1644.  That said, I am 
happy to put in a plug for HDFS support in the meantime.  Thanks.

 

> [R] Handle raw vector from read_parquet 
> 
>
> Key: ARROW-6278
> URL: https://issues.apache.org/jira/browse/ARROW-6278
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Brendan Hogan
>Priority: Major
>
> {{read_parquet}} currently handles a path to a local file or an Arrow input 
> stream.  Would it be possible to add support for a raw vector containing the 
> contents of a parquet file?
> Apologies if there is already a way to do this.  I have tried populating a 
> buffer and passing that as input, but that is unsupported as well.  An 
> example of how to work using an input stream would be useful as well.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909303#comment-16909303
 ] 

Brendan Hogan commented on ARROW-6278:
--

Fair question.  I have Parquet files in HDFS.  I can, of course, open a Spark 
session and use {{spark_read_parquet}}, but I am exploring options for 
lighter-weight read access.  I can pull the data into a raw vector via WebHDFS 
(e.g. [https://mitre.github.io/webhdfs/]).  Hence my interest in calling 
{{read_parquet}} on a raw vector.  I'm open to other suggestions here.



[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909288#comment-16909288
 ] 

Brendan Hogan commented on ARROW-6278:
--

[~fsaintjacques], yes, {{BufferReader}} appears to work fine.  Thank you.
{code:java}
> test_br <- BufferReader(test_raw)
> test_df <- read_parquet(test_br)
{code}
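
A self-contained sketch of the same round trip follows. It assumes a recent release of the arrow R package, in which the constructor is spelled {{BufferReader$create()}} rather than the {{BufferReader()}} call shown above. The temporary file and the toy data frame are made up for illustration and are not part of the original report.

```r
library(arrow)

# Write a small Parquet file so the example has something to read
# (illustrative data, not from the issue).
tmp <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), tmp)

# Load the whole file into a raw vector, standing in for bytes
# fetched from a remote store such as WebHDFS.
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# Wrap the raw vector in an Arrow input stream and read it back.
reader <- BufferReader$create(raw_bytes)
df <- read_parquet(reader)
```

Passing {{n = file.size(tmp)}} to {{readBin}} reads the complete file; a fixed {{n}} would silently truncate any file larger than that many bytes.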
 



[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909286#comment-16909286
 ] 

Brendan Hogan commented on ARROW-6278:
--

[~npr], here is an example of what I'm trying to do:
{code:java}
> test_raw <- readBin(system.file("v0.7.1.parquet", package="arrow"),
+                     what = "raw", n = 5000)
> test_df <- read_parquet(test_raw)
Error in UseMethod("parquet_file_reader") :
  no applicable method for 'parquet_file_reader' applied to an object of class "raw"
{code}



[jira] [Updated] (ARROW-6278) [R] Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brendan Hogan updated ARROW-6278:
-
Summary: [R] Handle raw vector from read_parquet   (was: Handle raw vector 
from read_parquet )



[jira] [Created] (ARROW-6278) Handle raw vector from read_parquet

2019-08-16 Thread Brendan Hogan (JIRA)
Brendan Hogan created ARROW-6278:


 Summary: Handle raw vector from read_parquet 
 Key: ARROW-6278
 URL: https://issues.apache.org/jira/browse/ARROW-6278
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Brendan Hogan


{{read_parquet}} currently handles a path to a local file or an Arrow input 
stream.  Would it be possible to add support for a raw vector containing the 
contents of a parquet file?

Apologies if there is already a way to do this.  I have tried populating a 
buffer and passing that as input, but that is unsupported as well.  An example 
of how to work using an input stream would be useful as well.


