[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909374#comment-16909374 ] Brendan Hogan commented on ARROW-6278: -- Sure, feel free to repurpose this issue as adding HDFS support; that would be interesting to try out. I will add that the real value of any of this Parquet access will only be unlocked once Arrow properly supports nested fields, i.e. ARROW-1644. That said, I am happy to put in a plug for HDFS support in the meantime. Thanks. > [R] Handle raw vector from read_parquet > > > Key: ARROW-6278 > URL: https://issues.apache.org/jira/browse/ARROW-6278 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Brendan Hogan >Priority: Major > > {{read_parquet}} currently handles a path to a local file or an Arrow input > stream. Would it be possible to add support for a raw vector containing the > contents of a parquet file? > Apologies if there is already a way to do this. I have tried populating a > buffer and passing that as input, but that is unsupported as well. An > example of how to work using an input stream would be useful as well. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909303#comment-16909303 ] Brendan Hogan commented on ARROW-6278: -- Fair question. I have Parquet files in HDFS. I can, of course, open a Spark session and use {{spark_read_parquet}}, but I am exploring lighter-weight options for read access. I can pull the data into a raw vector via WebHDFS (e.g. [https://mitre.github.io/webhdfs/]), hence my interest in calling {{read_parquet}} on that raw vector. I'm open to other suggestions here.
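A minimal sketch of the WebHDFS route, assuming the standard WebHDFS REST endpoint (`/webhdfs/v1/<path>?op=OPEN`), a cluster without Kerberos, and the httr package for the HTTP request (the webhdfs package linked above may provide its own helpers, which are not used here). The host name, port, and file path are hypothetical, and `webhdfs_open_url()` and `read_parquet_webhdfs()` are illustrative helpers, not part of arrow. The bytes are wrapped with `BufferReader$create()`, the spelling used by recent arrow releases.

```r
# Hypothetical helper: build the WebHDFS "OPEN" URL for a file in HDFS.
# `hdfs_path` must be absolute (start with "/").
webhdfs_open_url <- function(host, port, hdfs_path) {
  sprintf("http://%s:%d/webhdfs/v1%s?op=OPEN", host, as.integer(port), hdfs_path)
}

# Hypothetical helper: fetch the file's bytes and parse them as Parquet.
# Assumes httr and arrow are installed; neither is loaded until this is called.
read_parquet_webhdfs <- function(host, port, hdfs_path) {
  url <- webhdfs_open_url(host, port, hdfs_path)
  # The NameNode answers OPEN with a redirect to a DataNode; httr follows it.
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  raw_bytes <- httr::content(resp, as = "raw")
  # Wrap the in-memory bytes in an Arrow input stream before parsing.
  arrow::read_parquet(arrow::BufferReader$create(raw_bytes))
}

# Example URL for a hypothetical NameNode and path.
webhdfs_open_url("namenode.example.com", 9870, "/data/example.parquet")
```

A secured cluster would additionally need authentication on the request, which this sketch leaves out.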
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909288#comment-16909288 ] Brendan Hogan commented on ARROW-6278: -- [~fsaintjacques], yes, BufferReader appears to work fine. Thank you.
{code:java}
> test_br <- BufferReader(test_raw)
> test_df <- read_parquet(test_br)
{code}
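A self-contained version of the BufferReader workaround, as a sketch: it writes a small illustrative data frame to a temporary Parquet file, reads the file back into a raw vector, and parses that vector through an in-memory Arrow input stream. It assumes a recent arrow release, where the constructor is spelled `BufferReader$create()` rather than the older `BufferReader()` call in the comment, and where `write_parquet()` is available.

```r
library(arrow)

# Write a small data frame to a temporary Parquet file (illustrative data only).
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Read the whole file into memory as a raw vector; `n` is taken from the
# file size so that readBin() does not stop early.
raw_bytes <- readBin(path, what = "raw", n = file.size(path))

# Wrap the raw vector in a BufferReader (an in-memory input stream)
# and pass the stream to read_parquet().
reader <- BufferReader$create(raw_bytes)
result <- read_parquet(reader)
print(result)
```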
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909286#comment-16909286 ] Brendan Hogan commented on ARROW-6278: -- [~npr], here is an example of what I'm trying to do:
{code:java}
> test_raw <- readBin(system.file("v0.7.1.parquet", package="arrow"), what = "raw", n = 5000)
> test_df <- read_parquet(test_raw)
Error in UseMethod("parquet_file_reader") : no applicable method for 'parquet_file_reader' applied to an object of class "raw"
{code}
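One detail in the snippet is worth noting: `readBin()` returns at most `n` items, so `n = 5000` silently truncates any file larger than 5000 bytes. Because a Parquet file keeps its metadata footer at the end, a truncated copy would fail to parse even once raw input is accepted. The sketch below uses only base R and a placeholder temporary file to show the difference when `n` is taken from `file.size()`.

```r
# Write 10,000 placeholder bytes to a temporary file (illustrative data only).
path <- tempfile()
writeBin(as.raw(rep(0:255, length.out = 10000)), path)

# readBin() returns at most `n` items, so a fixed n = 5000 truncates the file.
truncated <- readBin(path, what = "raw", n = 5000)

# Sizing `n` from the file reads every byte.
complete <- readBin(path, what = "raw", n = file.size(path))

length(truncated)  # 5000
length(complete)   # 10000
```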
[jira] [Updated] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brendan Hogan updated ARROW-6278: - Summary: [R] Handle raw vector from read_parquet (was: Handle raw vector from read_parquet )
[jira] [Created] (ARROW-6278) Handle raw vector from read_parquet
Brendan Hogan created ARROW-6278:
Summary: Handle raw vector from read_parquet
Key: ARROW-6278
URL: https://issues.apache.org/jira/browse/ARROW-6278
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Brendan Hogan

{{read_parquet}} currently handles a path to a local file or an Arrow input stream. Would it be possible to add support for a raw vector containing the contents of a Parquet file? Apologies if there is already a way to do this. I have tried populating a buffer and passing that as input, but that is unsupported as well. An example of how to work with an input stream would be useful as well.
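Until raw vectors are accepted directly, the requested behavior can be approximated on the caller's side. This is a sketch of a hypothetical helper, `read_parquet_any()`, which is not part of arrow; it assumes a recent arrow release providing `BufferReader$create()` and `ReadableFile$create()`, and its usage lines also show reading from an explicit input stream, as the request asks.

```r
library(arrow)

# Hypothetical helper (not part of arrow): accept a file path, a raw vector,
# or an existing Arrow input stream, and read it as Parquet.
read_parquet_any <- function(x, ...) {
  if (is.raw(x)) {
    # In-memory bytes are wrapped in an Arrow input stream first.
    x <- BufferReader$create(x)
  }
  read_parquet(x, ...)
}

# Usage: the same illustrative file read three ways.
path <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3), path)

from_path <- read_parquet_any(path)
from_raw <- read_parquet_any(readBin(path, what = "raw", n = file.size(path)))
from_stream <- read_parquet_any(ReadableFile$create(path))
```

Dispatching on `is.raw()` inside a wrapper keeps the sketch independent of arrow's internal S3 methods, at the cost of one extra function name.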