[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909374#comment-16909374 ]

Brendan Hogan commented on ARROW-6278:

Sure, feel free to translate this ticket into one for adding HDFS support. That would be interesting to try out. I will add that the real value of any of this parquet access will only be unlocked once arrow properly supports nested fields, i.e. ARROW-1644. That said, I am happy to put in a plug for HDFS support in the meantime. Thanks.

> [R] Handle raw vector from read_parquet
>
> Key: ARROW-6278
> URL: https://issues.apache.org/jira/browse/ARROW-6278
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Brendan Hogan
> Priority: Major
>
> {{read_parquet}} currently handles a path to a local file or an Arrow input stream. Would it be possible to add support for a raw vector containing the contents of a parquet file?
> Apologies if there is already a way to do this. I have tried populating a buffer and passing that as input, but that is unsupported as well. An example of how to work using an input stream would be useful as well.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909350#comment-16909350 ]

Neal Richardson commented on ARROW-6278:

I know there is some support for HDFS in the Arrow C++ library but I don't see any R bindings to it yet. Would you mind if I rewrote this ticket to be for adding HDFS support to the R package? It looks like François's suggestion unblocks you for now. You may also consider syncing the files from HDFS to your local file system and passing the file path to {{read_parquet}}; if the files are large that will be much more efficient with memory.
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909303#comment-16909303 ]

Brendan Hogan commented on ARROW-6278:

Fair question. I have parquet files in HDFS. I can, of course, open a spark session and {{spark_read_parquet}}, but I am exploring options for lighter-weight read access. I can grab the data into a raw vector via WebHDFS (e.g. [https://mitre.github.io/webhdfs/]). Hence my interest in {{read_parquet}} on that. I'm open to other suggestions here.
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909289#comment-16909289 ]

Neal Richardson commented on ARROW-6278:

Thanks. Out of curiosity, why are you trying to do this? Why do you have a Parquet file in memory as a raw vector? I'm wondering if there's a better solution to your actual problem than extending {{read_parquet}} to read raw vectors.
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909288#comment-16909288 ]

Brendan Hogan commented on ARROW-6278:

[~fsaintjacques], yes BufferReader appears to work fine. Thank you.
{code:java}
> test_br <- BufferReader(test_raw)
> test_df <- read_parquet(test_br)
{code}
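
For readers following along, the workaround above can be sketched end to end as below. This is a minimal illustration under stated assumptions, not code from the thread: the sample data frame and temporary path are made up, and `BufferReader$create()` is assumed to be the constructor spelling in recent releases of the R package (the 2019 version used in this thread called `BufferReader(test_raw)` directly). It also reads the whole file with `file.size()` rather than a fixed `n`, so a larger file is not silently truncated.

```r
library(arrow)

# Write a small Parquet file so the example is self-contained.
# The data frame and temp path are illustrative, not from the thread.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Read the entire file into a raw vector; file.size() avoids truncation.
test_raw <- readBin(path, what = "raw", n = file.size(path))

# Wrap the bytes in an in-memory reader and pass that to read_parquet().
# Assumption: BufferReader$create() is the constructor in recent releases;
# older releases spelled it BufferReader(test_raw).
test_br <- BufferReader$create(test_raw)
test_df <- read_parquet(test_br)
```

In a setup like the one described in this thread, the raw vector would come from a WebHDFS download instead of `readBin()` on a local file; the wrapping step stays the same.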
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909286#comment-16909286 ]

Brendan Hogan commented on ARROW-6278:

[~npr], here is an example of what I'm trying to do:
{code:java}
> test_raw <- readBin(system.file("v0.7.1.parquet", package="arrow"), what = "raw", n = 5000)
> test_df <- read_parquet(test_raw)
Error in UseMethod("parquet_file_reader") :
  no applicable method for 'parquet_file_reader' applied to an object of class "raw"
{code}
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909248#comment-16909248 ]

Francois Saint-Jacques commented on ARROW-6278:

There's the BufferReader in C++
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/memory.h#L131-L168
which seems to be referenced/reachable in the R bindings:
https://github.com/apache/arrow/blob/master/r/src/io.cpp#L137-L141
[jira] [Commented] (ARROW-6278) [R] Handle raw vector from read_parquet
[ https://issues.apache.org/jira/browse/ARROW-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909246#comment-16909246 ]

Neal Richardson commented on ARROW-6278:

Could you give an example of the code you have that you'd expect to work?