Re: Re: [C++] How can I read streaming parquet file in v0.15.0

Micah Kornfield Wed, 06 Nov 2019 22:47:50 -0800

I'm not sure what is meant by "streaming" in this context. My
understanding is that Parquet file reading needs random access. If you are
trying to fetch from S3, you could open a RandomAccessFile object using
the S3FileSystem
https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.h#L110
and then create a Parquet file reader with that object. I'm not sure if this
code path has been well tested.
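
To illustrate why random access matters: a Parquet file keeps its metadata
(including the schema) in a footer at the end of the file, followed by a
4-byte little-endian footer length and the magic bytes "PAR1". A reader
therefore starts from the end of the file rather than the beginning. The
sketch below is not Arrow code. It assumes a hypothetical RangeReader
callback that returns a byte range (with S3 this could be backed by a ranged
GET), handles only the plain "PAR1" trailer, and locates the footer without
touching the rest of the file. All names in it are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical callback: returns `length` bytes starting at `offset`.
// With S3 this could be implemented with an HTTP ranged GET.
using RangeReader =
    std::function<std::string(uint64_t offset, uint64_t length)>;

struct FooterLocation {
  uint64_t offset;  // where the serialized file metadata starts
  uint32_t length;  // size of the serialized file metadata in bytes
};

// Locates the Parquet footer using only the file size and one 8-byte read.
FooterLocation LocateFooter(uint64_t file_size, const RangeReader& read) {
  constexpr uint64_t kTailSize = 8;   // 4-byte length + 4-byte magic
  constexpr uint64_t kHeaderSize = 4; // leading "PAR1" magic
  if (file_size < kTailSize + kHeaderSize) {
    throw std::runtime_error("file too small to be Parquet");
  }
  const std::string tail = read(file_size - kTailSize, kTailSize);
  if (tail.size() != kTailSize || tail.compare(4, 4, "PAR1") != 0) {
    throw std::runtime_error("missing PAR1 magic at end of file");
  }
  // Decode the little-endian 32-bit footer length.
  uint32_t footer_length = 0;
  for (int i = 3; i >= 0; --i) {
    footer_length =
        (footer_length << 8) | static_cast<unsigned char>(tail[i]);
  }
  if (footer_length > file_size - kTailSize - kHeaderSize) {
    throw std::runtime_error("footer length exceeds file size");
  }
  return {file_size - kTailSize - footer_length, footer_length};
}

// In-memory stand-in for a remote object, useful for trying the sketch.
RangeReader MakeInMemoryReader(std::string data) {
  return [data = std::move(data)](uint64_t offset, uint64_t length) {
    assert(offset + length <= data.size());
    return data.substr(offset, length);
  };
}
```

Once the footer bytes have been fetched and decoded, a reader knows the
schema and the byte ranges of each row group, so it can request only the
slices it needs instead of holding the whole file.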


On Fri, Nov 1, 2019 at 12:56 AM annsshadow <[email protected]> wrote:

> The arrow::RecordBatchReader needs an arrow::dataset::RecordBatchProjector,
> which needs the Schema. It seems that I can't get the schema first and then
> read the streaming parquet with arrow.
>
> In my situation, the parquet file is in an object store like S3. I can
> fetch it from the network slice by slice, with slices of any size, but I
> can't hold the whole file in memory or on disk.
>
> Your reply indicates that the C++ library can't read streaming parquet
> yet, so what should I try next, with arrow or anything else?
>
> Thank you for your work~~
> At 2019-11-01 01:46:32, "Wes McKinney" <[email protected]> wrote:
> >You will want to use the GetRecordBatchReader C++ API here
> >
> >
> >https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.h#L152
> >
> >It may not be optimal for your use case. Support for streaming reads
> >is not yet exposed in Python or other bindings as far as I know.
> >
> >There is work happening in the C++ Datasets project to better support
> >this use case.
> >
> >On Wed, Oct 30, 2019 at 9:28 PM annsshadow <[email protected]> wrote:
> >>
> >>
> >> hi~
> >> I have a question about reading a parquet file.
> >> The official example reads the whole file from local disk.
> >> Now I can't hold the whole parquet file in memory and can only fetch it
> >> slice by slice from the network, so how can I use arrow to read the
> >> parquet file?
> >> thank you~
>
