Hi Richard,

It is possible - I've created an example in this gist showing how to loop
through a list of files and write to a Parquet file one row at a time:
https://gist.github.com/thisisnic/5bdb85d2742bc318433f2f14b8bd77cf.
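
In outline, the pattern is to open a ParquetFileWriter over a
FileOutputStream and call WriteTable once per row. Here's a rough sketch of
that idea (load_as_row(), "nifti_dir", and "out.parquet" below are
placeholders for your own nifti-reading code and paths):

library(arrow)

# Placeholder: read one nifti file and return a one-row data frame.
load_as_row <- function(path) {
  # e.g. as.data.frame(t(as.vector(RNifti::readNifti(path))))
  stop("replace with your own nifti-loading code")
}

files <- list.files("nifti_dir", full.names = TRUE)

# Derive the schema from the first row so every later batch matches it.
sch <- arrow_table(load_as_row(files[[1]]))$schema

sink <- FileOutputStream$create("out.parquet")
writer <- ParquetFileWriter$create(schema = sch, sink = sink)

for (f in files) {
  row <- load_as_row(f)
  writer$WriteTable(arrow_table(row, schema = sch), chunk_size = 1L)
}

writer$Close()
sink$close()

One thing to keep in mind: each WriteTable call produces its own row group,
so with 30K columns you may get smaller files and faster reads by
accumulating a handful of rows per call rather than strictly one.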

Does this solve your problem?

On Thu, 27 Jul 2023 at 12:22, Richard Beare <[email protected]> wrote:

> Hi arrow experts,
>
> I have what I think should be a standard problem, but I'm not seeing the
> correct solution.
>
> I have data in a nonstandard form (nifti neuroimaging files) that I can
> load into R and transform into a single-row dataframe (with about 30K
> columns). In a small example I can load about 80 of these into a single
> dataframe and save it as feather or parquet without problems. I'd like to
> handle the case where I have thousands.
>
> The approach of loading a batch (e.g. 10 files) into a dataframe, saving
> it under a Hive-style partition name, and repeating does work, but it
> doesn't seem like the right way to do it.
>
> Is there a way to stream data, one row at a time, into a feather or
> parquet file?
> I've attempted to use write_feather with a FileOutputStream sink, but
> without luck.
>
