First you will need a filesystem that can read from and write to GCS. There is no native GCS filesystem in Arrow yet (see [1]), so you will need to use fsspec to wrap an fsspec-compatible GCS filesystem. There is an example of how to do this at [2].
To open a CSV read stream you can either create a dataset with the CSV file format (see [3] to learn about datasets), or you can create an incremental CSV reader using open_csv [4] and an incremental CSV writer using CSVWriter [5]. More general CSV reading/writing information can be found at [6].

[1] https://issues.apache.org/jira/browse/ARROW-1231
[2] https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems
[3] https://arrow.apache.org/docs/python/dataset.html#tabular-datasets
[4] https://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html#pyarrow.csv.open_csv
[5] https://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVWriter.html#pyarrow.csv.CSVWriter
[6] https://arrow.apache.org/docs/python/csv.html

On Wed, Aug 25, 2021 at 4:59 PM gates ma <[email protected]> wrote:
>
> hi folks,
>
> Looking to use the csv read stream to write to GCS. Is there an ability to
> use pyarrow csv stream to write to a GCS bucket ?
>
> Thanks,
> MG.
