You can create a PCollection of file names (either using Create or
placing them in, e.g., a text file) and then read the actual data in a
subsequent DoFn. http://s.apache.org/splittable-do-fn will make this
even more flexible. Alternatively, if each of your binary files contains
many records, you could write a custom source yourself (if an IO
doesn't already exist for that format).
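A minimal sketch of the first approach (file names in a PCollection, bytes read in a DoFn), assuming a recent Beam 2.x Java SDK with the Google Cloud Platform filesystem module on the classpath so that `gs://` paths resolve. The class name `ReadBinaryFiles`, the helper `readAll`, and the bucket paths are hypothetical placeholders. Each file is read whole into memory, so this only suits files that fit comfortably on a worker:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class ReadBinaryFiles {

  /** For each file name, reads the whole file and emits (fileName, fileBytes). */
  static class ReadWholeFileFn extends DoFn<String, KV<String, byte[]>> {
    @ProcessElement
    public void processElement(ProcessContext c) throws IOException {
      String fileName = c.element();
      // Resolve the name (e.g. "gs://bucket/path/file.bin") to exactly one file.
      MatchResult.Metadata metadata = FileSystems.matchSingleFileSpec(fileName);
      // Open through Beam's FileSystems so GCS, local, etc. are handled uniformly.
      try (ReadableByteChannel channel = FileSystems.open(metadata.resourceId());
           InputStream in = Channels.newInputStream(channel)) {
        c.output(KV.of(fileName, readAll(in)));
      }
    }
  }

  /** Hypothetical helper: copies an InputStream into a byte array, no line splitting. */
  public static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Placeholder paths; these could equally be read from a text file with TextIO.
    PCollection<KV<String, byte[]>> contents = p
        .apply(Create.of(Arrays.asList("gs://my-bucket/a.bin", "gs://my-bucket/b.bin")))
        .apply(ParDo.of(new ReadWholeFileFn()))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), ByteArrayCoder.of()));
    // Downstream transforms would parse the raw bytes in `contents`.
    p.run().waitUntilFinish();
  }
}
```

Reading inside a DoFn means each file is processed as a single unit with no splitting within a file, which is why a custom source (or splittable DoFn) is the better fit when one file holds many records. Later Beam releases also offer `FileIO.match()` followed by `FileIO.readMatches()`, whose `ReadableFile.readFullyAsBytes()` covers the same pattern.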

On Sun, Jul 16, 2017 at 8:20 PM, Derek Hao Hu <[email protected]> wrote:
> Hi,
>
> I'm trying to read a binary file from GCS. I've seen that `TextIO` can read
> directly from GCS buckets but based on the documentation it would split
> lines based on carriage returns. Is there a way to read a binary file
> directly from GCS buckets?
>
> Thanks,
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
