You can create a PCollection of file names (either with Create or by listing them in, e.g., a text file) and then read the actual data in a subsequent DoFn. Splittable DoFn (http://s.apache.org/splittable-do-fn) will make this even more flexible. Alternatively, if each of your binary files contains many records, you could write a custom source yourself (if an IO doesn't already exist for that format).
On Sun, Jul 16, 2017 at 8:20 PM, Derek Hao Hu <[email protected]> wrote:

> Hi,
>
> I'm trying to read a binary file from GCS. I've seen that `TextIO` can read
> directly from GCS buckets but based on the documentation it would split
> lines based on carriage returns. Is there a way to read a binary file
> directly from GCS buckets?
>
> Thanks,
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
