You can use FileIO's read and write transforms (FileIO.match(),
FileIO.readMatches(), FileIO.write()) to read and write arbitrary binary
files. The Javadoc for FileIO [0] includes an example that reads the entire
contents of each file as a string into a Beam record, along with metadata
about the file.

If a one-to-one mapping of files to records is fine for your use case, then
it should be fairly straightforward to read and write byte arrays:
FileIO.ReadableFile has readFullyAsBytes() alongside the string variant
used in that example.

If your files are large and contain many logical records, then your
pipeline needs to understand the binary format in order to break each file
up into records.
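
As a rough illustration of that splitting step, here is a minimal sketch in
plain Java (no Beam imports). It assumes a made-up format in which every
record is a 4-byte big-endian length followed by that many payload bytes;
your real format will almost certainly differ. LengthPrefixedRecords and
split are hypothetical names, not part of Beam.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: splits the bytes of one file into records.
// Assumed format (illustration only): each record is a 4-byte big-endian
// length followed by exactly that many payload bytes.
final class LengthPrefixedRecords {
  private LengthPrefixedRecords() {}

  static List<byte[]> split(byte[] fileBytes) {
    ByteBuffer buf = ByteBuffer.wrap(fileBytes); // ByteBuffer is big-endian by default
    List<byte[]> records = new ArrayList<>();
    while (buf.hasRemaining()) {
      if (buf.remaining() < Integer.BYTES) {
        throw new IllegalArgumentException(
            "Truncated length prefix at offset " + buf.position());
      }
      int length = buf.getInt();
      if (length < 0 || length > buf.remaining()) {
        throw new IllegalArgumentException(
            "Bad record length " + length + " before offset " + buf.position());
      }
      byte[] payload = new byte[length];
      buf.get(payload); // copies the payload and advances the position
      records.add(payload);
    }
    return records;
  }
}
```

In a pipeline you could call something like this from a DoFn on the bytes
returned by FileIO.ReadableFile.readFullyAsBytes(). That loads the whole
file into memory, so for very large files you would more likely want to
stream from the channel returned by ReadableFile.open() instead.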

To support writing batched records in an arbitrary file format, you could
build a custom implementation of FileIO.Sink [1]. There are existing
pre-built sinks for newline-delimited text (TextIO.Sink), Avro, XML, etc.,
which may or may not meet your needs.
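
To give a feel for what a custom sink involves, here is a minimal sketch. It
is written without Beam imports so that it compiles on its own, but its
three methods have the same shape as the ones FileIO.Sink declares
(open(WritableByteChannel), write(element), flush()). The class name
LengthPrefixedSink and the output format (a 4-byte big-endian length before
each record) are assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical sink, not part of Beam. To use it with FileIO.write() you
// would add "implements FileIO.Sink<byte[]>" (assumption: no other changes
// needed, since the method shapes match). Sinks must be Serializable.
final class LengthPrefixedSink implements Serializable {
  // The channel is handed over per output file, so it is not serialized.
  private transient WritableByteChannel channel;

  // Called once per output file before any records are written.
  public void open(WritableByteChannel channel) throws IOException {
    this.channel = channel;
  }

  // Writes one record as: 4-byte big-endian length, then the payload bytes.
  public void write(byte[] element) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + element.length);
    buf.putInt(element.length).put(element);
    buf.flip();
    while (buf.hasRemaining()) {
      channel.write(buf); // a channel may accept fewer bytes than offered
    }
  }

  // Called after the last record; this sketch keeps no internal buffer.
  public void flush() throws IOException {
    // Nothing to flush here.
  }
}
```

With the interface added, an instance should be usable along the lines of
FileIO.<byte[]>write().via(new LengthPrefixedSink()).to(...), with the
output location of your choice.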

[0] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/FileIO.html
[1] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/FileIO.Sink.html

On Fri, Apr 26, 2019 at 3:13 PM Nikhil Goyal <[email protected]> wrote:

> Hi,
>
> Is there a way to read and write binary files in beam?
>
> Thanks
> Nikhil
>
