Hi,
(Disclaimer: I'm new to Avro and Beam)
Question: *is there a way to read the schema from an Avro file in GCS
without having to read the entire file?*
Context:
I have a bunch of large files in GCS, and I want to process them with
`AvroIO.readGenericRecords(theSchema).from(filePattern)`
(from the Apache Beam SDK). However, I don’t know the schema up front.
Now, I can read one of the files and extract the schema from it up front,
sort of like this:
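```java
import com.google.cloud.storage.Blob;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

Blob avroFile = … // get Blob from GCS
// getContent() downloads the whole object into a byte[] in memory
SeekableInput seekableInput = new SeekableByteArrayInput(avroFile.getContent());
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
String schema; // declared outside the try block so it is usable afterwards
try (DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<>(seekableInput, datumReader)) {
  schema = dataFileReader.getSchema().toString();
}
```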
But the file is really large and my nodes are really tiny, so they run
out of memory: `getContent()` loads the whole object into a byte array
before Avro ever looks at it. Is there a way to extract the schema
without reading the entire file?
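For context, a possible direction: an Avro object container file stores the writer schema in its header, at the very start of the file (4 magic bytes, a metadata map whose `avro.schema` entry holds the schema JSON, then a 16-byte sync marker), so in principle only that prefix has to be read. The sketch below parses just the header with plain JDK streams and stops before the data blocks. It is a minimal illustration under those assumptions, not a replacement for Avro's own reader; the class `AvroHeaderSchema` and method `readSchemaJson` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (sketch): reads only the header of an Avro object
 * container file and returns the schema JSON stored under "avro.schema".
 * It consumes the magic bytes, the metadata map and the sync marker,
 * and never reads the data blocks that follow.
 */
final class AvroHeaderSchema {
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};

  private AvroHeaderSchema() {}

  static String readSchemaJson(InputStream raw) throws IOException {
    // DataInputStream adds no buffering, so nothing past the header is consumed.
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[4];
    in.readFully(magic);
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IOException("Not an Avro object container file");
    }
    Map<String, byte[]> meta = new HashMap<>();
    // Metadata is an Avro map<bytes>: blocks that each start with a zigzag
    // varint entry count; a count of 0 terminates the map.
    for (long count = readLong(in); count != 0; count = readLong(in)) {
      if (count < 0) {
        count = -count;
        readLong(in); // negative count: a block size in bytes follows (unused)
      }
      for (long i = 0; i < count; i++) {
        String key = new String(readBytes(in), StandardCharsets.UTF_8);
        meta.put(key, readBytes(in));
      }
    }
    in.readFully(new byte[16]); // sync marker; data blocks start right after it
    byte[] schema = meta.get("avro.schema");
    if (schema == null) {
      throw new IOException("Header has no avro.schema entry");
    }
    return new String(schema, StandardCharsets.UTF_8);
  }

  // Avro "bytes"/"string": zigzag varint length, then that many bytes.
  private static byte[] readBytes(DataInputStream in) throws IOException {
    long len = readLong(in);
    if (len < 0 || len > Integer.MAX_VALUE) {
      throw new IOException("Invalid length: " + len);
    }
    byte[] buf = new byte[(int) len];
    in.readFully(buf);
    return buf;
  }

  // Avro "long": zigzag-encoded varint, 7 bits per byte, low-order group first.
  private static long readLong(DataInputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      if (shift > 63) {
        throw new IOException("Malformed varint");
      }
      b = in.read();
      if (b < 0) {
        throw new EOFException();
      }
      raw |= (long) (b & 0x7f) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (raw >>> 1) ^ -(raw & 1);
  }
}
```

In practice Avro's own `DataFileStream` reads the same header in its constructor, so something like `new DataFileStream<>(Channels.newInputStream(avroFile.reader()), new GenericDatumReader<GenericRecord>()).getSchema()` should give the schema while the GCS `ReadChannel` only fetches the first chunk of the object rather than the whole file (remember to close the stream afterwards). Either way, the point is to stream from the blob instead of calling `getContent()`.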
Thanks!