Hi,

(Disclaimer: I'm new to Avro and Beam)


Question: *is there a way to read the schema from an Avro file in GCS
without having to read the entire file?*


Context:

I have a bunch of large Avro files in GCS.

I want to process them with
AvroIO.readGenericRecords(theSchema).from(filePattern)
(from the Apache Beam SDK). However, I don't know the schema up
front.


Now, I can read one of the files and extract the schema from it up front,
sort of like this:

```
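import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

Blob avroFile = … // get Blob from GCS

// getContent() pulls the whole object into a byte[] before any parsing happens
SeekableInput seekableInput = new SeekableByteArrayInput(avroFile.getContent());
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();

try (DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<>(seekableInput, datumReader)) {
  // The writer schema is stored in the file header
  String schema = dataFileReader.getSchema().toString();
}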

```


But the file is really large and my nodes are really tiny, so they run
out of memory. Is there a way to extract the schema without reading the
entire file?

Thanks!
