Eugene Kirpichov created BEAM-2677:
--------------------------------------

             Summary: AvroIO.read without specifying a schema
                 Key: BEAM-2677
                 URL: https://issues.apache.org/jira/browse/BEAM-2677
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov


Sometimes it is inconvenient to require the user of AvroIO.read/readAll to 
specify a Schema for the Avro files they are reading, especially when different 
files may have different schemas.

It is possible to read GenericRecord objects from an Avro file, but it is not 
possible to provide a Coder for GenericRecord without knowing the schema: a 
GenericRecord knows its schema, so we can encode it into a byte array, but we 
cannot decode it from a byte array without knowing the schema (and encoding 
the full schema together with every record would be impractical).
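To make the encode/decode asymmetry concrete, here is a toy model in plain 
Java. It is a hypothetical illustration, not the Avro library and not Beam's 
AvroCoder: the "schema" is just an ordered list of field names, and encode() 
mimics Avro's binary encoding only in that it writes the field values without 
their names or types. The names ToyRecord, encode and decode are invented for 
this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaRequiredDemo {
  // Hypothetical toy types, not Avro's Schema/GenericRecord:
  // the "schema" is an ordered list of field names, and a record
  // carries a reference to the schema it was built with.
  record ToyRecord(List<String> schema, Map<String, String> values) {}

  // Encoding needs nothing but the record, because the record knows its schema.
  // Only the values are written, in schema order; no field names.
  static byte[] encode(ToyRecord r) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (String field : r.schema()) {
        out.writeUTF(r.values().get(field));
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Decoding must be handed the schema separately: the bytes alone
  // do not say how many fields there are or what they are called.
  static ToyRecord decode(byte[] data, List<String> schema) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      Map<String, String> values = new LinkedHashMap<>();
      for (String field : schema) {
        values.put(field, in.readUTF());
      }
      return new ToyRecord(schema, values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<String> schema = List.of("name", "city");
    ToyRecord r = new ToyRecord(schema, Map.of("name", "Ada", "city", "London"));
    ToyRecord back = decode(encode(r), schema);
    System.out.println(back.values()); // prints {name=Ada, city=London}
  }
}
```

A Coder for schemaless GenericRecord would be in the position of decode() here 
without the schema argument, which is the gap this issue describes.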

Instead, a reasonable approach is to treat a schemaless GenericRecord as 
unencodable and use the same approach as JdbcIO: a user-specified parse 
callback that converts each GenericRecord into a user type T as soon as it is 
read.

Suggested API:
  AvroIO.parseGenericRecords(SerializableFunction<GenericRecord, T> parseFn).from(filepattern)
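The shape of the suggested API can be sketched in plain Java, without Beam or 
Avro on the classpath. Everything below is a hypothetical stand-in: 
SerializableFunction mirrors Beam's interface of that name, RecordLike stands 
in for a schemaless GenericRecord, and ParseGenericRecords/readAll are invented 
for illustration. The point it tries to show is that parseFn is applied where 
each record is read, so only values of T leave the read.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ParseCallbackSketch {
  // Stand-in for Beam's SerializableFunction: serializable so it can be shipped to workers.
  interface SerializableFunction<InputT, OutputT>
      extends Function<InputT, OutputT>, Serializable {}

  // Stand-in for a schemaless GenericRecord: field values looked up by name.
  interface RecordLike {
    Object get(String field);
  }

  // Wraps a field map as a RecordLike; records from different files may have different fields.
  static RecordLike recordOf(Map<String, ?> fields) {
    return fields::get;
  }

  // Hypothetical builder mirroring AvroIO.parseGenericRecords(parseFn).from(filepattern).
  static final class ParseGenericRecords<T> {
    private final SerializableFunction<RecordLike, T> parseFn;
    private final String filepattern;

    private ParseGenericRecords(SerializableFunction<RecordLike, T> parseFn, String filepattern) {
      this.parseFn = parseFn;
      this.filepattern = filepattern;
    }

    ParseGenericRecords<T> from(String filepattern) {
      return new ParseGenericRecords<>(parseFn, filepattern);
    }

    String filepattern() {
      return filepattern;
    }

    // A real source would read records from the files matching filepattern;
    // here they are passed in directly. parseFn runs on each record as it is
    // read, so no RecordLike ever has to be encoded.
    List<T> readAll(List<RecordLike> recordsFromFiles) {
      List<T> out = new ArrayList<>();
      for (RecordLike r : recordsFromFiles) {
        out.add(parseFn.apply(r));
      }
      return out;
    }
  }

  static <T> ParseGenericRecords<T> parseGenericRecords(SerializableFunction<RecordLike, T> parseFn) {
    return new ParseGenericRecords<>(parseFn, null);
  }

  public static void main(String[] args) {
    ParseGenericRecords<String> read =
        parseGenericRecords((RecordLike r) -> r.get("name") + "@" + r.get("city"))
            .from("/tmp/input-*.avro");
    // Two records with different "schemas": the second has an extra field.
    List<String> parsed = read.readAll(List.of(
        recordOf(Map.of("name", "Ada", "city", "London")),
        recordOf(Map.of("name", "Alan", "city", "Wilmslow", "zip", "SK9"))));
    System.out.println(parsed); // prints [Ada@London, Alan@Wilmslow]
  }
}
```

In this sketch parseFn only reads the fields it needs, so files with differing 
schemas are handled without the user declaring any schema up front. How a Coder 
for T would be obtained in a real implementation is outside the scope of the 
sketch.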

CC: [~mkhadikov] [~reuvenlax]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
