Freaky. The following works just fine.
scala> val anonSchema = Schema.createRecord(Lists.newArrayList(new Field("foo", Schema.create(Type.STRING), null, null)))
anonSchema: org.apache.avro.Schema = {"type":"record","fields":[{"name":"foo","type":"string"}]}
scala> val writer = new DataFileWriter[Record](new GenericDatumWriter[Record](anonSchema))
writer: org.apache.avro.file.DataFileWriter[org.apache.avro.generic.GenericData.Record] = org.apache.avro.file.DataFileWriter@417f6125
scala> writer.create(anonSchema, new File("test-anon.avro"))
res0: org.apache.avro.file.DataFileWriter[org.apache.avro.generic.GenericData.Record] = org.apache.avro.file.DataFileWriter@417f6125
scala> writer.append(new GenericRecordBuilder(anonSchema).set("foo", "bar").build())
scala> writer.flush()
scala> writer.close()
Of course, test-anon.avro can't be read back in any meaningful way, which
is the problem. I'll file a JIRA. The broader issue is that if Schema allows
such a case to be constructed, semantic validation has to exist in many
places. I've been whining about the awkwardness of the Schema APIs (to Doug,
at the office) for some time now. Maybe it's time we provided a set of
builders that ensure semantic validity upon construction. I wouldn't mind
putting in the work.
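
To make the builder idea concrete, here is a rough sketch, not a proposed API. CheckedSchemas, namedRecord and roundTrip are hypothetical names that do not exist in Avro. It assumes the four-argument Schema.createRecord(name, doc, namespace, isError) factory plus setFields, and it validates by re-parsing the schema's own JSON, which is the same step the reader fails on in the trace quoted below.

```scala
import java.util.{List => JList}
import org.apache.avro.Schema
import org.apache.avro.Schema.Field

// Hypothetical helper, not part of Avro: hands back a record schema only if
// the schema parser would accept it again later.
object CheckedSchemas {

  // Build a named record. Each Field instance can belong to only one record,
  // so callers should pass freshly constructed fields.
  def namedRecord(name: String, namespace: String, fields: JList[Field]): Schema = {
    require(name != null && name.nonEmpty, "record schemas must have a name")
    val schema = Schema.createRecord(name, null, namespace, false)
    schema.setFields(fields)
    roundTrip(schema)
  }

  // Re-parse the schema's JSON form, as a data file reader does with the
  // schema stored in the file header. Throws SchemaParseException if the
  // JSON is not a valid schema.
  def roundTrip(schema: Schema): Schema =
    new Schema.Parser().parse(schema.toString)
}
```

Usage would look something like the following, with the Field constructed the same way as in the transcript above. Calling CheckedSchemas.roundTrip(anonSchema) on the anonymous schema should fail with the same SchemaParseException before any file is written, at least on 1.7.3.

```scala
val fields = java.util.Arrays.asList(
  new Field("foo", Schema.create(Schema.Type.STRING), null, null))
val named = CheckedSchemas.namedRecord("Example", "com.example", fields)
```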
On Mon, Mar 4, 2013 at 9:31 AM, Doug Cutting <[email protected]> wrote:
> As Francis noted, anonymous records are not permitted. That said, the
> runtime uses anonymous record schemas internally to implement message
> parameter lists (which are written and read like records, but don't
> have names).
>
> How did you manage to create a file containing an anonymous record?
> Perhaps the API lets you create anonymous record schemas? If so, we
> should probably fix that, so they're only created by the Protocol
> parser via a package-private API.
>
> Doug
>
> On Sun, Mar 3, 2013 at 11:50 PM, Eric Sammer <[email protected]> wrote:
> > All:
> >
> > I'm looking for some clarity on the use of anonymous records in Avro data
> > files. Is this considered legal? 1.7.3 allows one to write a data file with
> > DataFileWriter with an anonymous record schema that can't be read back which
> > is not the nicest behavior. Here's a contrived example of a data file:
> >
> > esammer:~/ esammer$ ~/bin/avro-tool getmeta 1362381940987-1
> > Exception in thread "main" org.apache.avro.SchemaParseException: No name in schema: {"type":"record","fields":[{"name":"word","type":"string"}]}
> > at org.apache.avro.Schema.getRequiredText(Schema.java:1198)
> > at org.apache.avro.Schema.parse(Schema.java:1066)
> > at org.apache.avro.Schema$Parser.parse(Schema.java:927)
> > at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> > at org.apache.avro.Schema.parse(Schema.java:974)
> > at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:124)
> > at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> > at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
> > at org.apache.avro.tool.DataFileGetMetaTool.run(DataFileGetMetaTool.java:63)
> > at org.apache.avro.tool.Main.run(Main.java:78)
> > at org.apache.avro.tool.Main.main(Main.java:67)
> >
> > Before I filed the bug I wanted to clarify that anonymous records are
> > against the spec (or that they aren't, and the bug is the schema parser).
> >
> > Thanks.
> > --
> > Eric Sammer
> > twitter: esammer
> > data: www.cloudera.com
>
--
Eric Sammer
twitter: esammer
data: www.cloudera.com