Thanks Doug. Makes perfect sense... I just hadn't found the ticket. Internally, we just made a new InputFormat that uses JsonEncoder.
2013/5/6 Doug Cutting <[email protected]>

> This has been previously reported as:
>
> https://issues.apache.org/jira/browse/AVRO-1275
>
> Please also note that GenericData#toString() does not always produce
> output that JsonDecoder can read. If you're using JsonDecoder then
> you should also use JsonEncoder. That said, some folks don't like the
> way that those classes encode unions and prefer the JSON that
> GenericData#toString() generates.
>
> A union between, e.g., a string and an enum can produce ambiguous JSON.
> To resolve this, JsonEncoder/Decoder tags union values (except unions
> with null) with the intended type. A union between string and an enum
> named Flavor with values SWEET and SOUR might be rendered by
> JsonEncoder as {"string":"SOUR"} or {"Flavor":"SOUR"}, while
> GenericData#toString() would print "SOUR" in both cases.
>
> The wrapping of all "bytes" values in {"bytes": ...} by
> GenericData#toString() is separate and should probably be considered a
> bug. Unfortunately fixing it would be an incompatible change, so
> should probably wait until release 1.8.
>
> Doug
>
> On Thu, Apr 25, 2013 at 6:26 AM, Jonathan Coveney <[email protected]>
> wrote:
> > This should replicate the issue on 1.7.4:
> > https://gist.github.com/jcoveney/5459644
> >
> > Basically, when using DataFileReader to read a union of bytes, it's
> > outputting in the form of {"bytes": "<thebytes>"}, which it doesn't do
> > for any other union types.
> >
> > Is this expected? Is this a bug?
> >
> > I appreciate your help,
> > Jon
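
For anyone reading this thread later, the ambiguity Doug describes can be sketched without Avro on the classpath. The class below is a hypothetical illustration only (UnionJsonSketch, tagged and untagged are made-up names, not Avro API), and it assumes the values contain no characters that need JSON escaping. It mimics the two renderings from his Flavor example; it is not how JsonEncoder or GenericData#toString() are actually implemented.

```java
// Hypothetical illustration of tagged vs. untagged union rendering.
// Not Avro code: names are invented and values are assumed to need no JSON escaping.
final class UnionJsonSketch {

    // Untagged rendering, in the spirit of GenericData#toString():
    // only the value is written, so the union branch is lost.
    static String untagged(String value) {
        return "\"" + value + "\"";
    }

    // Tagged rendering, in the spirit of JsonEncoder:
    // the value is wrapped in an object keyed by the branch's type name.
    static String tagged(String branch, String value) {
        return "{\"" + branch + "\":\"" + value + "\"}";
    }

    public static void main(String[] args) {
        // Same text, two different union branches.
        System.out.println(tagged("string", "SOUR"));
        System.out.println(tagged("Flavor", "SOUR"));
        // Both branches collapse to the same untagged output.
        System.out.println(untagged("SOUR"));
    }
}
```

The two tagged strings differ, so a reader can tell which branch was meant, while the untagged form is identical for both and a decoder has nothing to go on.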
