Yes, that should be possible. A given JsonEncoder instance only works for a given schema, and every generic record conforms to a schema.
http://avro.apache.org/docs/current/api/java/org/apache/avro/io/EncoderFactory.html#jsonEncoder(org.apache.avro.Schema, java.io.OutputStream)

Doug

On Tue, Feb 5, 2013 at 3:30 PM, Public Network Services <[email protected]> wrote:
> Thanks for the clarification.
>
> Is there any way to use JsonEncoder in the scenario I mentioned, i.e. in
> totally schema-agnostic data extraction from either binary or JSON files?
>
>
> On Tue, Feb 5, 2013 at 2:58 PM, Doug Cutting <[email protected]> wrote:
>>
>> Yes, GenericData.Record#toString() should generate valid Json. It
>> does lose some information, e.g.:
>> - record names; and
>> - the distinction between strings & enum symbols, ints & longs,
>> floats & doubles, and maps & records.
>>
>> JsonEncoder loses less information. It saves enough information to,
>> with the schema, always reconstitute an equivalent object.
>>
>> Doug
>>
>>
>> On Tue, Feb 5, 2013 at 11:53 AM, Public Network Services
>> <[email protected]> wrote:
>> > Folks,
>> >
>> > Assuming an application that only needs to quickly examine the
>> > contents of a bunch of Avro data files (irrespective of binary or
>> > JSON encoding and without any prior schema or object structure
>> > knowledge), an approach could be to just extract the Avro records
>> > as text JSON records. To this effect, a simple approach could be:
>> >
>> > Create a DataFileStream<GenericRecord>(FileInputStream,
>> > GenericDatumReader<GenericRecord>) from a FileInputStream to the
>> > file. (If the file is not an Avro data file, an IOException is
>> > caused.)
>> > Read GenericRecord records from the DataFileStream object, while
>> > its hasNext() method returns true.
>> > Convert each GenericRecord object read into a JSON string, via
>> > calling its toString() method.
>> >
>> > For the test datasets in the Avro 1.7.3 distribution, this
>> > actually works fine.
>> >
>> > My question is, does anyone see any potential problems for (binary
>> > or JSON encoded) Avro data files, given the above logic? For
>> > example, should the GenericRecord.toString() method always produce
>> > a valid JSON string?
>> >
>> > Thanks!
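
For reference, the schema-agnostic approach discussed in this thread might look roughly like the sketch below. It is not code posted by anyone in the thread. It assumes the input is an Avro data file (container format), takes the writer's schema from the file header, and uses that schema to build a JsonEncoder. The class name AvroToJson and the method name toJsonLines are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Hypothetical helper class, not part of Avro.
public class AvroToJson {

    /** Returns one JSON string per record found in an Avro data file. */
    public static List<String> toJsonLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<String>();
        // No schema is supplied up front: the reader picks up the writer's
        // schema from the data file header.
        DataFileStream<GenericRecord> stream =
            new DataFileStream<GenericRecord>(in, new GenericDatumReader<GenericRecord>());
        try {
            Schema schema = stream.getSchema();
            GenericDatumWriter<GenericRecord> writer =
                new GenericDatumWriter<GenericRecord>(schema);
            while (stream.hasNext()) {
                GenericRecord record = stream.next();
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                // A fresh encoder per record keeps each JSON document separate;
                // this favors simplicity over speed.
                JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, buf);
                writer.write(record, encoder);
                encoder.flush();
                lines.add(buf.toString("UTF-8"));
            }
        } finally {
            stream.close();
        }
        return lines;
    }
}
```

Calling AvroToJson.toJsonLines(new FileInputStream(path)) would then give the JsonEncoder form of each record, which can be compared with record.toString() to see the information the latter drops.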
