Yes, GenericData.Record#toString() should generate valid JSON. It does lose some information, e.g.:

 - record names; and
 - the distinction between strings & enum symbols, ints & longs, floats & doubles, and maps & records.

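To make that loss concrete, here is a minimal sketch in plain Java with no Avro dependency. The names LossyJsonText, Rec, Suit and render are hypothetical and exist only for this illustration; the renderer is a simplified stand-in for a schema-less JSON text rendering, not Avro's actual toString() implementation. It only shows that values of different types can end up as identical text, so the type cannot be recovered from the text alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LossyJsonText {
    // Hypothetical stand-in for an enum symbol (not an Avro type).
    enum Suit { HEARTS }

    // Hypothetical stand-in for a named record (not GenericData.Record).
    static final class Rec {
        final String name;
        final Map<String, Object> fields;

        Rec(String name, Map<String, Object> fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    // Renders a value as JSON-like text. No string escaping, for brevity.
    static String render(Object value) {
        if (value instanceof Rec) {
            // The record name is dropped; only the fields are written.
            return render(((Rec) value).fields);
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append('"').append(e.getKey()).append("\": ")
                  .append(render(e.getValue()));
                sep = ", ";
            }
            return sb.append('}').toString();
        }
        if (value instanceof CharSequence || value instanceof Enum) {
            // Strings and enum symbols both become quoted text.
            return "\"" + value + "\"";
        }
        // Numbers, booleans and null are written without any type marker.
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", 1L);

        // Each comparison prints true: the two inputs render to the same text.
        System.out.println(render(1).equals(render(1L)));                  // int vs long
        System.out.println(render(1.5f).equals(render(1.5d)));             // float vs double
        System.out.println(render("HEARTS").equals(render(Suit.HEARTS)));  // string vs enum symbol
        System.out.println(render(fields)
                .equals(render(new Rec("User", fields))));                 // map vs record
    }
}
```

Because the text is the same in each pair, a reader holding only the text output cannot tell which type was written. Recovering that requires the schema.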
JsonEncoder loses less information. It saves enough information to, with the schema, always reconstitute an equivalent object.

Doug

On Tue, Feb 5, 2013 at 11:53 AM, Public Network Services <[email protected]> wrote:
> Folks,
>
> Assuming an application that only needs to quickly examine the contents of a
> bunch of Avro data files (irrespective of binary or JSON encoding and
> without any prior schema or object structure knowledge), an approach could
> be to just extract the Avro records as text JSON records. To this effect, a
> simple approach could be:
>
> 1. Create a DataFileStream<GenericRecord>(FileInputStream,
>    GenericDatumReader<GenericRecord>) from a FileInputStream to the file. (If
>    the file is not an Avro data file, an IOException is caused.)
> 2. Read GenericRecord records from the DataFileStream object, while its
>    hasNext() method returns true.
> 3. Convert each GenericRecord object read into a JSON string, via calling its
>    toString() method.
>
> For the test datasets in the Avro 1.7.3 distribution, this actually works
> fine.
>
> My question is, does anyone see any potential problems for (binary or JSON
> encoded) Avro data files, given the above logic? For example, should the
> GenericRecord.toString() method always produce a valid JSON string?
>
> Thanks!
