Thanks for the clarification. Is there any way to use JsonEncoder in the scenario I mentioned, i.e. in completely schema-agnostic data extraction from either binary- or JSON-encoded Avro data files?
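For reference, here is an untested sketch of what I have in mind, assuming Avro 1.7.x on the classpath. Since every Avro data file embeds its writer schema in the header, DataFileStream.getSchema() may be able to stand in for prior schema knowledge; the class and method names are my own, everything else is the standard Avro API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class AvroToJson {
    // Emit every record of an Avro data file as JSON via JsonEncoder,
    // instead of GenericRecord.toString(). The caller supplies no schema;
    // the writer schema is taken from the file header.
    public static void dumpAsJson(InputStream in, OutputStream out) throws IOException {
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
        DataFileStream<GenericRecord> stream = new DataFileStream<GenericRecord>(in, reader);
        try {
            Schema schema = stream.getSchema(); // embedded in every Avro data file
            GenericDatumWriter<GenericRecord> writer =
                    new GenericDatumWriter<GenericRecord>(schema);
            JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
            for (GenericRecord record : stream) {
                writer.write(record, encoder);
            }
            encoder.flush();
        } finally {
            stream.close();
        }
    }
}
```

If this is sound, the output would carry the extra type information Doug mentions (enum vs. string, int vs. long, etc.), at the cost of needing the schema again to read it back.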
On Tue, Feb 5, 2013 at 2:58 PM, Doug Cutting <[email protected]> wrote:
> Yes, GenericData.Record#toString() should generate valid Json. It
> does lose some information, e.g.:
> - record names; and
> - the distinction between strings & enum symbols, ints & longs,
> floats & doubles, and maps & records.
>
> JsonEncoder loses less information. It saves enough information to,
> with the schema, always reconstitute an equivalent object.
>
> Doug
>
> On Tue, Feb 5, 2013 at 11:53 AM, Public Network Services
> <[email protected]> wrote:
> > Folks,
> >
> > Assuming an application that only needs to quickly examine the contents of a
> > bunch of Avro data files (irrespective of binary or JSON encoding and
> > without any prior schema or object structure knowledge), an approach could
> > be to just extract the Avro records as text JSON records. To this effect, a
> > simple approach could be:
> >
> > Create a DataFileStream<GenericRecord>(FileInputStream,
> > GenericDatumReader<GenericRecord>) from a FileInputStream to the file. (If
> > the file is not an Avro data file, an IOException is caused.)
> > Read GenericRecord records from the DataFileStream object, while its
> > hasNext() method returns true.
> > Convert each GenericRecord object read into a JSON string, via calling its
> > toString() method.
> >
> > For the test datasets in the Avro 1.7.3 distribution, this actually works
> > fine.
> >
> > My question is, does anyone see any potential problems for (binary or JSON
> > encoded) Avro data files, given the above logic? For example, should the
> > GenericRecord.toString() method always produce a valid JSON string?
> >
> > Thanks!
> >
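For completeness, the three steps quoted above amount to roughly the following (untested sketch, assuming Avro 1.7.x; the class and method names are my own, and I collect the strings rather than print them only so the result can be inspected):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroDump {
    // Read GenericRecord objects from an Avro data file and convert each
    // one to text via toString(). DataFileStream's constructor throws an
    // IOException if the input is not an Avro data file.
    public static List<String> dumpToStrings(InputStream in) throws IOException {
        List<String> out = new ArrayList<String>();
        DataFileStream<GenericRecord> stream = new DataFileStream<GenericRecord>(
                in, new GenericDatumReader<GenericRecord>());
        try {
            while (stream.hasNext()) {
                out.add(stream.next().toString());
            }
        } finally {
            stream.close();
        }
        return out;
    }
}
```

This works for both binary- and JSON-encoded files, since DataFileStream handles the codec transparently.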
