What Eric suggests (reader schemas) would work, but it may incur a double read cost: when the projected read meets your condition and you want to proceed, you have to read the same record a second time with the full schema.
If the field you test on is held early in the record, order-wise, then a custom DatumReader implementation (one that does the low-level deserialization itself) may work more effectively. You can pass a DatumReader when constructing a DataFileReader, but it's quite a long route to go, IMO.

On Sat, Aug 17, 2013 at 4:17 AM, Eric Wasserman <[email protected]> wrote:
> If you define your records like this (this is in the Avro IDL lang. for
> brevity)
>
> If you write your records with a schema like this:
>
> record R {
>   Header header;
>   Body body;
> }
>
> Then you can read with a schema like this:
>
> record RSansBody {
>   Header header;
> }
>
> And the Avro libraries will read the header part (in which your "type" would
> reside) and effectively skip the body part.
>
> ________________________________
> From: Anna Lahoud <[email protected]>
> Sent: Friday, August 16, 2013 12:23 PM
> To: [email protected]
> Subject: Is there a way to conditionally read Avro data?
>
> I am wondering if there is a way that I can avoid reading all of an item in
> an Avro file, based on some of the data that I have already read. For
> instance, say I have a datum where I know that if its 'type' value is
> 'ComputerVirus', I do not want to touch the remaining fields. Is
> there a way to 'move on' and get the next datum, without touching the
> remainder of the scary datum? I would call it a 'conditional read' in that I
> only want to fully read the datum if the datum meets some criteria.
>
> Anna

--
Harsh J
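To make the low-level idea in this thread concrete, here is a rough, stdlib-only Python sketch of a "conditional read" over Avro's binary encoding. It relies on two facts from the Avro specification: a record is encoded as its fields back to back in schema order, and strings and bytes are written as a zigzag varint length followed by the raw data. Everything else is an assumption for illustration: the two-field record layout (a string `type`, then a bytes `body`), the `ComputerVirus` check, and all function names are hypothetical and are not part of the Avro API. It also ignores the data file container (header, blocks, sync markers, codecs), which a real custom DatumReader would get from DataFileReader.

```python
import io


def write_long(out, n):
    """Write an Avro long: zigzag-encode, then emit 7-bit groups, low group first."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(buf):
    """Read one zigzag varint long from the stream."""
    shift = 0
    z = 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def write_record(out, type_name, body):
    """Hypothetical record layout: string 'type' first, then bytes 'body'."""
    encoded = type_name.encode("utf-8")
    write_long(out, len(encoded))
    out.write(encoded)
    write_long(out, len(body))
    out.write(body)


def read_safe_records(buf, count, blocked_type="ComputerVirus"):
    """Read `count` records, decoding the body only when the type is allowed."""
    kept = []
    for _ in range(count):
        type_len = read_long(buf)
        type_name = buf.read(type_len).decode("utf-8")
        body_len = read_long(buf)
        if type_name == blocked_type:
            # Conditional read: step over the body without loading it.
            buf.seek(body_len, io.SEEK_CUR)
            continue
        kept.append((type_name, buf.read(body_len)))
    return kept


# Usage: three records, the middle one is skipped after reading only its type.
stream = io.BytesIO()
write_record(stream, "Document", b"hello")
write_record(stream, "ComputerVirus", b"\x00" * 1000)
write_record(stream, "Image", b"pixels")
stream.seek(0)
kept = read_safe_records(stream, 3)
```

The skip is cheap here only because `type` comes before the body in field order; if it came last, everything before it would have to be decoded or skipped field by field first.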
