On 12/07/2011 05:16 AM, Gaurav wrote:
> One option is to construct record schema on the fly and second option is to
> use unions to write schema in a general way.
>
> Problems with 1 is that we have to construct schema everytime depending upon
> keys and then attach the entire string schema to a relatively small record.

You might instead write the Schema more efficiently in binary. It could
be written as binary Json using the following:

http://avro.apache.org/docs/current/api/java/org/apache/avro/data/Json.html

Or there's an even more efficient schema-for-schemas approach in:

https://issues.apache.org/jira/browse/AVRO-251

(I don't know if that patch is still up to date. If you like I can
update it. If someone finds it useful then I'll commit it.)

> But in second schema, u don't need to write schema on the wire as it is
> present with client also.
>
> I have written one such sample schema:
> {"type":"map","values":["int","long","float","double","string","boolean",{"type":"map","values":["int","long","float","double","string","boolean"]}]}
>
> Do you guys think writing something of this sort makes sense or is there any
> better approach to this?

A map like that is a totally reasonable approach when things vary a lot.

If the schema is really different for each instance written then
building a new schema each time might end up hurting performance. If
there are actually only relatively few schemas that re-occur then they
might be cached and reused.

If some fields are always present then you might put those in a record
and have a field in the record with a map like that for other stuff.
This is a common approach. Every record might have a date and uid or
somesuch, but other aspects may vary.

Doug
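
For illustration, here is a rough sketch of the two suggestions above: a record holding the always-present fields plus a map field for everything else, and a small cache so that a schema for a recurring set of keys is built only once. It uses plain Python dicts and JSON text rather than any Avro library calls. The names Event, Dynamic, date, uid and extras are made up for the example, and the choice of "long" for the date is an assumption. The resulting JSON would still need to be handed to whatever Avro implementation is in use.

```python
import json
from functools import lru_cache

# Value types taken from the sample schema quoted in the thread.
PRIMITIVES = ["int", "long", "float", "double", "string", "boolean"]

# Map whose values are a primitive or a one-level nested map of primitives.
FLEXIBLE_MAP = {
    "type": "map",
    "values": PRIMITIVES + [{"type": "map", "values": PRIMITIVES}],
}


def hybrid_schema():
    """Record with always-present fields plus a map for the varying rest.

    The record and field names here are hypothetical.
    """
    return {
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "date", "type": "long"},   # assumed: epoch timestamp
            {"name": "uid", "type": "string"},
            {"name": "extras", "type": FLEXIBLE_MAP},
        ],
    }


@lru_cache(maxsize=1024)
def schema_json_for(fields):
    """Build the JSON text of a record schema, once per distinct field set.

    `fields` is a tuple of (name, type) pairs; it must be hashable so
    that lru_cache can use it as the cache key.
    """
    record = {
        "type": "record",
        "name": "Dynamic",
        "fields": [{"name": n, "type": t} for n, t in fields],
    }
    return json.dumps(record)


def schema_for_keys(typed_keys):
    """typed_keys: dict mapping field name -> Avro primitive type name.

    Sorting the items means the same set of keys always maps to the same
    cache entry, regardless of the order they were supplied in.
    """
    return schema_json_for(tuple(sorted(typed_keys.items())))
```

Calling schema_for_keys twice with the same keys in a different order returns the cached string rather than rebuilding it, which is the point of the "cached and reused" suggestion when only a few schemas recur.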
