Hi Sean,

My use case is to store incoming data (from various sources) in a database like Cassandra. The data will be serialized using Avro. My questions are:
1. What is the best way to do this?
2. How should I keep the schema information along with each record? For example, two columns, one storing the data and another storing the schema or its fingerprint?
3. I see fingerprints as one option, but how do I make use of them: where should the schema repository be maintained, and how do I add fingerprints to the data?
4. Also, I am wondering if there is any feature to automatically generate a schema from incoming data (CSV format)?
5. Is there any recommended database (relational or NoSQL) for storing data in Avro format?

I know I have asked a lot of questions ☺. I would highly appreciate your response and suggestions.

Thanks,
Sachneet

From: Sean Busbey [mailto:[email protected]]
Sent: Wednesday, March 26, 2014 11:35 AM
To: user@avro.apache.org
Subject: Re: Schema not getting saved along with Data

Hi Sachneet!

Can you describe your use case a little? Far and away the recommended way to use Avro is via one of the container files. The getting started guide for Java will walk you through writing and reading via the default container format:

http://avro.apache.org/docs/current/gettingstartedjava.html

On Wed, Mar 26, 2014 at 12:55 AM, Sachneet Singh Bains <[email protected]> wrote:

Thanks a lot Eric, this was useful. I was going through ‘Schema Fingerprints’. Are there any methods available (Java) that I can use to write these fingerprints along with the data rather than the complete schema? I am looking for something like Writer.write(fingerprint, record). What is the recommended way of using these fingerprints?

Thanks,
Sachneet

From: Eric Wasserman [mailto:[email protected]]
Sent: Tuesday, March 25, 2014 9:56 PM
To: [email protected]
Subject: RE: Schema not getting saved along with Data

It's a "must do". The real requirement is that the reader of the serialized records must have *exactly* the schema that was used to write the records.
[Note: The reader may also, optionally, specify a different reader's schema that it would like the Avro parser to use to translate the deserialized records into.]

How you arrange for the parser to get the writer's schema varies with your usage. If you happen to use org.apache.avro.file.DataFileWriter, it will prefix the file with the schema used to write all the records. The corresponding DataFileReader will use the prefixed schema to properly deserialize the records.

If you are putting serialized records into some other store, e.g. a database, and there is a chance that different records would be written with different schemas (or versions of schemas), then you would want to include an indicator of the writer's schema (e.g. a hash of the writer's schema or a foreign key to a schemas table) along with each record, so that at read time you can provide the correct writer's schema to your org.apache.avro.io.DatumReader.

________________________________
From: Sachneet Singh Bains <[email protected]>
Sent: Tuesday, March 25, 2014 7:18 AM
To: [email protected]
Subject: Schema not getting saved along with Data

Hi,

I am new to Avro and going through the documentation. From http://avro.apache.org/docs/1.7.6/gettingstartedjava.html:

“Data in Avro is always stored with its corresponding schema”

Does the above line convey an ‘explicitly must do’ or an ‘implicitly done’? Is it always true, even when we write single records to any stream, or does it apply only when “Object Container Files” are used? I tried writing some records to a file using DatumWriter and I see no schema saved along with them. Please resolve my confusion.

Thanks,
Sachneet

________________________________
NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender.
Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
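
For readers of this thread, here is a minimal sketch of the pattern Eric describes: keep a table of schemas keyed by a hash of the schema, and store that hash next to each serialized record so the writer's schema can be looked up at read time. This is only an illustration under stated assumptions, not an Avro API: the class FingerprintedStore and all of its methods are hypothetical, it uses plain JDK classes and in-memory collections in place of a real database, and it hashes the raw schema text with SHA-256. With Avro itself you would more likely fingerprint the schema's canonical form (for example via org.apache.avro.SchemaNormalization.parsingFingerprint64) and produce the payload bytes with a DatumWriter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a "schemas" table plus a "records" table. */
class FingerprintedStore {

    /** One stored row: the writer-schema fingerprint and the serialized payload. */
    static final class Row {
        final String fingerprint;
        final byte[] payload;

        Row(String fingerprint, byte[] payload) {
            this.fingerprint = fingerprint;
            this.payload = payload;
        }
    }

    // "schemas" table: fingerprint -> schema JSON text
    private final Map<String, String> schemasByFingerprint = new HashMap<>();
    // "records" table: each row carries the fingerprint of its writer's schema
    private final List<Row> rows = new ArrayList<>();

    /** Hex-encoded SHA-256 of the schema text (assumption: raw text, not a canonical form). */
    static String fingerprint(String schemaJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Adds the schema to the schemas table and returns its fingerprint. */
    String registerSchema(String schemaJson) {
        String fp = fingerprint(schemaJson);
        schemasByFingerprint.put(fp, schemaJson);
        return fp;
    }

    /** Stores a serialized record next to its writer-schema fingerprint; returns the row id. */
    int write(String fp, byte[] payload) {
        if (!schemasByFingerprint.containsKey(fp)) {
            throw new IllegalArgumentException("unknown schema fingerprint: " + fp);
        }
        rows.add(new Row(fp, payload.clone()));
        return rows.size() - 1;
    }

    /** Looks up the writer's schema for a row, as a reader would before deserializing. */
    String writerSchemaFor(int rowId) {
        return schemasByFingerprint.get(rows.get(rowId).fingerprint);
    }

    /** Returns a copy of the stored payload bytes. */
    byte[] payloadOf(int rowId) {
        return rows.get(rowId).payload.clone();
    }
}
```

In a database such as Cassandra, the two collections would correspond to two tables (or to a fingerprint column next to the data column), and the reader would pass the looked-up schema to its DatumReader as the writer's schema.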
