Hi Amit,

The Avro data file format requires the writer to know the schema from the start, because all records in the file are then written with the same schema. So there probably isn't an alternative to what you're doing -- to buffer as much as you can in memory, write it out to file when the memory buffer is full, and then start a new file.
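
As a rough illustration of that buffer-and-roll-over approach, here is a minimal Python sketch. It is not tied to any Avro library: `RollingBuffer`, `infer_schema`, the `AVRO_TYPES` mapping and the `write_file` callback are all names I made up for this example, and the callback is where you would plug in whatever Avro writer you use. It only handles flat records with primitive values.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Assumed mapping from Python value types to Avro primitive type names.
# Values of any other type raise KeyError in this sketch.
AVRO_TYPES = {bool: "boolean", int: "long", float: "double",
              str: "string", bytes: "bytes"}


def infer_schema(records: List[Record], name: str = "Batch") -> Dict[str, Any]:
    """Build one record schema covering every field seen in the batch."""
    field_types: Dict[str, List[str]] = {}
    for record in records:
        for key, value in record.items():
            avro_type = "null" if value is None else AVRO_TYPES[type(value)]
            seen = field_types.setdefault(key, [])
            if avro_type not in seen:
                seen.append(avro_type)

    fields = []
    for key, types in field_types.items():
        non_null = [t for t in types if t != "null"]
        # A field that is null or absent in some record becomes optional.
        optional = "null" in types or any(key not in r for r in records)
        if optional:
            # "null" goes first so that the null default is valid for the union.
            fields.append({"name": key, "type": ["null"] + non_null,
                           "default": None})
        elif len(non_null) == 1:
            fields.append({"name": key, "type": non_null[0]})
        else:
            fields.append({"name": key, "type": non_null})
    return {"type": "record", "name": name, "fields": fields}


class RollingBuffer:
    """Collect records in memory; when full, write them out as one file."""

    def __init__(self, max_records: int,
                 write_file: Callable[[int, Dict[str, Any], List[Record]], None]):
        self.max_records = max_records
        self.write_file = write_file  # hypothetical: (file_index, schema, rows)
        self.buffer: List[Record] = []
        self.file_index = 0

    def append(self, record: Record) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        schema = infer_schema(self.buffer)
        names = [f["name"] for f in schema["fields"]]
        # Fill absent fields with None so every row matches the schema.
        rows = [{n: r.get(n) for n in names} for r in self.buffer]
        self.write_file(self.file_index, schema, rows)
        self.file_index += 1
        self.buffer = []
```

In practice you would probably trigger the flush on an estimate of memory use rather than a record count, and call `flush()` once more at shutdown so the last partial batch is written.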

You can't change the schema of a data file once it has been written, but you can run a background process which merges several data files together, and writes the result to a new file. You can make the merged file's schema the union of all the input file schemas, or you can write some application-specific code which combines the schemas into one, and evolve all the records into that merged schema. This can be done by streaming through the files -- you don't need to keep all the data in memory.

Martin

On 1 Apr 2014, at 21:55, amit nanda <[email protected]> wrote:

> I have very dynamic data that i want to write to an avro file. The solution i
> have is to store all that data in the memory and then calculate the schema,
> and then start the writing.
>
> This causes the files to be smaller in size, because of the memory
> limitations.
>
> What i am looking for is that i will start data as and when it is collected,
> but how should i compute the schema in this case? Can i change the schema for
> an avro file?
>
> Thanks
> Amit
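
To make the merge step from the reply above a little more concrete, here is one possible shape for the application-specific variant, sketched in Python. It works on schemas as plain parsed-JSON dicts and deliberately leaves out reading and writing the Avro files themselves. The function names `merge_record_schemas` and `evolve` are invented for this example, and it assumes flat record schemas whose fields have primitive types or unions of primitives.

```python
from typing import Any, Dict, Iterable, Iterator, List

Schema = Dict[str, Any]
Record = Dict[str, Any]


def _branches(avro_type: Any) -> List[Any]:
    """Treat a plain type as a one-branch union."""
    return list(avro_type) if isinstance(avro_type, list) else [avro_type]


def merge_record_schemas(schemas: List[Schema], name: str = "Merged") -> Schema:
    """Combine flat record schemas into one schema with all their fields."""
    merged: Dict[str, List[Any]] = {}
    for schema in schemas:
        for field in schema["fields"]:
            branches = merged.setdefault(field["name"], [])
            for branch in _branches(field["type"]):
                if branch not in branches:
                    branches.append(branch)

    fields = []
    for fname, branches in merged.items():
        in_every = all(
            any(f["name"] == fname for f in s["fields"]) for s in schemas
        )
        non_null = [b for b in branches if b != "null"]
        if "null" in branches or not in_every:
            # Fields missing from some input become optional, defaulting to null.
            fields.append({"name": fname, "type": ["null"] + non_null,
                           "default": None})
        else:
            field_type = non_null[0] if len(non_null) == 1 else non_null
            fields.append({"name": fname, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}


def evolve(records: Iterable[Record], merged_schema: Schema) -> Iterator[Record]:
    """Lazily reshape records to the merged schema, one record at a time."""
    names = [f["name"] for f in merged_schema["fields"]]
    for record in records:
        yield {n: record.get(n) for n in names}
```

Because `evolve` is a generator, the merge job can read one input file at a time and pass each record straight to the writer for the new file, so memory use stays independent of the total data size. Fields whose types conflict between inputs end up as a union of the types seen, which may or may not be what you want for your data.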
