Hi, apologies if this has been asked before.
I've been writing Parquet files with Avro, following the Hadoop Definitive Guide, and that part is working fine. My application is in Java and the files are saved on HDFS. What I really want to understand is how schema evolution works, and whether Avro and Parquet can support the following:

1. I want a single Parquet file: first write a batch of records to it, then append further records to the same file whenever more data arrives. I don't know whether appending to an existing Parquet file is possible at all.

2. We also know our schema will evolve. For example, we might add new fields later, and I'm wondering whether records written with the new schema can be added to the same file that was originally written with the old schema.

In short, we'd like to treat "the file" as a small database. Can somebody tell me whether this is doable, and if so point me to some example code? I couldn't find any examples that append new records to an existing Parquet file using Avro, or that change the schema and write new records under the new schema into that file. To make the question concrete, I've pasted a stripped-down version of my current writer below, along with a sketch of the kind of schema change we expect.

Thanks, Lloyd
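Here is roughly what my writer looks like today, trimmed down. The schema and field names are just placeholders, not our real ones, and depending on your parquet-mr version the package may be parquet.avro instead of org.apache.parquet.avro:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;

public class EventWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder schema -- the real one has more fields.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

        Path path = new Path("hdfs:///data/events.parquet");
        AvroParquetWriter<GenericRecord> writer =
            new AvroParquetWriter<GenericRecord>(path, schema);

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 1L);
        record.put("name", "first batch");
        writer.write(record);

        // Once close() is called I don't see a way to reopen the file and
        // keep appending -- that's the first part of my question.
        writer.close();
    }
}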

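And this is the kind of schema change I mean (same imports as above, field names made up): a new optional field added after the file already contains records written with the old schema. My understanding is that in plain Avro a new field with a default is backward compatible, but I don't know what that means for a Parquet file that already exists on HDFS:

// Old schema: {id: long, name: string}.
// The new schema adds an optional field with a default, which I believe is
// the Avro-compatible way to evolve a record.
Schema evolvedSchema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
    + "{\"name\":\"id\",\"type\":\"long\"},"
    + "{\"name\":\"name\",\"type\":\"string\"},"
    + "{\"name\":\"category\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

// What I'd like to do, if it's possible at all: write records that use
// evolvedSchema into the same hdfs:///data/events.parquet file that was
// created with the old schema.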