Hi All, In my current project there is a requirement to store Avro data (in JSON format) as Parquet files. I was able to use AvroParquetWriter separately, outside of Spark, to create the Parquet files. Those files, along with the data, also had the Avro schema stored in their footer.
But when I tried using Spark Streaming, I could not find a way to store the data together with the Avro schema information. The closest I got was to create a DataFrame from the JSON RDDs and save it as Parquet; in that case the Parquet files carry a Spark-specific schema in their footer instead of the Avro one. Is this the right approach, or is there a better one? Please guide me. We are using Spark 1.4.1. Thanks in advance!!
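For reference, this is roughly what I'm doing today inside the streaming job (a sketch, assuming a `DStream[String]` of JSON-encoded Avro records; the stream and output-path names are illustrative):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// jsonStream: a DStream[String] where each element is one Avro record as JSON
def writeAsParquet(jsonStream: DStream[String], outputDir: String): Unit = {
  jsonStream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      // In practice a singleton SQLContext would be reused across batches
      val sqlContext = new SQLContext(rdd.sparkContext)
      // Spark infers its own SQL schema from the JSON here, so the Parquet
      // footer ends up with Spark's schema, not the original Avro schema
      val df = sqlContext.read.json(rdd)
      df.write.mode("append").parquet(outputDir)
    }
  }
}
```

This works, but as noted the footer metadata is Spark's, not Avro's, which is the part I'd like to change.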