Currently I am not doing anything about it; if the schema changes, I start from scratch. In general, I doubt there are many options for handling schema changes. If you are reading the files with Impala, it may cope with schema changes that are append-only. Otherwise, existing Parquet files have to be migrated to the new schema.
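
As a rough illustration of what "append-only" means here, the sketch below checks whether a newer schema only adds fields after the ones an older schema already had. It assumes a flat schema modelled as an ordered list of fields; `Field`, `SchemaCheck`, `isAppendOnly` and `appended` are hypothetical names made up for this example, not part of Parquet, Spark or Impala, and real Parquet schemas can be nested.

```scala
// Hypothetical flat model of a schema: an ordered list of named, typed fields.
final case class Field(name: String, typeName: String)

object SchemaCheck {
  // True when `newer` keeps every field of `older` unchanged and in the same
  // position, and at most adds further fields after them.
  def isAppendOnly(older: Seq[Field], newer: Seq[Field]): Boolean =
    newer.startsWith(older)

  // Fields present only in the newer schema; empty if the change is not
  // append-only (or if nothing was added).
  def appended(older: Seq[Field], newer: Seq[Field]): Seq[Field] =
    if (isAppendOnly(older, newer)) newer.drop(older.length) else Seq.empty
}
```

With `v1 = Seq(Field("id", "int64"), Field("msg", "binary"))`, adding a trailing field (`v1 :+ Field("ts", "int64")`) passes the check, while dropping, reordering or retyping an existing field fails it, which is the case where the old files would need to be migrated.
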
----- Original Message -----
From: "Buntu Dev" <buntu...@gmail.com>
To: "Soumitra Kumar" <kumar.soumi...@gmail.com>
Cc: u...@spark.incubator.apache.org
Sent: Tuesday, October 7, 2014 10:18:16 AM
Subject: Re: Kafka->HDFS to store as Parquet format

Thanks for the info, Soumitra. It's a good start for me. I just wanted to know how you are managing schema changes/evolution, since parquetSchema is passed to setSchema in your sample code.

On Tue, Oct 7, 2014 at 10:09 AM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:

I have used it to write Parquet files as:

    val job = new Job
    val conf = job.getConfiguration
    conf.set(ParquetOutputFormat.COMPRESSION, CompressionCodecName.SNAPPY.name())
    ExampleOutputFormat.setSchema(job, MessageTypeParser.parseMessageType(parquetSchema))
    rdd saveAsNewAPIHadoopFile (rddToFileName(outputDir, em, time), classOf[Void], classOf[Group], classOf[ExampleOutputFormat], conf)

----- Original Message -----
From: "bdev" <buntu...@gmail.com>
To: u...@spark.incubator.apache.org
Sent: Tuesday, October 7, 2014 9:51:40 AM
Subject: Re: Kafka->HDFS to store as Parquet format

After a bit of looking around, I found that saveAsNewAPIHadoopFile could be used to specify the ParquetOutputFormat. Has anyone used it to convert JSON to Parquet format? Any pointers are welcome, thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-HDFS-to-store-as-Parquet-format-tp15768p15852.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org