Currently I am not doing anything special; if anything changes, I start from scratch.

In general, I doubt there are many options for handling schema changes. If you
are reading the files with Impala, it may cope as long as the schema changes are
append-only. Otherwise, existing Parquet files have to be migrated to the new schema.
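To illustrate, an append-only change keeps all existing fields intact and only adds new optional fields at the end of the message type, so files written with the old schema stay readable. This is a hypothetical schema, not taken from the thread:

```
// Original schema
message event {
  required binary user_id (UTF8);
  required int64 ts;
}

// Evolved schema: existing fields unchanged, new field is
// optional, so readers can still process old files
message event {
  required binary user_id (UTF8);
  required int64 ts;
  optional binary payload (UTF8);
}
```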

----- Original Message -----
From: "Buntu Dev" <buntu...@gmail.com>
To: "Soumitra Kumar" <kumar.soumi...@gmail.com>
Cc: u...@spark.incubator.apache.org
Sent: Tuesday, October 7, 2014 10:18:16 AM
Subject: Re: Kafka->HDFS to store as Parquet format


Thanks for the info Soumitra, it's a good start for me.


Just wanted to know how you are managing schema changes/evolution as 
parquetSchema is provided to setSchema in the above sample code. 


On Tue, Oct 7, 2014 at 10:09 AM, Soumitra Kumar < kumar.soumi...@gmail.com > 
wrote: 


I have used it to write Parquet files as: 

// Imports for the pre-Apache (1.x-era) parquet-mr package names
import org.apache.hadoop.mapreduce.Job
import parquet.example.data.Group
import parquet.hadoop.ParquetOutputFormat
import parquet.hadoop.example.ExampleOutputFormat
import parquet.hadoop.metadata.CompressionCodecName
import parquet.schema.MessageTypeParser

val job = new Job
val conf = job.getConfiguration
conf.set(ParquetOutputFormat.COMPRESSION, CompressionCodecName.SNAPPY.name())
ExampleOutputFormat.setSchema(job, MessageTypeParser.parseMessageType(parquetSchema))
rdd.saveAsNewAPIHadoopFile(rddToFileName(outputDir, em, time), classOf[Void],
  classOf[Group], classOf[ExampleOutputFormat], conf)
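For context, the parquetSchema string passed to setSchema above is a Parquet message type definition. A minimal hypothetical example (the field names are illustrative, not from the code above):

```
message record {
  required binary name (UTF8);
  required int32 count;
}
```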



----- Original Message ----- 
From: "bdev" < buntu...@gmail.com > 
To: u...@spark.incubator.apache.org 
Sent: Tuesday, October 7, 2014 9:51:40 AM 
Subject: Re: Kafka->HDFS to store as Parquet format 

After a bit of looking around, I found that saveAsNewAPIHadoopFile could be used
to specify ParquetOutputFormat. Has anyone used it to convert JSON to Parquet
format? Any pointers are welcome, thanks!



-- 
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-HDFS-to-store-as-Parquet-format-tp15768p15852.html
 
Sent from the Apache Spark User List mailing list archive at Nabble.com. 

--------------------------------------------------------------------- 
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
For additional commands, e-mail: user-h...@spark.apache.org 



