Hi everyone, I am using Spark 1.0.0 and I am facing some issues handling binary snappy-compressed Avro files which I get from HDFS. I know there are improved mechanisms for handling these files in more recent versions of Spark, but upgrading is not an option since I am operating on a Cloudera cluster with no admin privileges.
I would simply like to read some of these Avro files, create an RDD from them, and then run simple SQL queries against their content. Following the Spark SQL 1.0.0 Programming Guide, I have:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._

    val myData = sc.textFile("/example/mydir/MyFile1.avro")

    // ### QUESTION: how to dynamically define the schema from the Avro header? ###

    val Schema = myData.registerAsTable("MyDB")
    val query = sql("SELECT * FROM MyDB")
    query.collect().foreach(println)

So, how would you modify this to make it work (considering the Spark version)?
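For context, the direction I have been experimenting with is to bypass textFile (which cannot parse the binary Avro container format) and read the files through the Avro Hadoop input format instead. This is only a sketch and not yet working end to end for me: it assumes avro-mapred 1.7.x (probably with the hadoop2 classifier on a CDH cluster) is on the classpath, and the Person case class with its name/age fields is a made-up example, not my real schema:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Read the Avro container files; snappy decompression should be
    // transparent here, since the codec name is stored in the file header.
    val avroRdd = sc.newAPIHadoopFile(
      "/example/mydir/MyFile1.avro",
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])

    // The writer schema also travels in the container header; convert it to
    // a String inside the closure, because AvroKey/GenericRecord are not
    // java-serializable and cannot be shipped back to the driver as-is.
    println(avroRdd.map(_._1.datum().getSchema.toString).first())

    // In 1.0.0 the only route I know into Spark SQL is via a case class,
    // which hard-codes the schema (a hypothetical name/age record here).
    // This relies on the sqlContext and "import sqlContext._" from above.
    case class Person(name: String, age: Int)
    val people = avroRdd.map { case (k, _) =>
      val r = k.datum()
      Person(r.get("name").toString, r.get("age").asInstanceOf[Int])
    }
    people.registerAsTable("MyDB")
    sql("SELECT * FROM MyDB").collect().foreach(println)

What I have not figured out is how to go from the schema in the container header to a table definition dynamically, instead of hard-coding the case class.

Thanks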