Hi everyone, I am using Spark 1.0.0 and I am facing some issues handling binary snappy-compressed Avro files which I get from HDFS. I know there are improved mechanisms for handling these files in more recent versions of Spark, but upgrading is not an option since I am operating on a Cloudera cluster with no admin privileges.
I would simply like to read some of these Avro files, create an RDD from them, and then run simple SQL queries against their content. Following the Spark SQL 1.0.0 Programming Guide, I have:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._

    val myData = sc.textFile("/example/mydir/MyFile1.avro")

    // ### QUESTION: how to dynamically define the schema from the Avro header? ###

    val Schema = myData.registerAsTable("MyDB")
    val query = sql("SELECT * FROM MyDB")
    query.collect().foreach(println)

So, how would you modify this to make it work (considering the Spark version)?
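For context, the direction I have been experimenting with is to bypass textFile (which cannot parse the binary Avro container format) and read the files through the Avro Hadoop input format instead. This is only a sketch and not yet working end to end for me: it assumes avro-mapred 1.7.x (probably with the hadoop2 classifier on a CDH cluster) is on the classpath, and the Person case class with its name/age fields is a made-up example, not my real schema:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Read the Avro container files; snappy decompression should be
    // transparent here, since the codec name is stored in the file header.
    val avroRdd = sc.newAPIHadoopFile(
      "/example/mydir/MyFile1.avro",
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])

    // The writer schema also travels in the container header; convert it to
    // a String inside the closure, because AvroKey/GenericRecord are not
    // java-serializable and cannot be shipped back to the driver as-is.
    println(avroRdd.map(_._1.datum().getSchema.toString).first())

    // In 1.0.0 the only route I know into Spark SQL is via a case class,
    // which hard-codes the schema (a hypothetical name/age record here).
    // This relies on the sqlContext and "import sqlContext._" from above.
    case class Person(name: String, age: Int)
    val people = avroRdd.map { case (k, _) =>
      val r = k.datum()
      Person(r.get("name").toString, r.get("age").asInstanceOf[Int])
    }
    people.registerAsTable("MyDB")
    sql("SELECT * FROM MyDB").collect().foreach(println)

What I have not figured out is how to go from the schema in the container header to a table definition dynamically, instead of hard-coding the case class.

Thanks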