Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-12-03 Thread cjdc
Ideas?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998p20267.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-12-01 Thread cjdc
Btw, the same error as above also happens on 1.1.0 (just tested).



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998p20106.html



Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-12-01 Thread cjdc
Hi Vikas and Simone,

Thanks for the replies.
Yes, I understand this would be easier with 1.2, but upgrading is completely
out of my control. I really have to work with 1.0.0.

About Simone's approach, during the imports I get:
scala> import org.apache.avro.mapreduce.{ AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat }
:17: error: object mapreduce is not a member of package org.apache.avro
   import org.apache.avro.mapreduce.{ AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat }
  ^

scala> import org.apache.avro.mapred.AvroKey
:17: error: object mapred is not a member of package org.apache.avro
   import org.apache.avro.mapred.AvroKey
  ^

scala> import com.twitter.chill.avro.AvroSerializer
:18: error: object avro is not a member of package com.twitter.chill
   import com.twitter.chill.avro.AvroSerializer
^
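An error of the form "object mapreduce is not a member of package org.apache.avro" is what the Scala compiler reports when no classes from that package are on the shell's classpath. The likely cause is therefore that the avro-mapred jar (and chill-avro, for the third import) was never made available to the shell, for example through spark-shell's `--jars` option. As a small diagnostic sketch that assumes nothing beyond the JDK, the helper below (`ClasspathCheck`, `isLoadable` and `report` are hypothetical names) checks from inside the REPL whether the classes behind the first two imports can be loaded at all. It does not add anything to the classpath itself.

```scala
// Diagnostic sketch: can the current JVM load the classes behind the failing imports?
// ClasspathCheck / isLoadable / report are hypothetical names, not part of Spark or Avro.
object ClasspathCheck {
  // Class names taken from the first two failing imports (both ship in avro-mapred).
  val AvroClasses: Seq[String] = Seq(
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey"
  )

  // Loads the class without running its static initializers. A missing class
  // (ClassNotFoundException) and a class whose own dependencies are missing
  // (LinkageError, e.g. NoClassDefFoundError) both count as "not loadable".
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  // Pairs each class name with whether it could be loaded.
  def report(classNames: Seq[String] = AvroClasses): Seq[(String, Boolean)] =
    classNames.map(name => name -> isLoadable(name))
}
```

In the shell, `ClasspathCheck.report().foreach(println)` should show `true` for both entries once the jar is actually visible to the REPL; `false` means the imports will keep failing regardless of the Spark version.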






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998p20073.html



Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-11-29 Thread Simone Franzini
Did you have a look at my reply in this thread?

http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html

I am using 1.1.0, though, so I am not sure that code would work entirely with
1.0.0, but you can try.


Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Sat, Nov 29, 2014 at 5:43 AM, Vikas Agarwal wrote:

> Just in case it helps: https://github.com/databricks/spark-avro


Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-11-29 Thread Vikas Agarwal
Just in case it helps: https://github.com/databricks/spark-avro

On Fri, Nov 28, 2014 at 8:48 PM, cjdc wrote:

> To make it simpler, for now forget the snappy compression. Just assume they
> are binary Avro files...


-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax


Re: Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-11-28 Thread cjdc
To make it simpler, for now forget the snappy compression. Just assume they
are binary Avro files...





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998p20008.html



Spark SQL 1.0.0 - RDD from snappy compress avro file

2014-11-28 Thread cjdc
Hi everyone,

I am using Spark 1.0.0 and I am facing some issues handling binary,
Snappy-compressed Avro files that I get from HDFS. I know there are
improved mechanisms to handle these files in more recent versions of Spark,
but upgrading is not an option since I am operating on a Cloudera cluster
with no admin privileges.

I would simply like to load some of these Avro files, create the RDD and then
run simple SQL queries on their content.
Following the Spark SQL 1.0.0 Programming Guide, I have:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._

val myData = sc.textFile("/example/mydir/MyFile1.avro")
// ### QUESTION ###
// ### How to dynamically define the schema from the Avro header?? ###
//
// val Schema =

myData.registerAsTable("MyDB")

val query = sql("SELECT * FROM MyDB")
query.collect().foreach(println)

So, how would you modify this to make it work (considering the Spark
version)?
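Two points may help with this question. First, `sc.textFile` splits its input on newlines and decodes it as text, so it is not a suitable way to read a binary Avro container file. Second, the schema asked about is stored as JSON in the file header under the `avro.schema` metadata key, next to `avro.codec` (which reads `snappy` for Snappy-compressed files). With the Avro library on the classpath, `org.apache.avro.file.DataFileReader` exposes it through `getSchema`. The sketch below instead reads the header by hand, following the object container layout in the Avro specification, so it depends only on the JDK; `AvroHeader` and its methods are hypothetical names. As far as I know, Spark SQL 1.0.0 infers table schemas from case classes (programmatic schemas via `applySchema` arrived in 1.1), so on 1.0.0 the recovered JSON would mainly serve as a guide for writing a matching case class.

```scala
import java.io.{DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Sketch: read the metadata map from the header of an Avro object container file.
// Layout per the Avro spec: magic bytes "Obj" 0x01, then a map<bytes> of metadata
// (e.g. "avro.schema", "avro.codec"), then a 16-byte sync marker (not read here).
// AvroHeader, readLong, readBytes and readMetadata are hypothetical names.
object AvroHeader {
  private val Magic = Array[Byte]('O'.toByte, 'b'.toByte, 'j'.toByte, 1)

  // Decodes one Avro "long": a little-endian base-128 varint, then zig-zag decoding.
  def readLong(in: InputStream): Long = {
    var n = 0L
    var shift = 0
    var more = true
    while (more) {
      val b = in.read()
      if (b < 0) throw new EOFException("truncated Avro long")
      n |= (b & 0x7FL) << shift
      shift += 7
      more = (b & 0x80) != 0
    }
    (n >>> 1) ^ -(n & 1)
  }

  // Decodes Avro "bytes": a long length followed by that many raw bytes.
  def readBytes(in: DataInputStream): Array[Byte] = {
    val buf = new Array[Byte](readLong(in).toInt)
    in.readFully(buf)
    buf
  }

  // Returns the header metadata with keys and values decoded as UTF-8 strings.
  // (Fine for avro.schema / avro.codec; arbitrary user metadata may be binary.)
  def readMetadata(raw: InputStream): Map[String, String] = {
    val in = new DataInputStream(raw)
    val magic = new Array[Byte](4)
    in.readFully(magic)
    require(java.util.Arrays.equals(magic, Magic), "not an Avro object container file")

    val meta = Map.newBuilder[String, String]
    var count = readLong(in)
    while (count != 0) {
      // A negative block count is followed by the block's size in bytes; skip that size.
      if (count < 0) { count = -count; readLong(in) }
      var i = 0L
      while (i < count) {
        val key = new String(readBytes(in), UTF_8)
        val value = new String(readBytes(in), UTF_8)
        meta += (key -> value)
        i += 1
      }
      count = readLong(in) // a block count of 0 terminates the map
    }
    meta.result()
  }
}
```

Fed the start of one of the files (for example an `InputStream` obtained from Hadoop's `FileSystem.open`), `AvroHeader.readMetadata(stream)("avro.schema")` would return the writer schema as JSON, and `"avro.codec"` would confirm whether the data blocks are Snappy-compressed.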

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998.html