Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-09 Thread Ranga
Resolution: After realizing that the SerDe (OpenCSV) was causing all the fields to be defined as type "String", I modified the Hive "load" statement to use the default serializer, and changed the CSV input file to use a different delimiter. Although this is a workaround, I am able to proc

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
This is a bit strange. When I print the schema for the RDD, it reflects the correct data type for each column. But any kind of mathematical calculation seems to result in a ClassCastException. Here is a sample query that triggers the exception: select c1, c2 ... cast (c18 as int) * cast (c21 as int
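The mismatch between a correct-looking schema and a failing calculation can be approximated without Spark. The sketch below is a plain-Scala illustration, not Spark's actual code path: it assumes the row holds the raw `String` produced by the deserializer while the schema reports an integer column, so an unchecked unboxing to `Int` fails with the same exception type. The `CastMismatch` object and its helpers are hypothetical.

```scala
object CastMismatch {
  // Hypothetical row: the schema claims column 2 is an int,
  // but the deserializer actually stored the text "42".
  val row: Array[Any] = Array("a", "b", "42")

  // Unchecked unboxing: trusts the schema and reinterprets the stored object as Int.
  // A java.lang.String cannot be unboxed to Int, so this throws ClassCastException.
  def getIntUnchecked(r: Array[Any], i: Int): Int = r(i).asInstanceOf[Int]

  // Explicit conversion: parses the text instead of reinterpreting it.
  def getIntParsed(r: Array[Any], i: Int): Int = r(i).toString.trim.toInt

  def main(args: Array[String]): Unit = {
    val failed =
      try { getIntUnchecked(row, 2); false }
      catch { case _: ClassCastException => true }
    println(s"unchecked cast threw ClassCastException: $failed")
    println(s"parsed value * 2 = ${getIntParsed(row, 2) * 2}")
  }
}
```

The point of the contrast is that the schema metadata and the stored values can disagree; only an explicit conversion of the stored text makes arithmetic safe.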

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Sorry. It's 1.1.0. After digging a bit more into this, it seems like the OpenCSV deserializer converts all the columns to a String type. This may be throwing the execution off. I am planning to create a class and map the rows to this custom class. Will keep this thread updated.
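Mapping the all-String rows onto a custom class could look roughly like this. It is a plain-Scala sketch: the `Record` class, its three fields, and the `parse` helper are hypothetical stand-ins for the real columns. In Spark 1.1, an RDD of such case classes could then be converted to a SchemaRDD (for example through the `createSchemaRDD` implicit on `SQLContext`), so the declared field types, rather than the deserializer's strings, would determine the schema.

```scala
// Hypothetical typed record; field names and types are placeholders.
case class Record(c1: String, c2: String, c3: Int)

object RecordMapping {
  // Convert one row of deserializer output (all strings) into a typed Record.
  // Returns None when the row is too short or c3 is not a valid integer.
  def parse(fields: Seq[String]): Option[Record] = fields match {
    case Seq(c1, c2, c3, _*) =>
      scala.util.Try(c3.trim.toInt).toOption.map(n => Record(c1, c2, n))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("x", "y", "10"), Seq("x", "y", "n/a"), Seq("z"))
    rows.map(parse).foreach(println)
  }
}
```

Returning `Option` keeps malformed rows visible to the caller instead of failing the whole job; whether to drop, log, or reject them is a separate design decision.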

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Which version of Spark are you running?

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Thanks Michael. Should the cast be done in the source RDD or while doing the SUM? To give a better picture, here is the code sequence:

    val sourceRdd = sql("select ... from source-hive-table")
    sourceRdd.registerAsTable("sourceRDD")
    val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1,
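Converting the column once, in the source step, means every later query over the registered table sees a numeric column. The plain-Scala sketch below mirrors the shape of `select c1, c2, sum(c3) ... group by ...` with the conversion done up front. It assumes the grouping is on c1 and c2 (the original query is cut off), and the tuple layout and sample values are made up for illustration.

```scala
object GroupSum {
  // Source rows as the deserializer returned them: every column is a String.
  val raw: Seq[(String, String, String)] = Seq(
    ("a", "x", "1"), ("a", "x", "2"), ("b", "y", "5"))

  // Step 1: convert c3 once at the source, analogous to cast(c3 as int) in the first query.
  val typed: Seq[(String, String, Int)] =
    raw.map { case (c1, c2, c3) => (c1, c2, c3.trim.toInt) }

  // Step 2: group by (c1, c2) and sum the now-numeric column.
  val agg: Map[(String, String), Int] =
    typed
      .groupBy { case (c1, c2, _) => (c1, c2) }
      .map { case (key, rows) => key -> rows.map(_._3).sum }

  def main(args: Array[String]): Unit =
    agg.toSeq.sortBy(_._1).foreach { case ((c1, c2), s) => println(s"$c1 $c2 $s") }
}
```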

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Using SUM on a string should automatically cast the column. Also, you can use CAST to change the data type. What version of Spark are you running? This could be https://issues.apache.or
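In HiveQL, `CAST` of a non-numeric string to `INT` yields `NULL` rather than raising an error, and `SUM` skips `NULL` inputs. The plain-Scala sketch below models that behavior with `Option`, where `None` stands in for SQL `NULL`. It is an illustration of the semantics under that assumption, not Spark SQL's implementation; `CastToIntSketch` and its methods are hypothetical names.

```scala
object CastToIntSketch {
  // Models CAST(s AS INT): None plays the role of SQL NULL for null or unparseable input.
  def castToInt(s: String): Option[Int] =
    if (s == null) None else scala.util.Try(s.toInt).toOption

  // Models SUM over the cast column: NULLs are skipped; an all-NULL input gives NULL.
  def sumInts(values: Seq[Option[Int]]): Option[Int] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.sum)
  }

  def main(args: Array[String]): Unit = {
    val column = Seq("3", "4", "abc", null)
    println(sumInts(column.map(castToInt)))
  }
}
```

A consequence worth keeping in mind: bad input silently drops out of the total instead of failing, so a result that looks too small can be a parsing problem rather than a query problem.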

Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Hi, I am in the process of migrating some logic in Pig scripts to Spark-SQL. As part of this process, I am creating a few "Select...Group By" queries and registering them as tables using the SchemaRDD.registerAsTable feature. When using such a registered table in a subsequent "Select...Group By" quer