This is a bit strange. When I print the schema for the RDD, it reflects the
correct data type for each column. But doing any kind of mathematical
calculation seems to result in a ClassCastException. Here is a sample that
results in the exception:

select c1, c2 ... cast (c18 as int) * cast (c21 as int) ... from table
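For what it's worth, the workaround mentioned further down in this thread, mapping each row to a custom class with typed fields before registering the table, would look roughly like the sketch below. The class and column values are made up for illustration; the point is that OpenCSV yields every column as a String, so the String-to-Int conversion has to be done explicitly:

```scala
// Hypothetical sketch: the CSV deserializer hands every column back as a
// String, so numeric fields must be parsed before the RDD is registered.
case class CsvRow(c18: Int, c21: Int) // field names are illustrative

val rawLines = Seq("3,4", "10,2") // stand-in for what the deserializer yields
val typed = rawLines.map { line =>
  val cols = line.split(",")
  CsvRow(cols(0).trim.toInt, cols(1).trim.toInt) // explicit String -> Int
}

// With typed fields, c18 * c21 is plain Int arithmetic; no runtime cast.
val products = typed.map(r => r.c18 * r.c21)
```

In Spark terms, an RDD of such a case class can then be registered with registerAsTable and the schema will carry Int columns instead of Strings.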
Any other pointers? Thanks for the help.

- Ranga

On Wed, Oct 8, 2014 at 5:20 PM, Ranga <sra...@gmail.com> wrote:

> Sorry. It's 1.1.0.
> After digging a bit more into this, it seems like the OpenCSV Deserializer
> converts all the columns to a String type. This may be throwing the
> execution off. Planning to create a class and map the rows to this custom
> class. Will keep this thread updated.
>
> On Wed, Oct 8, 2014 at 5:11 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Which version of Spark are you running?
>>
>> On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:
>>
>>> Thanks Michael. Should the cast be done in the source RDD or while doing
>>> the SUM?
>>> To give a better picture, here is the code sequence:
>>>
>>> val sourceRdd = sql("select ... from source-hive-table")
>>> sourceRdd.registerAsTable("sourceRDD")
>>> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
>>> // This query throws the exception when I collect the results
>>>
>>> I tried adding the cast to the aggRdd query above and that didn't help.
>>>
>>> - Ranga
>>>
>>> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com>
>>> wrote:
>>>
>>>> Using SUM on a string should automatically cast the column. You can
>>>> also use CAST to change the datatype
>>>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>.
>>>>
>>>> What version of Spark are you running? This could be
>>>> https://issues.apache.org/jira/browse/SPARK-1994
>>>>
>>>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am in the process of migrating some logic in Pig scripts to
>>>>> Spark SQL. As part of this process, I am creating a few "Select...Group
>>>>> By" queries and registering them as tables using the
>>>>> SchemaRDD.registerAsTable feature.
>>>>>
>>>>> When using such a registered table in a subsequent "Select...Group By"
>>>>> query, I get a ClassCastException:
>>>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>>>> java.lang.Integer
>>>>>
>>>>> This happens when I use the SUM function on one of the columns. Is
>>>>> there any way to specify the data type for the columns when the
>>>>> registerAsTable function is called? Are there other approaches I
>>>>> should be looking at?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> - Ranga