Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-09 Thread Ranga
Resolution:
After realizing that the SerDe (OpenCSV) was causing all the fields to be
defined as the "String" type, I modified the Hive "load" statement to use the
default serializer and changed the CSV input file to use a different
delimiter. Although this is a workaround, I am able to proceed with it for
now.
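
For anyone hitting the same issue, a rough sketch of the workaround described
above might look like this (table name, columns, delimiter, and path are
illustrative assumptions; the point is declaring typed columns and letting
Hive use its default delimited serializer instead of the OpenCSV SerDe):

```scala
// Hypothetical sketch only: typed columns plus the default (LazySimpleSerDe)
// serializer with a custom field delimiter, so numeric columns come back as
// INT rather than STRING. All names and paths here are made up.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`
hiveContext.sql("""
  CREATE TABLE source_table (c1 STRING, c2 STRING, c18 INT, c21 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  STORED AS TEXTFILE
""")
hiveContext.sql("LOAD DATA INPATH '/path/to/input' INTO TABLE source_table")
```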


- Ranga



Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
This is a bit strange. When I print the schema for the RDD, it reflects the
correct data type for each column, but doing any kind of mathematical
calculation seems to result in a ClassCastException. Here is a sample that
results in the exception:
select c1, c2
...
cast (c18 as int) * cast (c21 as int)
...
from table

Any other pointers? Thanks for the help.
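
(For reference, the schema check mentioned above can be done directly on the
SchemaRDD; a sketch against the 1.1-era API, with an illustrative table name:)

```scala
// Sketch: inspect the per-column types Spark SQL has inferred for the table.
// If c18/c21 show up as string here, arithmetic on them needs an explicit cast.
val rdd = sql("select * from table")
rdd.printSchema()
```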


- Ranga



Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Sorry, it's 1.1.0.
After digging a bit more into this, it seems the OpenCSV deserializer
converts all the columns to a String type, which may be throwing the
execution off. Planning to create a class and map the rows to this custom
class. Will keep this thread updated.
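
A minimal sketch of that custom-class approach against the Spark 1.1 API (the
case class, column names, and table names are illustrative assumptions, not
the actual schema):

```scala
// Map each all-string row to a typed case class, then re-register the RDD so
// that sum(c3) operates on an Int column. Names are illustrative throughout.
case class Record(c1: String, c2: String, c3: Int)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit conversion RDD[Record] => SchemaRDD

val raw = sql("select c1, c2, c3 from source_hive_table")
val typed = raw.map(row =>
  Record(row.getString(0), row.getString(1), row.getString(2).toInt))
typed.registerAsTable("typedRDD")
val agg = sql("select c1, c2, sum(c3) from typedRDD group by c1, c2")
```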



Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Which version of Spark are you running?



Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Thanks Michael. Should the cast be done in the source RDD or while doing
the SUM?
To give a better picture here is the code sequence:

val sourceRdd = sql("select ... from source-hive-table")
sourceRdd.registerAsTable("sourceRDD")
val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
// This query throws the exception when I collect the results

I tried adding the cast to the aggRdd query above and that didn't help.


- Ranga



Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Using SUM on a string should automatically cast the column. You can also use
CAST to change the datatype explicitly.

What version of Spark are you running?  This could be
https://issues.apache.org/jira/browse/SPARK-1994
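
Applied to the snippet earlier in the thread, the explicit cast could go in
either query; a hedged sketch with the thread's identifiers (whether it
avoids the exception depends on the SPARK-1994 fix):

```scala
// Option 1: cast in the source query, so the registered table already
// carries an int column.
val sourceRdd = sql("select c1, c2, cast(c3 as int) as c3 from source_hive_table")
sourceRdd.registerAsTable("sourceRDD")
// Option 2: cast at aggregation time.
val aggRdd = sql("select c1, c2, sum(cast(c3 as int)) from sourceRDD group by c1, c2")
```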

On Wed, Oct 8, 2014 at 3:47 PM, Ranga  wrote:

> Hi
>
> I am in the process of migrating some logic in Pig scripts to Spark-SQL.
> As part of this process, I am creating a few "Select...Group By" queries and
> registering them as tables using the SchemaRDD.registerAsTable feature.
> When using such a registered table in a subsequent "Select...Group By"
> query, I get a "ClassCastException":
> java.lang.ClassCastException: java.lang.String cannot be cast to
> java.lang.Integer
>
> This happens when I use the "Sum" function on one of the columns. Is there
> any way to specify the data type for the columns when the registerAsTable
> function is called? Are there other approaches that I should be looking at?
>
> Thanks for your help.
>
>
>
> - Ranga
>