Can you show the output of df.printSchema()? Just a guess, but I think I ran
into something similar with a column that was also part of a path in Parquet.
E.g. we had an account_id column in the Parquet file data itself, which was of
type string, but we also named the files in the following manner:
/somepath/account_id=.../file.parquet. Since Spark uses the paths for
partition discovery, it was actually inferring that account_id is a numeric
type, and upon reading the data we ran into the exception you're describing
(this was in Spark 1.4).
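
To make the mechanism concrete, here is a minimal sketch in plain Java of how
a key=value directory segment can end up typed as a number. It is a simplified
illustration only, not Spark's actual implementation: the class
PartitionTypeSketch, its methods, and the account id 12345 are all
hypothetical names and values made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT Spark's implementation.
public class PartitionTypeSketch {

    // Guess a type for a raw partition value the way a path-based
    // inference step might: text that parses as a long is treated as numeric.
    public static String inferType(String rawValue) {
        try {
            Long.parseLong(rawValue);
            return "long";
        } catch (NumberFormatException e) {
            return "string";
        }
    }

    // Collect the key=value directory segments of a path and infer a type
    // for each value. Segments without '=' (such as the file name) are skipped.
    public static Map<String, String> inferPartitionColumns(String path) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                String name = segment.substring(0, eq);
                String value = segment.substring(eq + 1);
                columns.put(name, inferType(value));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // 12345 is a made-up account id used only for this example.
        // Prints {account_id=long}
        System.out.println(
            inferPartitionColumns("/somepath/account_id=12345/file.parquet"));
    }
}
```

If the column is stored as a string inside the files but the directory value
looks numeric, the two types disagree, which would be consistent with a
Long-to-UTF8String cast failure. The Spark SQL guide describes a setting,
spark.sql.sources.partitionColumnTypeInference.enabled, for turning this
inference off; whether it is available depends on the Spark version, so check
the docs for yours.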

On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartsho...@gmail.com> wrote:

> Hi there,
>
> I have saved my records in Parquet format and am using Spark 1.5. But when
> I try to fetch the columns, it throws the exception
> *java.lang.ClassCastException: java.lang.Long cannot be cast to
> org.apache.spark.unsafe.types.UTF8String*.
>
> This field is saved as a String when writing the Parquet file. Here is the
> sample code and its output:
>
> logger.info("troubling thing is ::" +
> sqlContext.sql(fileSelectQuery).schema().toString());
> DataFrame df= sqlContext.sql(fileSelectQuery);
> JavaRDD<Row> rdd2 = df.toJavaRDD();
>
> First Line in the code (Logger) prints this:
> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>
> But the moment after that, the exception comes up.
>
> Any idea why it is treating the field as a Long? (Yes, one unique thing
> about the column is that it is always a number, e.g. a timestamp.)
>
> Any help is appreciated.
