[jira] [Updated] (SPARK-3231) select on a table in parquet format containing smallint as a field type does not work

2015-09-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3231:
-
Assignee: Alex Rovner

> select on a table in parquet format containing smallint as a field type does 
> not work
> -
>
> Key: SPARK-3231
> URL: https://issues.apache.org/jira/browse/SPARK-3231
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Environment: The table is created through Hive-0.13. SparkSql 1.1 is used.
> Reporter: chirag aggarwal
> Assignee: Alex Rovner
> Fix For: 1.5.0
>
>
> A table is created through Hive with a column of type smallint and is stored
> in Parquet format. A select on this table works correctly in the Hive shell,
> but the same select run from spark-sql fails.
> Steps to reproduce the issue:
> --
> hive> create table abct (a smallint, b int) row format delimited fields terminated by '|' stored as textfile;
> A text file is stored in hdfs for this table.
> hive> create table abc (a smallint, b int) stored as parquet; 
> hive> insert overwrite table abc select * from abct;
> hive> select * from abc;
> 2 1
> 2 2
> 2 3
> spark-sql> select * from abc;
> 10:08:46 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 33.0 (TID 2340) had a not serializable result: org.apache.hadoop.io.IntWritable
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1158)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1147)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1146)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1146)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:685)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> But if the type of column a is changed to int, spark-sql returns the correct
> results.
> hive> alter table abc change a a int;
> spark-sql> select * from abc;
> 2 1
> 2 2
> 2 3
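
The error above says that a task result contained an org.apache.hadoop.io.IntWritable, which Java serialization rejected; Hadoop's Writable wrappers do not implement java.io.Serializable. The following is a minimal sketch of that failure mode only, not Spark's or Hive's actual code path. FakeIntWritable, PlainRow, and canJavaSerialize are hypothetical names introduced here for illustration, and the sketch assumes plain Java serialization is what rejects the value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NotSerializableDemo {
    // Hypothetical stand-in for a Hadoop Writable wrapper: it holds an int
    // but does not implement java.io.Serializable.
    static final class FakeIntWritable {
        final int value;
        FakeIntWritable(int value) { this.value = value; }
    }

    // Hypothetical row container; it is Serializable, but serializing it
    // still requires every field value to be Serializable as well.
    static final class PlainRow implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object[] fields;
        PlainRow(Object... fields) { this.fields = fields; }
    }

    // Returns true if Java serialization accepts the object graph,
    // false if it hits a value that is not Serializable.
    static boolean canJavaSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Row of unwrapped values (Short, Integer): serializes fine.
        System.out.println(canJavaSerialize(new PlainRow((short) 2, 1)));
        // Row that still carries a wrapper object: rejected.
        System.out.println(canJavaSerialize(new PlainRow(new FakeIntWritable(2), 1)));
    }
}
```

In this sketch the first row serializes and the second does not, which mirrors the symptom in the report: a wrapper object left in the row instead of the plain value it carries.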



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3231) select on a table in parquet format containing smallint as a field type does not work

2014-08-26 Thread chirag aggarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chirag aggarwal updated SPARK-3231:
---

Summary: select on a table in parquet format containing smallint as a field 
type does not work  (was: select on a table in parquet format containing 
smallest as a field type does not work)

 select on a table in parquet format containing smallint as a field type does 
 not work
 -

 Key: SPARK-3231
 URL: https://issues.apache.org/jira/browse/SPARK-3231
 Project: Spark
 Issue Type: Bug
 Components: Spark Core
 Affects Versions: 1.1.0
 Environment: The table is created through Hive-0.13. SparkSql 1.1 is used.
 Reporter: chirag aggarwal

 A table is created through Hive with a column of type smallint and is stored
 in Parquet format. A select on this table works correctly in the Hive shell,
 but the same select run from spark-sql fails.
 Steps to reproduce the issue:
 --
 hive> create table abct (a smallint, b int) row format delimited fields terminated by '|' stored as textfile;
 A text file is stored in hdfs for this table.
 hive> create table abc (a smallint, b int) stored as parquet;
 hive> insert overwrite table abc select * from abct;
 hive> select * from abc;
 2 1
 2 2
 2 3
 spark-sql> select * from abc;
 10:08:46 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 33.0 (TID 2340) had a not serializable result: org.apache.hadoop.io.IntWritable
   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1158)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1147)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1146)
   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1146)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
   at scala.Option.foreach(Option.scala:236)
   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:685)
   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 But if the type of column a is changed to int, spark-sql returns the correct
 results.
 hive> alter table abc change a a int;
 spark-sql> select * from abc;
 2 1
 2 2
 2 3


