[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on

2018-03-16 Thread Dongjoon Hyun (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403176#comment-16403176 ]

Dongjoon Hyun commented on SPARK-5498:
--

Also, SPARK-5498 was resolved 3 years ago. Please open a new issue if you need one.

> [SPARK-SQL]when the partition schema does not match table schema,it throws 
> java.lang.ClassCastException and so on
> -
>
> Key: SPARK-5498
> URL: https://issues.apache.org/jira/browse/SPARK-5498
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0, 2.2.0
>Reporter: jeanlyn
>Assignee: jeanlyn
>Priority: Major
> Fix For: 1.4.0, 3.0.0
>
>
> When the partition schema does not match the table schema, an exception is 
> thrown while the task is running. For example, if we change the type of a 
> column from int to bigint with the SQL *ALTER TABLE table_with_partition 
> CHANGE COLUMN key key BIGINT* and then query partition data that was written 
> before the change, we get the exception:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt
> at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
> at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
> ...
> {noformat}
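
For reference, a minimal sketch of the reported repro, runnable from spark-shell against a Hive-backed table. The table and column names (table_with_partition, key) come from the description above; the partition column ds, the extra value column, and the sample row are illustrative assumptions.

{code}
// Illustrative repro sketch (spark-shell with Hive support enabled); not
// taken verbatim from the report. Only table_with_partition and the column
// `key` appear in the original description.
spark.sql("CREATE TABLE table_with_partition (key INT, value STRING) PARTITIONED BY (ds STRING)")
spark.sql("INSERT OVERWRITE TABLE table_with_partition PARTITION (ds = '1') SELECT 1, 'a'")

// Widen the table-level column type. The partition written above keeps its
// original INT schema in the metastore.
spark.sql("ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT")

// On the affected versions, scanning the old partition fails at task time
// with the ClassCastException shown in the stack trace above.
spark.sql("SELECT key FROM table_with_partition WHERE ds = '1'").show()
{code}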

[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on

2018-03-16 Thread Dongjoon Hyun (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403175#comment-16403175 ]

Dongjoon Hyun commented on SPARK-5498:
--

[~liutang123], Spark should not do this kind of risky thing. Hive 2.3.2 also 
disallows incompatible schema changes like the following.

{code}
hive> CREATE TABLE test_par(a string) PARTITIONED BY (b bigint) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK
Time taken: 0.262 seconds

hive> ALTER TABLE test_par CHANGE a a bigint RESTRICT;
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following 
columns have types incompatible with the existing columns in their respective 
positions :
a

hive> SELECT VERSION();
OK
2.3.2 r857a9fd8ad725a53bd95c1b2d6612f9b1155f44d
Time taken: 0.711 seconds, Fetched: 1 row(s)
{code}
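
For comparison, recent Spark versions also reject this kind of incompatible column change at analysis time rather than failing later inside a task. Below is a sketch of the equivalent attempt from spark-shell; the exact exception type and message vary by Spark version and are not quoted from this thread.

{code}
// Illustrative sketch; assumes the test_par table from the Hive session
// above and that the running Spark version refuses the type change.
import org.apache.spark.sql.AnalysisException

try {
  spark.sql("ALTER TABLE test_par CHANGE a a bigint")
} catch {
  case e: AnalysisException =>
    // The incompatible change is refused up front instead of producing a
    // ClassCastException at read time.
    println(s"rejected: ${e.getMessage}")
}
{code}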

[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on

2018-03-16 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401717#comment-16401717 ]

Apache Spark commented on SPARK-5498:
-

User 'liutang123' has created a pull request for this issue:
https://github.com/apache/spark/pull/20846


[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on

2015-01-30 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298648#comment-14298648 ]

Apache Spark commented on SPARK-5498:
-

User 'jeanlyn' has created a pull request for this issue:
https://github.com/apache/spark/pull/4289
