[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403176#comment-16403176 ] Dongjoon Hyun commented on SPARK-5498: -- And, SPARK-5498 has resolved 3 years ago. Please open a new issue if you need. > [SPARK-SQL]when the partition schema does not match table schema,it throws > java.lang.ClassCastException and so on > - > > Key: SPARK-5498 > URL: https://issues.apache.org/jira/browse/SPARK-5498 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 2.2.0 >Reporter: jeanlyn >Assignee: jeanlyn >Priority: Major > Fix For: 1.4.0, 3.0.0 > > > when the partition schema does not match table schema,it will thows exception > when the task is running.For example,we modify the type of column from int to > bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key > BIGINT* ,then we query the patition data which was stored before the > changing,we would get the exception: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 > (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) > at >
[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403175#comment-16403175 ] Dongjoon Hyun commented on SPARK-5498: -- [~liutang123]. Spark should not do this kind of risky thing. Hive 2.3.2 also disallows incompatible schema changes like the following. {code} hive> CREATE TABLE test_par(a string) PARTITIONED BY (b bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'; OK Time taken: 0.262 seconds hive> ALTER TABLE test_par CHANGE a a bigint RESTRICT; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions : a hive> SELECT VERSION(); OK 2.3.2 r857a9fd8ad725a53bd95c1b2d6612f9b1155f44d Time taken: 0.711 seconds, Fetched: 1 row(s) {code} > [SPARK-SQL]when the partition schema does not match table schema,it throws > java.lang.ClassCastException and so on > - > > Key: SPARK-5498 > URL: https://issues.apache.org/jira/browse/SPARK-5498 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 2.2.0 >Reporter: jeanlyn >Assignee: jeanlyn >Priority: Major > Fix For: 1.4.0, 3.0.0 > > > when the partition schema does not match table schema,it will thows exception > when the task is running.For example,we modify the type of column from int to > bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key > BIGINT* ,then we query the patition data which was stored before the > changing,we would get the exception: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 > (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at
[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401717#comment-16401717 ] Apache Spark commented on SPARK-5498: - User 'liutang123' has created a pull request for this issue: https://github.com/apache/spark/pull/20846 > [SPARK-SQL]when the partition schema does not match table schema,it throws > java.lang.ClassCastException and so on > - > > Key: SPARK-5498 > URL: https://issues.apache.org/jira/browse/SPARK-5498 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 2.2.0 >Reporter: jeanlyn >Assignee: jeanlyn >Priority: Major > Fix For: 1.4.0, 3.0.0 > > > when the partition schema does not match table schema,it will thows exception > when the task is running.For example,we modify the type of column from int to > bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key > BIGINT* ,then we query the patition data which was stored before the > changing,we would get the exception: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 > (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to > org.apache.spark.sql.catalyst.expressions.MutableInt > at > org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) > at >
[jira] [Commented] (SPARK-5498) [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on
[ https://issues.apache.org/jira/browse/SPARK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298648#comment-14298648 ] Apache Spark commented on SPARK-5498: - User 'jeanlyn' has created a pull request for this issue: https://github.com/apache/spark/pull/4289 [SPARK-SQL]when the partition schema does not match table schema,it throws java.lang.ClassCastException and so on - Key: SPARK-5498 URL: https://issues.apache.org/jira/browse/SPARK-5498 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: jeanlyn when the partition schema does not match table schema,it will thows exception when the task is running.For example,we modify the type of column from int to bigint by the sql *ALTER TABLE table_with_partition CHANGE COLUMN key key BIGINT* ,then we query the patition data which was stored before the changing,we would get the exception: {noformat} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 30, BJHC-HADOOP-HERA-16950.jeanlyn.local): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setInt(SpecificMutableRow.scala:241) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$13$$anonfun$apply$4.apply(TableReader.scala:286) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:322) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:314) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$10.next(Iterator.scala:312) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at