[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644283#comment-15644283 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

Thanks Rui.

> Hive insertion query execution fails on Hive on Spark
> -----------------------------------------------------
>
>                 Key: HIVE-15054
>                 URL: https://issues.apache.org/jira/browse/HIVE-15054
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15054.1.patch, HIVE-15054.2.patch, HIVE-15054.3.patch, HIVE-15054.4.patch
>
>
> The query {{insert overwrite table tbl1}} sometimes fails with the following errors. It seems we are constructing the taskAttemptId with the partitionId, which is not unique if there are multiple attempts.
> {noformat}
> java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_task_tmp.-ext-10002/_tmp.002148_0 to: hdfs://table1/.hive-staging_hive_2016-06-14_01-53-17_386_3231646810118049146-9/_tmp.-ext-10002/002148_0
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:202)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> 	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
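The failure above comes down to how the synthetic task attempt id is composed. The following is a minimal sketch of the idea, building an MR-style attempt id from Spark's stage, partition, and attempt numbers; the helper name and padding widths are assumptions for illustration, not the actual HIVE-15054 patch:

```java
// Sketch only: helper name and padding widths are assumptions, not the
// actual HIVE-15054 patch.
public class AttemptIdSketch {
    // MR format: attempt_<timestamp>_<jobNum>_<m|r>_<taskNum>_<attemptNum>.
    // For Spark, substitute the stage number for the job number and the
    // partition id for the task number.
    static String sparkAttemptId(String timestamp, int stageNum,
                                 boolean isMap, int partitionId, int attemptNum) {
        return String.format("attempt_%s_%04d_%s_%06d_%d",
                timestamp, stageNum, isMap ? "m" : "r", partitionId, attemptNum);
    }

    public static void main(String[] args) {
        // Two attempts of the same partition share everything up to the
        // trailing attempt number, so the ids stay distinct per attempt
        // while the task portion stays stable.
        System.out.println(sparkAttemptId("200707121733", 3, true, 5, 0));
        System.out.println(sparkAttemptId("200707121733", 3, true, 5, 1));
    }
}
```

With only the partitionId and a hard-coded attempt suffix (the reported bug), both attempts would produce identical temp-file names such as {{_tmp.002148_0}}, which is what makes the rename fail.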
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637310#comment-15637310 ]

Hive QA commented on HIVE-15054:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837165/HIVE-15054.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1968/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1968/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1968/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837165 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633088#comment-15633088 ]

Hive QA commented on HIVE-15054:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12836843/HIVE-15054.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1945/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1945/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1945/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output:
+ date '+%Y-%m-%d %T.%3N'
2016-11-03 15:23:59.713
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1945/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-11-03 15:23:59.716
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 345353c HIVE-15039: A better job monitor console output for HoS (Rui reviewed by Xuefu and Ferdinand)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 345353c HIVE-15039: A better job monitor console output for HoS (Rui reviewed by Xuefu and Ferdinand)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-11-03 15:24:00.603
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.4
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.6) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRole
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MRoleMap
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MGlobalPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDBPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTablePrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionColumnPrivilege
ENHANCED (Persistable) :
{noformat}
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627560#comment-15627560 ]

Rui Li commented on HIVE-15054:
-------------------------------

Thanks for updating [~aihuaxu]. +1
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622519#comment-15622519 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

[~lirui] I think what you provided is better. I've just updated the comments in the patch.
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15621481#comment-15621481 ]

Rui Li commented on HIVE-15054:
-------------------------------

Thanks [~aihuaxu] for the investigation and update! The patch looks good, but I find the comments a little confusing. How about something like this:
{code}
// Hive requires this TaskAttemptId to be unique. MR's TaskAttemptId is composed of
// "attempt_timestamp_jobNum_m/r_taskNum_attemptNum". The counterpart for Spark should be
// "attempt_timestamp_stageNum_m/r_partitionId_attemptNum". When there are multiple attempts
// for a task, Hive will rely on the partitionId to figure out whether the data are duplicates
// (see org.apache.hadoop.hive.ql.exec.Utils.removeTempOrDuplicateFiles) when collecting
// the final outputs.
{code}
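The duplicate check referenced in the suggested comment can be sketched roughly as follows. This is a simplified stand-in to show the idea, under the assumption that output files are named {{taskId_attemptNum}}; the class and method names are illustrative, not Hive's actual removeTempOrDuplicateFiles implementation:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the duplicate-file check mentioned above; the
// class and method names here are illustrative, not Hive's implementation.
public class DedupSketch {
    // Among output files named <taskId>_<attemptNum>, keep one file per
    // taskId (the attempt with the highest attempt number).
    static Map<String, String> keepOnePerTask(List<String> fileNames) {
        Map<String, String> kept = new HashMap<>();
        for (String name : fileNames) {
            int sep = name.lastIndexOf('_');
            String taskId = name.substring(0, sep);
            int attempt = Integer.parseInt(name.substring(sep + 1));
            String prev = kept.get(taskId);
            if (prev == null
                    || attempt > Integer.parseInt(prev.substring(prev.lastIndexOf('_') + 1))) {
                kept.put(taskId, name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Task 000000 had a retried attempt; only one of its files survives.
        System.out.println(keepOnePerTask(
                Arrays.asList("000000_0", "000000_1", "000001_0")));
    }
}
```

The key point is that this pruning only works if the {{taskId}} portion of the file name is stable across attempts of the same task, which is exactly what partitionId provides on Spark.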
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15617224#comment-15617224 ]

Hive QA commented on HIVE-15054:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12835872/HIVE-15054.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10626 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=272)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1873/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1873/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1873/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12835872 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616426#comment-15616426 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

[~lirui] You are right that Hive needs to use the same id to figure out that it's the same task. Spark assigns a different taskId to each task attempt, so partitionId seems to be the closest choice.
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615310#comment-15615310 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

Yeah. It seems Hive is using the 000005 part somewhere to track whether attempts belong to the same task. Originally I thought that info was only used for the final file name. I will take another look at how it is used in Hive.
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614491#comment-15614491 ]

Rui Li commented on HIVE-15054:
-------------------------------

This is how MR generates the attempt id:
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/TaskAttemptID.html
We'll be very close to that format (e.g. {{attempt_200707121733_0003_m_000005_0}}) if we use partitionId_attemptNumber, except that we're using the stage id instead of the job id.
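The field layout Rui links to can be made concrete with plain string handling. This sketch is only an illustration of the format, not Hadoop's actual TaskAttemptID class:

```java
// Plain string handling to show the field layout; this is an illustration,
// not Hadoop's org.apache.hadoop.mapreduce.TaskAttemptID class.
public class AttemptIdFields {
    // attempt_<timestamp>_<jobNum>_<m|r>_<taskNum>_<attemptNum>
    static String[] fields(String attemptId) {
        return attemptId.split("_");
    }

    public static void main(String[] args) {
        String[] f = fields("attempt_200707121733_0003_m_000005_0");
        // For Hive on Spark, f[2] would carry the stage number instead of
        // the MR job number, and f[4] the partition id instead of the task
        // number, per the discussion above.
        System.out.println("timestamp=" + f[1] + " jobNum=" + f[2]
                + " type=" + f[3] + " taskNum=" + f[4] + " attemptNum=" + f[5]);
    }
}
```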
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614437#comment-15614437 ]

Rui Li commented on HIVE-15054:
-------------------------------

Hi [~aihuaxu], I don't know for sure either. My hunch is that since taskAttemptIds are all unique numbers, how would Hive know whether two attempts are for the same task or not?
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613083#comment-15613083 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

[~lirui] It's strange that using taskAttemptId_0 causes some tests, e.g. sample10.q, to produce different results, while partitionId_attemptNumber is fine. I can't figure out why. Do you have any insights? I can see that some of our tests could fail if the first attempt fails and generates a file like 000000_1, since some test outputs list the file names, but I guess MR would have the same problem as well.
[jira] [Commented] (HIVE-15054) Hive insertion query execution fails on Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610342#comment-15610342 ]

Aihua Xu commented on HIVE-15054:
---------------------------------

I'm still trying to understand the difference between them. It seems both should work, but I'm seeing some test problems.
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610276#comment-15610276 ] Rui Li commented on HIVE-15054: --- I could be wrong, but from what I remember, {{partitionId_attemptNumber}} is closer to MR's {{taskId_attemptNumber}}. If task 0 has two attempts, we'll have 0_0 and 0_1. If we use taskAttemptId, that will be two unique numbers, and I'm not sure Hive can tell that the two attempts are for the same task/partition.
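The two naming schemes Rui contrasts can be sketched as follows. This is an illustrative sketch only: the helper names and the zero-padded format are assumptions modeled on the 002148_0 file names in the stack trace, not Hive's actual code.

```java
public class TaskFileNames {
    // partitionId only: the attempt number is lost, so two attempts of the
    // same task derive the SAME file name and collide on rename.
    static String partitionOnly(int partitionId) {
        return String.format("%06d_0", partitionId);
    }

    // MR-style taskId_attemptNumber: unique per attempt, and the shared
    // prefix still groups the attempts by task.
    static String partitionAndAttempt(int partitionId, int attemptNumber) {
        return String.format("%06d_%d", partitionId, attemptNumber);
    }

    public static void main(String[] args) {
        // Two attempts of partition 2148 under each scheme.
        System.out.println(partitionOnly(2148).equals(partitionOnly(2148)));  // true: both map to 002148_0
        System.out.println(partitionAndAttempt(2148, 0));  // 002148_0
        System.out.println(partitionAndAttempt(2148, 1));  // 002148_1
    }
}
```

With the first scheme, whichever attempt finishes second fails to rename its output; with the second, the names stay distinct per attempt.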
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609895#comment-15609895 ] Hive QA commented on HIVE-15054:

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12835401/HIVE-15054.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10621 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[auto_sortmerge_join_16] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[root_dir_external_table] (batchId=157)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_16] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample10] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[temp_table_gb1] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] (batchId=116)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJarWithoutAddDriverClazz[0] (batchId=164)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJar[0] (batchId=164)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJar[1] (batchId=164)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1829/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1829/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1829/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12835401 - PreCommit-HIVE-Build
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609278#comment-15609278 ] Aihua Xu commented on HIVE-15054: - From the test failures, Hive in some cases actually expects the taskId_attemptNumber format, although attemptNumber is not used. I will change to {{taskAttemptId_0}}.
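A rough sketch of the proposed {{taskAttemptId_0}} naming; the helper and the example attempt ids below are hypothetical, not the actual patch. Since Spark's taskAttemptId is unique across attempts, two attempts of the same partition derive distinct names while keeping the taskId_attemptNumber shape Hive expects.

```java
public class AttemptIds {
    // Sketch of the proposed scheme (hypothetical helper): use Spark's
    // globally unique taskAttemptId, plus a fixed "_0" suffix so the name
    // still parses as taskId_attemptNumber.
    static String hiveTaskId(long sparkTaskAttemptId) {
        return String.format("%06d_0", sparkTaskAttemptId);
    }

    public static void main(String[] args) {
        // Two attempts of the SAME partition carry different Spark
        // taskAttemptIds (example values), so their output names differ.
        long attempt0 = 2148;  // hypothetical taskAttemptId of attempt 0
        long attempt1 = 5021;  // hypothetical taskAttemptId of the retry
        System.out.println(hiveTaskId(attempt0));  // 002148_0
        System.out.println(hiveTaskId(attempt1));  // 005021_0
    }
}
```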
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608952#comment-15608952 ] Aihua Xu commented on HIVE-15054: - The test result is https://builds.apache.org/job/PreCommit-HIVE-Build/1820, but somehow it's not getting posted here.
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608832#comment-15608832 ] Aihua Xu commented on HIVE-15054: - Hive doesn't use attemptNumber. I feel taskAttemptId is better since it matches what is expected by other engines like Tez and MR (which expect a taskId and attemptId here). That info helps in diagnostics. What do you think?
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608577#comment-15608577 ] Rui Li commented on HIVE-15054: --- Thanks for the explanations, Aihua. I'm not sure which format is better: {{partitionId_attemptNumber}}, or just {{taskAttemptId}}. Does Hive rely on the attemptNumber to identify the multiple outputs for the same task/partition?
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608509#comment-15608509 ] Hive QA commented on HIVE-15054:

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12835320/HIVE-15054.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10621 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[auto_sortmerge_join_16] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[root_dir_external_table] (batchId=157)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_16] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_join] (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[sample10] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[temp_table_gb1] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] (batchId=116)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJarWithoutAddDriverClazz[0] (batchId=164)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJar[0] (batchId=164)
org.apache.hive.beeline.TestBeelineArgParsing.testAddLocalJar[1] (batchId=164)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1820/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1820/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1820/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12835320 - PreCommit-HIVE-Build
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608359#comment-15608359 ] Aihua Xu commented on HIVE-15054: - [~lirui] Thanks for taking a look. It would be hard to repro; it depends on which state the first executor is in when it's aborted or dies. You will see this issue when the task has finished writing the data to a tmp file and renaming it to the final tmp file, but Spark kills the task, or the executor loses its connection, at that moment. The case I have seen is that the connection to the executor times out while the executor is almost done with its work (the result has finished writing and been renamed to the final tmp file; the only thing left is to report to the driver that the task is done). If the rename doesn't happen, you won't see this issue.
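The failure mode described above, where a second attempt computes the same target name and fails the rename, can be sketched minimally. The Set below stands in for the final directory's namespace (an assumption for illustration, not Hive code); the model reflects that an HDFS rename fails rather than overwriting when the destination already exists.

```java
import java.util.HashSet;
import java.util.Set;

public class RenameRace {
    // Models a rename into the final directory: succeeds only if the
    // target name is not already taken, mirroring HDFS rename semantics.
    static boolean renameInto(Set<String> finalDir, String targetName) {
        return finalDir.add(targetName);
    }

    public static void main(String[] args) {
        Set<String> finalDir = new HashSet<>();
        // Both attempts of the same task derive "002148_0" from the
        // partitionId, so whichever finishes second fails the rename,
        // which Hive surfaces as "Unable to rename output".
        boolean attempt0 = renameInto(finalDir, "002148_0");
        boolean attempt1 = renameInto(finalDir, "002148_0");
        System.out.println(attempt0 + " " + attempt1);  // true false
    }
}
```

This also matches Aihua's observation that the error only appears when the first attempt got far enough to complete its rename before dying.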
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607064#comment-15607064 ] Rui Li commented on HIVE-15054: --- Yeah, I tried something similar in HIVE-13066. I'll mark it as a dup and let's fix it here. [~aihuaxu], what I don't understand is why the error doesn't always happen. I wasn't able to reproduce HIVE-13066, and I guess not all tasks with multiple attempts will trigger the issue here. Could you do some more investigation to find out? Thanks.
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605709#comment-15605709 ] Xuefu Zhang commented on HIVE-15054: [~aihuaxu] thanks for looking at this. [~lirui]/[~chengxiang li], I think we attempted to fix this before. Can you review the patch?
[ https://issues.apache.org/jira/browse/HIVE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605705#comment-15605705 ] Aihua Xu commented on HIVE-15054: - + [~xuefuz] [~csun] [~jxiang] and [~szehon]