[jira] [Assigned] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
[ https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao reassigned SPARK-28106:
---

    Assignee: angerszhu

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
>
>                 Key: SPARK-28106
>                 URL: https://issues.apache.org/jira/browse/SPARK-28106
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: angerszhu
>            Assignee: angerszhu
>            Priority: Minor
>         Attachments: image-2019-06-19-21-23-22-061.png, image-2019-06-20-11-49-13-691.png, image-2019-06-20-11-50-36-418.png, image-2019-06-20-11-51-06-889.png
>
> In Spark SQL, when the add jar command is given a wrong HDFS path, such as "add jar hdfs:///home/hadoop/test/test.jar", the following happens on execution:
> * Hive case: HiveClientImpl handles the add jar call. When runSqlHive() is called it reports an error, but execution continues and SparkContext.addJar is still called. That method does not check the path when the path scheme is HDFS, so the jar is registered anyway. When later SQL statements run, each TaskDescription carries the jar paths registered in SparkContext, including the wrong one, and the tasks fail.
> * Non-Hive case: the same happens, because only local paths are checked and HDFS paths are not.
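The missing check described in the report can be illustrated with a small, self-contained Java sketch. It is not Spark's code: `CheckedJarRegistry`, `addJar`, and `registeredJars` are hypothetical names, and the existence check is injected as a `Predicate<URI>` so the example runs without Hadoop. Against a real cluster that predicate would presumably wrap a file-system lookup such as Hadoop's `FileSystem.exists`; that wiring is an assumption and is not shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical illustration only; this is not SparkContext. */
class CheckedJarRegistry {
    // Caller-supplied existence check (assumption: a real setup would ask the
    // file system named by the URI scheme, e.g. HDFS).
    private final Predicate<URI> exists;
    // Jar paths that would later be shipped with every task.
    private final Set<String> registered = new LinkedHashSet<>();

    CheckedJarRegistry(Predicate<URI> exists) {
        this.exists = exists;
    }

    /** Registers the jar only if the path parses and the existence check passes. */
    void addJar(String path) {
        final URI uri;
        try {
            uri = new URI(path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed jar path: " + path, e);
        }
        if (!exists.test(uri)) {
            // Fail at registration time instead of letting tasks fail later.
            throw new IllegalArgumentException("Jar not found: " + path);
        }
        registered.add(uri.toString());
    }

    Set<String> registeredJars() {
        return Collections.unmodifiableSet(registered);
    }

    public static void main(String[] args) {
        // Stand-in for the file system: only this one jar "exists".
        Set<String> present = Collections.singleton("hdfs:///libs/udf.jar");
        CheckedJarRegistry registry =
                new CheckedJarRegistry(uri -> present.contains(uri.toString()));

        registry.addJar("hdfs:///libs/udf.jar");
        try {
            registry.addJar("hdfs:///home/hadoop/test/test.jar");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("Registered jars: " + registry.registeredJars());
    }
}
```

Rejecting the path at registration time keeps it out of the set that would otherwise be carried by every later task, which is the failure mode the report describes.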
>
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource hdfs://home/hadoop/aaa.jar
>     at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
>     at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
>     at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
>     at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
>     at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
>     at org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
>     at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:233)
>     at
[jira] [Assigned] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
[ https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28106:

    Assignee: Apache Spark

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
>
>                 Key: SPARK-28106
>                 URL: https://issues.apache.org/jira/browse/SPARK-28106
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: angerszhu
>            Assignee: Apache Spark
>            Priority: Major
>
> In Spark SQL, when the add jar command is given a wrong HDFS path, such as "add jar hdfs:///home/hadoop/test/test.jar", the following happens on execution:
> * Hive case: HiveClientImpl handles the add jar call. When runSqlHive() is called it reports an error, but execution continues and SparkContext.addJar is still called. That method does not check the path when the path scheme is HDFS, so the jar is registered anyway. When later SQL statements run, each TaskDescription carries the jar paths registered in SparkContext, including the wrong one, and the tasks fail.
> * Non-Hive case: the same happens, because only local paths are checked and HDFS paths are not.
>
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource hdfs://home/hadoop/aaa.jar
>     at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
>     at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
>     at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
>     at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
>     at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
>     at org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
>     at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:233)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
>     at
[jira] [Assigned] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
[ https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28106:

    Assignee:     (was: Apache Spark)

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed
>
>                 Key: SPARK-28106
>                 URL: https://issues.apache.org/jira/browse/SPARK-28106
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> In Spark SQL, when the add jar command is given a wrong HDFS path, such as "add jar hdfs:///home/hadoop/test/test.jar", the following happens on execution:
> * Hive case: HiveClientImpl handles the add jar call. When runSqlHive() is called it reports an error, but execution continues and SparkContext.addJar is still called. That method does not check the path when the path scheme is HDFS, so the jar is registered anyway. When later SQL statements run, each TaskDescription carries the jar paths registered in SparkContext, including the wrong one, and the tasks fail.
> * Non-Hive case: the same happens, because only local paths are checked and HDFS paths are not.
>
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource hdfs://home/hadoop/aaa.jar
>     at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
>     at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
>     at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
>     at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
>     at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
>     at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
>     at org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
>     at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:195)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:233)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
>     at