[jira] [Commented] (SPARK-35635) concurrent insert statements from multiple beeline fail with job aborted exception

2023-07-08 Thread jeanlyn (Jira)


[ https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741309#comment-17741309 ]

jeanlyn commented on SPARK-35635:
-

When tasks run concurrently, deletion of the shared "_temporary" directory is
attempted multiple times, which can cause the job to fail. Would it be more
appropriate to reopen this issue? [~gurwls223]
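The failure mode described above can be illustrated outside Spark: two write jobs stage their output under the same <table>/_temporary directory, and whichever job commits first deletes the shared staging directory, stranding the other job. A minimal Python sketch of that interleaving (hypothetical directory layout and commit steps, not Spark's actual committer code):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout: both jobs stage files under the SAME
# <table>/_temporary directory, as a shared output committer would.
table = Path(tempfile.mkdtemp())

def stage(job_id: int) -> Path:
    """Simulate a task writing its output into the shared staging dir."""
    task_dir = table / "_temporary" / f"job_{job_id}"
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "part-00000").write_text(f"rows from job {job_id}")
    return task_dir

def commit(job_id: int, task_dir: Path) -> None:
    """Simulate job commit: publish the file, then clean up staging."""
    (task_dir / "part-00000").replace(table / f"part-{job_id}")
    shutil.rmtree(table / "_temporary")  # removes the OTHER job's files too

t1 = stage(1)
t2 = stage(2)       # both jobs are staged before either commits
commit(1, t1)       # job 1 publishes and deletes _temporary entirely
try:
    commit(2, t2)   # job 2's staged file is already gone
    job2_failed = False
except FileNotFoundError:
    job2_failed = True
```

Because the cleanup step removes the whole shared directory rather than only the committing job's own files, the second commit fails, which matches the "Job aborted" symptom reported here.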

> concurrent insert statements from multiple beeline fail with job aborted 
> exception
> --
>
> Key: SPARK-35635
> URL: https://issues.apache.org/jira/browse/SPARK-35635
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1
> Environment: Spark 3.1.1
> Reporter: Chetan Bhat
> Priority: Minor
>
> Create tables - 
> CREATE TABLE J1_TBL (
>  i integer,
>  j integer,
>  t string
> ) USING parquet;
> CREATE TABLE J2_TBL (
>  i integer,
>  k integer
> ) USING parquet;
> From 4 concurrent beeline sessions, execute the INSERT statements - 
> INSERT INTO J1_TBL VALUES (1, 4, 'one');
> INSERT INTO J1_TBL VALUES (2, 3, 'two');
> INSERT INTO J1_TBL VALUES (3, 2, 'three');
> INSERT INTO J1_TBL VALUES (4, 1, 'four');
> INSERT INTO J1_TBL VALUES (5, 0, 'five');
> INSERT INTO J1_TBL VALUES (6, 6, 'six');
> INSERT INTO J1_TBL VALUES (7, 7, 'seven');
> INSERT INTO J1_TBL VALUES (8, 8, 'eight');
> INSERT INTO J1_TBL VALUES (0, NULL, 'zero');
> INSERT INTO J1_TBL VALUES (NULL, NULL, 'null');
> INSERT INTO J1_TBL VALUES (NULL, 0, 'zero');
> INSERT INTO J2_TBL VALUES (1, -1);
> INSERT INTO J2_TBL VALUES (2, 2);
> INSERT INTO J2_TBL VALUES (3, -3);
> INSERT INTO J2_TBL VALUES (2, 4);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (0, NULL);
> INSERT INTO J2_TBL VALUES (NULL, NULL);
> INSERT INTO J2_TBL VALUES (NULL, 0);
>  
> Issue: concurrent INSERT statements from multiple beeline sessions fail with 
> a job aborted exception.
> 0: jdbc:hive2://10.19.89.222:23040/> INSERT INTO J1_TBL VALUES (8, 8, 
> 'eight');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:366)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3$$Lambda$1781/750578465.apply$mcV$sp(Unknown
>  Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:45)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:109)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:107)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:121)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>  at 


2023-07-08 Thread jeanlyn (Jira)


[ https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741296#comment-17741296 ]

jeanlyn commented on SPARK-35635:
-

We encounter the same issue when concurrently writing to different partitions 
of the same table.



2021-06-06 Thread Chetan Bhat (Jira)


[ https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358356#comment-17358356 ]

Chetan Bhat commented on SPARK-35635:
-

Yes, that is the issue. The system needs to handle this during concurrent 
query execution.
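Pending a fix in Spark itself, one client-side mitigation (my suggestion, not a Spark feature) is to serialize INSERTs that target the same table before submitting them, so that no two writes to one table are in flight at once. A hypothetical sketch, assuming all writers share one client process; `execute` stands in for whatever the caller uses to submit SQL (e.g. over JDBC):

```python
import threading
from collections import defaultdict
from typing import Callable

# Hypothetical client-side guard: one lock per target table so that
# only one INSERT into a given table is in flight at a time.
_table_locks: defaultdict = defaultdict(threading.Lock)

def run_insert(table: str, statement: str,
               execute: Callable[[str], None]) -> None:
    """Run execute(statement) while holding the lock for `table`."""
    with _table_locks[table]:
        execute(statement)
```

Inserts into different tables still run in parallel; only same-table writes are serialized. This helps only when every writer goes through the same client process, which is a real limitation for the multi-beeline scenario reported here.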
