Re: Concurrency issue in SQLExecution.withNewExecutionId
@Andrew_Or-2 I am using Scala futures.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Concurrency-issue-in-SQLExecution-withNewExecutionId-tp14035p14068.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Concurrency issue in SQLExecution.withNewExecutionId
Look at this code:

https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L42

and

https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L87

This exception is there to prevent "nested `withNewExecutionId`", but what if two concurrent commands happen to run on the same thread? Then the thread-local getLocalProperty will return an execution id, triggering that exception.

This is not hypothetical: one of our Spark jobs crashes randomly with the following stack trace (using Spark 1.5; it ran without problems in Spark 1.4.1):

java.lang.IllegalArgumentException: spark.sql.execution.id is already set
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)

Also imagine the following:

future { df1.count() }
future { df2.count() }

Could we double-check whether this is an issue?
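To make the failure mode concrete, here is a minimal, self-contained sketch of a guard of this kind. It is not Spark's code: `ExecutionIdGuard`, `GuardDemo`, `onNewThread` and `childOutcome` are hypothetical names, and the `inheritable` switch is an assumption used to show one way the id could become visible to a second command (a thread created while the id is set inheriting it). Nothing in this thread confirms that this is what happens in Spark 1.5.

```scala
import java.util.concurrent.atomic.{AtomicLong, AtomicReference}
import scala.util.Try

// Hypothetical model of the "already set" guard; names are illustrative only.
class ExecutionIdGuard(inheritable: Boolean) {
  // Assumption: the id lives in a thread-local that may or may not be
  // inherited by threads created while it is set.
  private val currentId: ThreadLocal[String] =
    if (inheritable) new InheritableThreadLocal[String] else new ThreadLocal[String]
  private val nextId = new AtomicLong(0)

  def withNewExecutionId[T](body: => T): T = {
    // Reject the call if this thread already sees an execution id.
    if (currentId.get() != null) {
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    }
    currentId.set(nextId.getAndIncrement().toString)
    try body finally currentId.remove()
  }
}

object GuardDemo {
  // Runs `body` on a freshly created thread and returns its outcome.
  def onNewThread[T](body: => T): Try[T] = {
    val result = new AtomicReference[Try[T]]()
    val t = new Thread(new Runnable {
      def run(): Unit = result.set(Try(body))
    })
    t.start()
    t.join()
    result.get()
  }

  // Starts a second command on a new thread while the first one is still
  // inside its own withNewExecutionId block.
  def childOutcome(guard: ExecutionIdGuard): Try[Int] =
    guard.withNewExecutionId {
      onNewThread(guard.withNewExecutionId(42))
    }

  def main(args: Array[String]): Unit = {
    println(childOutcome(new ExecutionIdGuard(inheritable = false)).isSuccess)
    println(childOutcome(new ExecutionIdGuard(inheritable = true)).isFailure)
  }
}
```

In this model the second command succeeds when the thread-local is not inherited and fails with the same message as the stack trace when it is. Running two commands one after the other on the same thread is fine either way, because the id is cleared in `finally`; the guard only fires if the id is still visible when the second command starts.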
Re: Concurrency issue in SQLExecution.withNewExecutionId
@Olivier, did you use Scala's parallel collections by any chance? If not, what form of concurrency were you using?

2015-09-10 13:01 GMT-07:00 Andrew Or:
> Thanks for reporting this, I have filed
> https://issues.apache.org/jira/browse/SPARK-10548.
Re: Concurrency issue in SQLExecution.withNewExecutionId
Thanks for reporting this, I have filed https://issues.apache.org/jira/browse/SPARK-10548.

2015-09-10 9:09 GMT-07:00 Olivier Toupin:
> Could we double check this if this an issue?