Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20888#discussion_r180691534
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -164,10 +164,13 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventually
     sparkContext.addSparkListener(listener)
     for (codegen <- Seq(true, false)) {
       withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> codegen.toString()) {
-        DataFrameRangeSuite.stageToKill = -1
+        DataFrameRangeSuite.stageToKill = DataFrameRangeSuite.INVALID_STAGE_ID
         val ex = intercept[SparkException] {
           spark.range(0, 100000000000L, 1, 1).map { x =>
-            DataFrameRangeSuite.stageToKill = TaskContext.get().stageId()
+            val taskContext = TaskContext.get()
+            if (!taskContext.isInterrupted()) {
--- End diff --
To answer your question: the whole point of this change is to block overwrites of `DataFrameRangeSuite.stageToKill` while the first iteration's thread is still running after `DataFrameRangeSuite.stageToKill = DataFrameRangeSuite.INVALID_STAGE_ID` has happened. Your suggestion would also work, but it would be less straightforward.
I agree that if the `DataFrameRangeSuite.stageToKill` object member is removed and we switch to `onTaskStart`, then the whole mumbo-jumbo is not required.
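For reference, a minimal sketch of that `onTaskStart` alternative. The listener body and the simplified action are illustrative only, not the exact test code:

```scala
import org.apache.spark.SparkException
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskStart}

// Cancel the stage as soon as one of its tasks starts. The shared
// DataFrameRangeSuite.stageToKill member (and the isInterrupted guard
// above) is no longer needed, because the stage id comes straight
// from the listener event instead of from inside the task.
val listener = new SparkListener {
  override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    sparkContext.cancelStage(taskStart.stageId)
  }
}
sparkContext.addSparkListener(listener)

val ex = intercept[SparkException] {
  spark.range(0, 100000000000L, 1, 1).collect()
}
```

In a shared-SparkContext suite the listener would also have to be removed afterwards via `sparkContext.removeSparkListener(listener)`, so that later tests don't get their stages cancelled too.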
---