Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22018#discussion_r208824418
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
---
@@ -297,7 +297,7 @@ object InMemoryFileIndex extends Logging {
val missingFiles = mutable.ArrayBuffer.empty[String]
val filteredLeafStatuses = allLeafStatuses.filterNot(
status => shouldFilterOut(status.getPath.getName))
- val resolvedLeafStatuses = filteredLeafStatuses.flatMap {
+ val resolvedLeafStatuses = filteredLeafStatuses.par.flatMap {
--- End diff ---
Scala parallel collections are not interruptible in some cases; as a
consequence, if you use them on executors, tasks cannot be canceled
properly. You can verify this yourself by running the code in a lambda
function:
https://github.com/apache/spark/blob/131ca146ed390cd0109cd6e8c95b61e418507080/core/src/test/scala/org/apache/spark/util/ThreadUtilsSuite.scala#L143-L150
When you cancel the job, the threads will still be blocked on the sleep call.
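The issue can be sketched in a small standalone program (this is an illustrative example, not code from the PR; it assumes Scala 2.12, where `.par` is in the standard library). Interrupting the thread that invoked the parallel operation does not interrupt the ForkJoinPool worker threads actually running the element tasks, so they keep sleeping and the work runs to completion anyway:

```scala
import java.util.concurrent.atomic.AtomicInteger

object ParInterruptDemo {
  private val completed = new AtomicInteger(0)
  private val numTasks = 4

  // Runs a parallel-collection operation on a dedicated thread, interrupts
  // that thread mid-flight, then reports how many element tasks still
  // finished on the pool threads afterwards.
  def run(): Int = {
    val caller = new Thread(new Runnable {
      override def run(): Unit =
        try {
          (1 to numTasks).par.foreach { _ =>
            Thread.sleep(200)            // stands in for blocking work (e.g. file listing)
            completed.incrementAndGet()
          }
        } catch {
          case _: InterruptedException => () // the caller itself may see the interrupt...
        }
    })
    caller.start()
    Thread.sleep(50)
    caller.interrupt()                    // ...but the pool worker threads are never interrupted
    caller.join()
    // Wait (bounded) for the pool threads to drain; they are not canceled.
    var waited = 0
    while (completed.get() < numTasks && waited < 5000) {
      Thread.sleep(50); waited += 50
    }
    completed.get()
  }
}
```

Despite the interrupt, `ParInterruptDemo.run()` typically returns the full task count, because the interrupt only reaches the calling thread, not the ForkJoinPool threads executing the closures. This is why canceling a Spark task cannot stop work that was fanned out through `.par` on an executor.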
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]