Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/21913
  
    > What problem does this solve?
    
    @srowen `readParquetFootersInParallel` is called in executors. When a task is cancelled, the parallel footer reading keeps running. If it reads lots of files, it can take a long time and occupy the task slot. This change basically makes it interruptible.
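    To illustrate why a thread-pool-based helper can be interrupted where `.par` cannot, here is a minimal sketch. It assumes a hypothetical `InterruptibleParMap.parMap` helper and is not Spark's actual `ThreadUtils.parmap` code. Work runs as futures on a dedicated pool, and the caller blocks in `Await.result`, which throws `InterruptedException` when the caller is interrupted. A `finally` block then calls `shutdownNow()`, so the worker threads are interrupted too instead of running on.

    ```scala
    import java.util.concurrent.{ExecutorService, Executors}

    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    // Hypothetical helper for illustration only; not Spark's ThreadUtils.parmap.
    object InterruptibleParMap {
      def parMap[I, O](in: Seq[I], maxThreads: Int)(f: I => O): Seq[O] = {
        // A dedicated pool per call, so shutdownNow() only affects this call's work.
        val pool: ExecutorService = Executors.newFixedThreadPool(maxThreads)
        try {
          implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
          val futures = in.map(x => Future(f(x)))
          // The caller blocks here; with Duration.Inf the wait is interruptible, so
          // interrupting the caller makes Await.result throw InterruptedException.
          Await.result(Future.sequence(futures), Duration.Inf)
        } finally {
          // Interrupts workers that are still running (e.g. blocked in Thread.sleep)
          // rather than leaving them to finish in the background.
          pool.shutdownNow()
        }
      }
    }
    ```

    For example, `InterruptibleParMap.parMap(1 to 10, 2)(i => i * i)` returns the squares of 1 to 10 in input order. Creating a fresh pool per call keeps `shutdownNow()` from touching unrelated work, at the cost of starting threads on every invocation.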
    
    @MaxGekk Maybe add the following test to `ThreadUtilsSuite`. It shows what this PR is fixing.
    
    ```scala
      test("parmap should be interruptible") {
        val t = new Thread() {
          setDaemon(true)
    
          override def run(): Unit = {
            try {
              // "par" is uninterruptible. The following will keep running even if the thread is
              // interrupted. We should prefer to use "ThreadUtils.parmap".
              //
              // (1 to 10).par.flatMap { i =>
              //   Thread.sleep(100000)
              //   1 to i
              // }
              //
              ThreadUtils.parmap(1 to 10, "test", 2) { i =>
                Thread.sleep(100000)
                1 to i
              }.flatten
            } catch {
              case _: InterruptedException => // expected
            }
          }
        }
        t.start()
        eventually(timeout(10.seconds)) {
          assert(t.isAlive)
        }
        t.interrupt()
        eventually(timeout(10.seconds)) {
          assert(!t.isAlive)
        }
      }
    ```


