Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/12243#discussion_r59025794
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ---
@@ -46,37 +50,80 @@ case class PartitionedFile(
*/
case class FilePartition(index: Int, files: Seq[PartitionedFile]) extends Partition
+object FileScanRDD {
+  private val ioExecutionContext = ExecutionContext.fromExecutorService(
+    ThreadUtils.newDaemonCachedThreadPool("FileScanRDD", 16))
--- End diff ---
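For illustration only, a minimal sketch of deriving that bound from configuration rather than hard-coding 16; the `spark.sql.files.io.threads` key below is hypothetical, not an existing Spark setting:

    import scala.concurrent.ExecutionContext

    import org.apache.spark.SparkEnv
    import org.apache.spark.util.ThreadUtils

    object FileScanRDD {
      // Hypothetical cap on IO threads; falls back to the current default of 16.
      private val ioThreads: Int =
        SparkEnv.get.conf.getInt("spark.sql.files.io.threads", 16)

      private val ioExecutionContext = ExecutionContext.fromExecutorService(
        ThreadUtils.newDaemonCachedThreadPool("FileScanRDD", ioThreads))
    }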
Shouldn't it be the total number of cores the user is willing to dedicate
to a single Job? This looks similar to an issue in ParquetRelation where
a `parallelize` call can end up tying up all of the cores (defaultParallelism)
on a single Job. While this PR should allow better progress to be made during
that kind of blocking, I think what we really need is to implement what was
suggested a while ago for the scheduling pools: a max-cores limit in addition
to the current min cores. With that in place, and with the max-cores value
exposed to these large IO operations, users who care about not blocking
concurrent Jobs could use pools that neither consume all of the available cores
nor oversubscribe the cores that the pool does have.
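Purely as a sketch of that pool idea, assuming a per-pool core budget were exposed: the "io" pool name, the maxCores parameter, and the listing placeholder below are all assumptions, since the fair scheduler today only supports minShare/weight, not a max-cores cap.

    import org.apache.spark.SparkContext

    object BoundedIoJob {
      // Run a large IO job in a dedicated fair-scheduler pool and cap its slice
      // count to a core budget instead of sc.defaultParallelism.
      def listInPool(sc: SparkContext, paths: Seq[String], maxCores: Int): Seq[String] = {
        sc.setLocalProperty("spark.scheduler.pool", "io")
        try {
          val slices = math.max(1, math.min(maxCores, paths.size))
          sc.parallelize(paths, numSlices = slices)
            .map(path => path)       // placeholder for the real per-path listing work
            .collect().toSeq
        } finally {
          sc.setLocalProperty("spark.scheduler.pool", null)   // restore the default pool
        }
      }
    }

With a real per-pool max-cores bound, the same value could feed both the slice count here and the IO thread pool size above.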