[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

BryanCutler Fri, 16 Jun 2017 15:12:36 -0700

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18281#discussion_r122543278
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -325,8 +350,13 @@ final class OneVsRest @Since("1.4.0") (
           multiclassLabeled.persist(StorageLevel.MEMORY_AND_DISK)
         }
     
    +    val iters = Range(0, numClasses).par
    +    iters.tasksupport = new ForkJoinTaskSupport(
    +      new ForkJoinPool(Math.min(getParallelism, numClasses))
    +    )
    +
         // create k columns, one for each binary classifier.
    -    val models = Range(0, numClasses).par.map { index =>
    +    val models = iters.map { index =>
    --- End diff --
    
    They don't necessarily add anything here, but they are a more standard
    way of doing parallelism in Spark than using `TaskSupport`, and they are
    more flexible for setting an ExecutorService.  I'm not sure whether you can
    set `TaskSupport` to `sameThreadExecutor`, or what really happens behind
    the scenes if you make a `ThreadPoolTaskSupport` with 1 thread.
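
For illustration only, here is a minimal sketch of what an ExecutorService-backed version of the loop in the diff might look like, using plain Scala `Future`s on a bounded pool. It is not the code from this PR: `ParallelFitSketch`, `trainOne`, and `trainAll` are hypothetical names, and `trainOne` is a stand-in for fitting one binary classifier.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelFitSketch {
  // Hypothetical stand-in for fitting one binary classifier; not Spark code.
  def trainOne(index: Int): String = s"model-$index"

  def trainAll(numClasses: Int, parallelism: Int): Seq[String] = {
    // Bounded pool, mirroring Math.min(getParallelism, numClasses) in the diff.
    // The max(1, ...) guard is an addition here, since a fixed pool needs >= 1 thread.
    val threads = math.max(1, math.min(parallelism, numClasses))
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One Future per class index; Future.sequence keeps the results in index order.
      val futures = (0 until numClasses).map(i => Future(trainOne(i)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

With `Executors.newFixedThreadPool(1)` the tasks run one at a time on a single worker thread, so the sequential case does not need a separate code path in this sketch.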



