GitHub user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19208
  
    I'm OK with including the "dump model to disk" feature 
(https://github.com/apache/spark/pull/18313) in this or another PR (or not at all).
    
    After reading the discussion, I feel it's overkill to support a feature 
like this in two ways (keeping the models in memory and dumping them to disk). 
Allowing users to register a custom action after each `est.fit(trainingDataset, epm)` 
looks like a more general solution to me: in that action, users could dump models to 
disk, collect them for later use, or evaluate them with other metrics. 
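    A rough sketch of the kind of hook I have in mind (the `fitWithCallback` helper below is 
hypothetical, not an existing Spark API; it just wraps the per-ParamMap `fit` overload and 
hands each sub-model to a user-supplied callback):

        import org.apache.spark.ml.{Estimator, Model}
        import org.apache.spark.ml.param.ParamMap
        import org.apache.spark.sql.Dataset

        // Hypothetical helper: fit one model per ParamMap and pass each one to a
        // user-supplied callback, which can save it to disk, collect it, or
        // evaluate it with a different metric.
        def fitWithCallback[M <: Model[M]](
            est: Estimator[M],
            trainingDataset: Dataset[_],
            epm: Array[ParamMap])(callback: (M, ParamMap) => Unit): Unit = {
          epm.foreach { paramMap =>
            callback(est.fit(trainingDataset, paramMap), paramMap)
          }
        }

        // Example: persist every sub-model instead of holding them all in memory
        // (assuming the fitted model type is MLWritable).
        // fitWithCallback(lr, training, paramGrid) { (model, pm) =>
        //   model.write.overwrite().save(s"/tmp/submodels/${pm.hashCode}")
        // }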
    
    If you want to stick with this approach, which I'm not a fan of, I would only 
suggest adding logic to estimate how much memory the models will take and stopping 
the application if an OOM is foreseeable. 
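    For the memory estimate, something along these lines could work (a rough sketch only; 
`checkSubModelMemory` and the 50% head-room factor are my own illustration, built on 
Spark's `SizeEstimator`):

        import org.apache.spark.util.SizeEstimator

        // Hypothetical guard: estimate the driver memory needed to keep all
        // sub-models from the size of one fitted model, and fail fast rather
        // than let the application run into a foreseeable OOM.
        def checkSubModelMemory(sampleModel: AnyRef, numModels: Int): Unit = {
          val perModelBytes = SizeEstimator.estimate(sampleModel)
          val neededBytes = perModelBytes * numModels
          val runtime = Runtime.getRuntime
          val freeBytes = runtime.maxMemory - (runtime.totalMemory - runtime.freeMemory)
          require(neededBytes < freeBytes * 0.5,  // arbitrary 50% head-room factor
            s"Keeping $numModels sub-models (~$neededBytes bytes) is likely to exhaust driver memory.")
        }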
    