Peter Rudenko created SPARK-5804:
------------------------------------

             Summary: Explicitly manage cache in Crossvalidation k-fold loop
                 Key: SPARK-5804
                 URL: https://issues.apache.org/jira/browse/SPARK-5804
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 1.3.0
            Reporter: Peter Rudenko
            Priority: Minor

On a large dataset, explicitly unpersisting the training and validation folds once a loop iteration is done with them frees memory, so more data can be loaded into memory in the next iteration. In my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.
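
A minimal sketch of the idea, not the actual CrossValidator code: it works on plain RDDs, splits them with MLUtils.kFold, and releases both folds before the next iteration starts. The object name FoldCacheSketch, the method crossValidate, and the callback fitAndScore are hypothetical names introduced only for illustration.

```scala
import scala.reflect.ClassTag

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

object FoldCacheSketch {

  /** Averages a metric over k folds, caching each fold only while it is in use.
    *
    * `fitAndScore` is a hypothetical callback: it trains on the first RDD and
    * returns an evaluation metric computed on the second.
    */
  def crossValidate[T: ClassTag](
      data: RDD[T],
      numFolds: Int,
      seed: Int)(fitAndScore: (RDD[T], RDD[T]) => Double): Double = {
    // kFold returns one (training, validation) pair per fold.
    val splits = MLUtils.kFold(data, numFolds, seed)

    val scores = splits.map { case (training, validation) =>
      // Cache the folds for this iteration only.
      training.cache()
      validation.cache()
      try {
        fitAndScore(training, validation)
      } finally {
        // Release both folds explicitly so the next iteration can use the memory.
        training.unpersist()
        validation.unpersist()
      }
    }

    scores.sum / numFolds
  }
}
```

The try/finally is a choice made for this sketch so the folds are also released when training or evaluation throws; without the explicit unpersist calls, cached folds from earlier iterations stay in memory until Spark evicts them or cleans them up.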