[ https://issues.apache.org/jira/browse/SPARK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-19422.
----------------------------------
    Resolution: Not A Problem

> Cache input data in algorithms
> ------------------------------
>
>                 Key: SPARK-19422
>                 URL: https://issues.apache.org/jira/browse/SPARK-19422
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> Currently, some algorithms cache the input dataset themselves if it is not
> already cached (its storage level is {{StorageLevel.NONE}}):
> {{FeedForwardTrainer}}, {{LogisticRegression}}, {{OneVsRest}}, {{KMeans}},
> {{AFTSurvivalRegression}}, {{IsotonicRegression}}, {{LinearRegression}} with
> non-WLS solver
> It may be reasonable to cache the input for others as well:
> {{DecisionTreeClassifier}}, {{GBTClassifier}}, {{RandomForestClassifier}},
> {{LinearSVC}}
> {{BisectingKMeans}}, {{GaussianMixture}}, {{LDA}}
> {{DecisionTreeRegressor}}, {{GBTRegressor}}, {{GeneralizedLinearRegression}}
> with IRLS solver, {{RandomForestRegressor}}
> {{NaiveBayes}} is not included since it only makes one pass over the data.
> {{MultilayerPerceptronClassifier}} is not included since the data is cached
> in {{FeedForwardTrainer.train}}
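
The "cache the input only if the caller has not" behaviour described in the issue can be sketched roughly as follows. This is a plain-Python model of the idea, not Spark's actual code: `FakeDataset`, `cached_if_needed`, `fit`, and the string storage levels are all hypothetical stand-ins for a Spark `Dataset` and `StorageLevel`.

```python
from contextlib import contextmanager

NONE = "NONE"                        # stand-in for StorageLevel.NONE
MEMORY_AND_DISK = "MEMORY_AND_DISK"  # stand-in for a real storage level


class FakeDataset:
    """Hypothetical stand-in for a Spark Dataset; tracks only its storage level."""

    def __init__(self, storage_level=NONE):
        self.storage_level = storage_level

    def persist(self, level):
        self.storage_level = level
        return self

    def unpersist(self):
        self.storage_level = NONE
        return self


@contextmanager
def cached_if_needed(dataset, level=MEMORY_AND_DISK):
    """Cache `dataset` for the duration of the block only if it is uncached."""
    handle_persistence = dataset.storage_level == NONE
    if handle_persistence:
        dataset.persist(level)
    try:
        yield dataset
    finally:
        # Release only what was cached here; leave the caller's cache alone.
        if handle_persistence:
            dataset.unpersist()


def fit(dataset, passes=3):
    """Toy multi-pass 'training' loop; records the storage level seen on each pass."""
    with cached_if_needed(dataset) as data:
        return [data.storage_level for _ in range(passes)]
```

The point of the check is that an iterative algorithm rereads its input on every pass, so an uncached input would be recomputed each time, while an input the caller already cached should be left untouched both during and after training. A single-pass algorithm such as {{NaiveBayes}} gains nothing from this.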