[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances

2018-04-10 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-23841:
-

Assignee: zhengruifeng

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> 
>
> Key: SPARK-23841
> URL: https://issues.apache.org/jira/browse/SPARK-23841
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.4.0
> Reporter: zhengruifeng
> Assignee: zhengruifeng
> Priority: Minor
>
> NodeIdCache forgets to unpersist the last cached intermediate dataset:
>  
> {code:java}
> scala> import org.apache.spark.ml.classification._
> import org.apache.spark.ml.classification._
>
> scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_libsvm_data.txt")
> 2018-04-02 11:48:25 WARN  LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going though the input. If you know the number in advance, please specify it via 'numFeatures' option to avoid the extra scan.
> 2018-04-02 11:48:31 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
>
> scala> val rf = new RandomForestClassifier().setCacheNodeIds(true)
> rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_aab2b672546b
>
> scala> val rfm = rf.fit(df)
> rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_aab2b672546b) with 20 trees
>
> scala> sc.getPersistentRDDs
> res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(56 -> MapPartitionsRDD[56] at map at NodeIdCache.scala:102)
> {code}
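For context on the pattern behind this report: with cacheNodeIds enabled, training repeatedly caches an updated nodeIdsForInstances dataset and unpersists the previous one, so after the last iteration one dataset is still persisted, which is what sc.getPersistentRDDs shows in the session above. Below is a minimal Spark-free sketch of that rotation and of the obvious remedy, unpersisting the final dataset during cleanup. ToyNodeIdCache, CachedIds, CacheTracker, and cleanup() are hypothetical stand-ins for illustration, not Spark's actual NodeIdCache API.

```scala
// Hypothetical, Spark-free sketch of the caching rotation described above.
import scala.collection.mutable

// Stand-in for sc.getPersistentRDDs: the names of currently cached datasets.
object CacheTracker {
  val persisted: mutable.Set[String] = mutable.Set.empty
}

// A toy dataset with persist/unpersist bookkeeping.
final class CachedIds(val name: String) {
  def persist(): CachedIds = { CacheTracker.persisted += name; this }
  def unpersist(): Unit = CacheTracker.persisted -= name
}

final class ToyNodeIdCache {
  // Initial node-id dataset, cached up front.
  private var nodeIdsForInstances: CachedIds = new CachedIds("ids-0").persist()

  // Each iteration caches the updated dataset and releases only the
  // previous one -- the rotation that leaves the last dataset behind.
  def updateNodeIndices(iter: Int): Unit = {
    val updated = new CachedIds(s"ids-$iter").persist()
    nodeIdsForInstances.unpersist()
    nodeIdsForInstances = updated
  }

  // The remedy: release the final cached dataset when training finishes.
  def cleanup(): Unit = nodeIdsForInstances.unpersist()
}

val cache = new ToyNodeIdCache
(1 to 3).foreach(cache.updateNodeIndices)

// Without cleanup, the last dataset is still cached (the leak reported here).
val stillCachedBeforeCleanup = CacheTracker.persisted.toSet

cache.cleanup()
```

After the loop, stillCachedBeforeCleanup holds exactly one entry, mirroring the single MapPartitionsRDD left behind in the shell session; once cleanup() runs, the tracker is empty.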



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances

2018-04-01 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23841:


Assignee: Apache Spark


[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances

2018-04-01 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23841:


Assignee: (was: Apache Spark)
