[jira] [Created] (SPARK-30548) Cached blockInfo in BlockMatrix.scala is never released

2020-01-17 Thread Dong Wang (Jira)
Dong Wang created SPARK-30548: - Summary: Cached blockInfo in BlockMatrix.scala is never released Key: SPARK-30548 URL: https://issues.apache.org/jira/browse/SPARK-30548 Project: Spark Issue

[jira] [Commented] (SPARK-29878) Improper cache strategies in GraphX

2020-01-07 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010317#comment-17010317 ] Dong Wang commented on SPARK-29878: --- So are these unnecessary caches tolerable? These cached data is

[jira] [Updated] (SPARK-30444) The same job will be computated for many times when using Dataset.show()

2020-01-06 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-30444: -- Description: When I run the example sql.SparkSQLExample, df.show() at line 60 would trigger an

[jira] [Created] (SPARK-30444) The same job will be computated for many times when using Dataset.show()

2020-01-06 Thread Dong Wang (Jira)
Dong Wang created SPARK-30444: - Summary: The same job will be computated for many times when using Dataset.show() Key: SPARK-30444 URL: https://issues.apache.org/jira/browse/SPARK-30444 Project: Spark

[jira] [Created] (SPARK-29878) Improper cache strategies in GraphX

2019-11-13 Thread Dong Wang (Jira)
Dong Wang created SPARK-29878: - Summary: Improper cache strategies in GraphX Key: SPARK-29878 URL: https://issues.apache.org/jira/browse/SPARK-29878 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-29872) Improper cache strategy in examples

2019-11-12 Thread Dong Wang (Jira)
Dong Wang created SPARK-29872: - Summary: Improper cache strategy in examples Key: SPARK-29872 URL: https://issues.apache.org/jira/browse/SPARK-29872 Project: Spark Issue Type: Improvement

[jira] [Comment Edited] (SPARK-29856) Conditional unnecessary persist on RDDs in ML algorithms

2019-11-12 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972294#comment-16972294 ] Dong Wang edited comment on SPARK-29856 at 11/12/19 11:32 AM: -- But there is

[jira] [Commented] (SPARK-29856) Conditional unnecessary persist on RDDs in ML algorithms

2019-11-12 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972294#comment-16972294 ] Dong Wang commented on SPARK-29856: --- But there is only one action _collectAsMap()_ using

[jira] [Updated] (SPARK-29815) Missing persist in ml.tuning.CrossValidator.fit()

2019-11-11 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29815: -- Description: dataset.toDF.rdd in ml.tuning.CrossValidator.fit(dataset: Dataset[_]) will generate two

[jira] [Updated] (SPARK-29816) Missing persist in mllib.evaluation.BinaryClassificationMetrics.recallByThreshold()

2019-11-11 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29816: -- Description: The rdd scoreAndLabels.combineByKey is used by two actions: sortByKey and count(), so

[jira] [Created] (SPARK-29856) Conditional unnecessary persist on RDDs in ML algorithms

2019-11-11 Thread Dong Wang (Jira)
Dong Wang created SPARK-29856: - Summary: Conditional unnecessary persist on RDDs in ML algorithms Key: SPARK-29856 URL: https://issues.apache.org/jira/browse/SPARK-29856 Project: Spark Issue

[jira] [Updated] (SPARK-29844) Improper unpersist strategy in ml.recommendation.ASL.train

2019-11-11 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29844: -- Affects Version/s: (was: 3.0.0) 2.4.3 > Improper unpersist strategy in

[jira] [Updated] (SPARK-29844) Improper unpersist strategy in ml.recommendation.ASL.train

2019-11-11 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29844: -- Summary: Improper unpersist strategy in ml.recommendation.ASL.train (was: Wrong unpersist strategy

[jira] [Created] (SPARK-29844) Wrong unpersist strategy in ml.recommendation.ASL.train

2019-11-11 Thread Dong Wang (Jira)
Dong Wang created SPARK-29844: - Summary: Wrong unpersist strategy in ml.recommendation.ASL.train Key: SPARK-29844 URL: https://issues.apache.org/jira/browse/SPARK-29844 Project: Spark Issue

[jira] [Created] (SPARK-29832) Unnecessary persist on instances in ml.regression.IsotonicRegression.fit

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29832: - Summary: Unnecessary persist on instances in ml.regression.IsotonicRegression.fit Key: SPARK-29832 URL: https://issues.apache.org/jira/browse/SPARK-29832 Project: Spark

[jira] [Updated] (SPARK-28781) Unneccesary persist in PeriodicCheckpointer.update()

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-28781: -- Description: Once the function _update()_ is called, the RDD _newData_ is persisted at line 82.

[jira] [Updated] (SPARK-29823) Improper persist strategy in mllib.clustering.KMeans.run()

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29823: -- Description: In mllib.clustering.KMeans.run(), the rdd {color:#de350b}_norms_{color} is persisted.

[jira] [Updated] (SPARK-29827) Wrong persist strategy in mllib.clustering.BisectingKMeans.run

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29827: -- Description: There are three persist misuses in mllib.clustering.BisectingKMeans.run. * First, the

[jira] [Updated] (SPARK-29827) Wrong persist strategy in mllib.clustering.BisectingKMeans.run

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29827: -- Description: There are three persist misuses in mllib.clustering.BisectingKMeans.run. First, the rdd

[jira] [Updated] (SPARK-29828) Missing persist on ratings in ml.recommendation.ALS.train

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29828: -- Description: There is a ratings.isEmpty() at the beginning of ml.recommendation.ALS.train().

[jira] [Updated] (SPARK-29828) Missing persist on ratings in ml.recommendation.ALS.train

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29828: -- Summary: Missing persist on ratings in ml.recommendation.ALS.train (was: Missing persist in

[jira] [Updated] (SPARK-29828) Missing persist in ml.recommendation.ALS.train

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29828: -- Description: Two missing persist issues in ml.recommendation.ALS.train(). 1. There is a

[jira] [Updated] (SPARK-29828) Missing persist in ml.recommendation.ALS.train

2019-11-10 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29828: -- Summary: Missing persist in ml.recommendation.ALS.train (was: Missing persist on ratings on ratings

[jira] [Created] (SPARK-29828) Missing persist on ratings on ratings in ml.recommendation.ALS.train

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29828: - Summary: Missing persist on ratings on ratings in ml.recommendation.ALS.train Key: SPARK-29828 URL: https://issues.apache.org/jira/browse/SPARK-29828 Project: Spark

[jira] [Created] (SPARK-29827) Wrong persist strategy in mllib.clustering.BisectingKMeans.run

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29827: - Summary: Wrong persist strategy in mllib.clustering.BisectingKMeans.run Key: SPARK-29827 URL: https://issues.apache.org/jira/browse/SPARK-29827 Project: Spark

[jira] [Created] (SPARK-29826) Missing persist on data in mllib.feature.ChiSqSelector.fit

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29826: - Summary: Missing persist on data in mllib.feature.ChiSqSelector.fit Key: SPARK-29826 URL: https://issues.apache.org/jira/browse/SPARK-29826 Project: Spark Issue

[jira] [Created] (SPARK-29824) Missing persist on trainDataset in ml.classification.GBTClassifier.train()

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29824: - Summary: Missing persist on trainDataset in ml.classification.GBTClassifier.train() Key: SPARK-29824 URL: https://issues.apache.org/jira/browse/SPARK-29824 Project: Spark

[jira] [Created] (SPARK-29823) Wrong persist strategy in mllib.clustering.KMeans.run()

2019-11-10 Thread Dong Wang (Jira)
Dong Wang created SPARK-29823: - Summary: Wrong persist strategy in mllib.clustering.KMeans.run() Key: SPARK-29823 URL: https://issues.apache.org/jira/browse/SPARK-29823 Project: Spark Issue

[jira] [Created] (SPARK-29817) Missing persist on docs in mllib.clustering.LDAOptimizer.initialize

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29817: - Summary: Missing persist on docs in mllib.clustering.LDAOptimizer.initialize Key: SPARK-29817 URL: https://issues.apache.org/jira/browse/SPARK-29817 Project: Spark

[jira] [Created] (SPARK-29816) Missing persist in mllib.evaluation.BinaryClassificationMetrics.recallByThreshold()

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29816: - Summary: Missing persist in mllib.evaluation.BinaryClassificationMetrics.recallByThreshold() Key: SPARK-29816 URL: https://issues.apache.org/jira/browse/SPARK-29816

[jira] [Created] (SPARK-29815) Missing persist in ml.tuning.CrossValidator.fit()

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29815: - Summary: Missing persist in ml.tuning.CrossValidator.fit() Key: SPARK-29815 URL: https://issues.apache.org/jira/browse/SPARK-29815 Project: Spark Issue Type:

[jira] [Created] (SPARK-29814) Missing persist on sources in mllib.feature.PCA

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29814: - Summary: Missing persist on sources in mllib.feature.PCA Key: SPARK-29814 URL: https://issues.apache.org/jira/browse/SPARK-29814 Project: Spark Issue Type:

[jira] [Created] (SPARK-29813) Missing persist in mllib.PrefixSpan.findFrequentItems()

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29813: - Summary: Missing persist in mllib.PrefixSpan.findFrequentItems() Key: SPARK-29813 URL: https://issues.apache.org/jira/browse/SPARK-29813 Project: Spark Issue

[jira] [Created] (SPARK-29812) Missing persist on predictionAndLabels in MulticlassClassificationEvaluator

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29812: - Summary: Missing persist on predictionAndLabels in MulticlassClassificationEvaluator Key: SPARK-29812 URL: https://issues.apache.org/jira/browse/SPARK-29812 Project: Spark

[jira] [Updated] (SPARK-29811) Missing persist on oldDataset in ml.RandomForestRegressor.train()

2019-11-09 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-29811: -- Description: The rdd oldDataset in ml.regression.RandomForestRegressor.train() needs to be

[jira] [Created] (SPARK-29811) Missing persist on oldDataset in ml.RandomForestRegressor.train()

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29811: - Summary: Missing persist on oldDataset in ml.RandomForestRegressor.train() Key: SPARK-29811 URL: https://issues.apache.org/jira/browse/SPARK-29811 Project: Spark

[jira] [Created] (SPARK-29810) Missing persist on retaggedInput in RandomForest.run()

2019-11-09 Thread Dong Wang (Jira)
Dong Wang created SPARK-29810: - Summary: Missing persist on retaggedInput in RandomForest.run() Key: SPARK-29810 URL: https://issues.apache.org/jira/browse/SPARK-29810 Project: Spark Issue Type:

[jira] [Created] (SPARK-29809) Missing persist in Word2Vec.fit()

2019-11-08 Thread Dong Wang (Jira)
Dong Wang created SPARK-29809: - Summary: Missing persist in Word2Vec.fit() Key: SPARK-29809 URL: https://issues.apache.org/jira/browse/SPARK-29809 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-28781) Unneccesary persist in PeriodicCheckpointer.update()

2019-08-20 Thread Dong Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-28781: -- Environment: (was: Once update(newData) is called, newData is persisted. However, only when the

[jira] [Created] (SPARK-28781) Unneccesary persist in PeriodicCheckpointer.update()

2019-08-20 Thread Dong Wang (Jira)
Dong Wang created SPARK-28781: - Summary: Unneccesary persist in PeriodicCheckpointer.update() Key: SPARK-28781 URL: https://issues.apache.org/jira/browse/SPARK-28781 Project: Spark Issue Type:

[jira] [Commented] (SPARK-6964) Support Cancellation in the Thrift Server

2015-11-20 Thread Dong Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018729#comment-15018729 ] Dong Wang commented on SPARK-6964: -- This PR improves the thrift server to accept cancel command, it does

[jira] [Closed] (SPARK-1682) Add gradient descent w/o sampling and RDA L1 updater

2014-11-10 Thread Dong Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang closed SPARK-1682. Resolution: Later revisit later Add gradient descent w/o sampling and RDA L1 updater

[jira] [Updated] (SPARK-1682) Add gradient descent w/o sampling and RDA L1 updater

2014-05-04 Thread Dong Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Wang updated SPARK-1682: - Description: The GradientDescent optimizer does sampling before a gradient step. When input data is

[jira] [Created] (SPARK-1682) LogisticRegressionWithSGD should support svmlight data and gradient descent w/o sampling

2014-04-30 Thread Dong Wang (JIRA)
Dong Wang created SPARK-1682: Summary: LogisticRegressionWithSGD should support svmlight data and gradient descent w/o sampling Key: SPARK-1682 URL: https://issues.apache.org/jira/browse/SPARK-1682