Dong Wang created SPARK-30548:
-
Summary: Cached blockInfo in BlockMatrix.scala is never released
Key: SPARK-30548
URL: https://issues.apache.org/jira/browse/SPARK-30548
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-29878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010317#comment-17010317
]
Dong Wang commented on SPARK-29878:
---
So are these unnecessary caches tolerable?
This cached data is
[
https://issues.apache.org/jira/browse/SPARK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-30444:
--
Description:
When I run the example sql.SparkSQLExample, df.show() at line 60 would trigger
an
[
https://issues.apache.org/jira/browse/SPARK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-30444:
--
Description:
When I run the example sql.SparkSQLExample, df.show() at line 60 would trigger
an
Dong Wang created SPARK-30444:
-
Summary: The same job will be computed many times when using
Dataset.show()
Key: SPARK-30444
URL: https://issues.apache.org/jira/browse/SPARK-30444
Project: Spark
Dong Wang created SPARK-29878:
-
Summary: Improper cache strategies in GraphX
Key: SPARK-29878
URL: https://issues.apache.org/jira/browse/SPARK-29878
Project: Spark
Issue Type: Improvement
Dong Wang created SPARK-29872:
-
Summary: Improper cache strategy in examples
Key: SPARK-29872
URL: https://issues.apache.org/jira/browse/SPARK-29872
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-29856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972294#comment-16972294
]
Dong Wang edited comment on SPARK-29856 at 11/12/19 11:32 AM:
--
But there is
[
https://issues.apache.org/jira/browse/SPARK-29856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972294#comment-16972294
]
Dong Wang commented on SPARK-29856:
---
But there is only one action _collectAsMap()_ using
[
https://issues.apache.org/jira/browse/SPARK-29815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29815:
--
Description:
dataset.toDF.rdd in ml.tuning.CrossValidator.fit(dataset: Dataset[_]) will
generate two
[
https://issues.apache.org/jira/browse/SPARK-29816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29816:
--
Description:
The rdd scoreAndLabels.combineByKey is used by two actions: sortByKey and
count(), so
Dong Wang created SPARK-29856:
-
Summary: Conditional unnecessary persist on RDDs in ML algorithms
Key: SPARK-29856
URL: https://issues.apache.org/jira/browse/SPARK-29856
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-29844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29844:
--
Affects Version/s: (was: 3.0.0)
2.4.3
> Improper unpersist strategy in
[
https://issues.apache.org/jira/browse/SPARK-29844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29844:
--
Summary: Improper unpersist strategy in ml.recommendation.ALS.train (was:
Wrong unpersist strategy
Dong Wang created SPARK-29844:
-
Summary: Wrong unpersist strategy in ml.recommendation.ALS.train
Key: SPARK-29844
URL: https://issues.apache.org/jira/browse/SPARK-29844
Project: Spark
Issue
Dong Wang created SPARK-29832:
-
Summary: Unnecessary persist on instances in
ml.regression.IsotonicRegression.fit
Key: SPARK-29832
URL: https://issues.apache.org/jira/browse/SPARK-29832
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-28781:
--
Description:
Once the function _update()_ is called, the RDD _newData_ is persisted at line
82.
[
https://issues.apache.org/jira/browse/SPARK-29823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29823:
--
Description:
In mllib.clustering.KMeans.run(), the rdd _norms_ is
persisted.
[
https://issues.apache.org/jira/browse/SPARK-29823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29823:
--
Description:
In mllib.clustering.KMeans.run(), the rdd _norms_ is
persisted.
[
https://issues.apache.org/jira/browse/SPARK-29823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29823:
--
Description:
In mllib.clustering.KMeans.run(), the rdd _norms_ is
persisted.
[
https://issues.apache.org/jira/browse/SPARK-29823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29823:
--
Description:
In mllib.clustering.KMeans.run(), the rdd _norms_ is
persisted.
[
https://issues.apache.org/jira/browse/SPARK-29827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29827:
--
Description:
There are three persist misuses in mllib.clustering.BisectingKMeans.run.
* First, the
[
https://issues.apache.org/jira/browse/SPARK-29827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29827:
--
Description:
There are three persist misuses in mllib.clustering.BisectingKMeans.run.
First, the rdd
[
https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29828:
--
Description:
There is a ratings.isEmpty() at the beginning of
the ml.recommendation.ALS.train().
[
https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29828:
--
Summary: Missing persist on ratings in ml.recommendation.ALS.train (was:
Missing persist in
[
https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29828:
--
Description:
Two missing persist issues in ml.recommendation.ALS.train().
1. There is a
[
https://issues.apache.org/jira/browse/SPARK-29828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29828:
--
Summary: Missing persist in ml.recommendation.ALS.train (was: Missing
persist on ratings on ratings
Dong Wang created SPARK-29828:
-
Summary: Missing persist on ratings on ratings in
ml.recommendation.ALS.train
Key: SPARK-29828
URL: https://issues.apache.org/jira/browse/SPARK-29828
Project: Spark
Dong Wang created SPARK-29827:
-
Summary: Wrong persist strategy in
mllib.clustering.BisectingKMeans.run
Key: SPARK-29827
URL: https://issues.apache.org/jira/browse/SPARK-29827
Project: Spark
Dong Wang created SPARK-29826:
-
Summary: Missing persist on data in mllib.feature.ChiSqSelector.fit
Key: SPARK-29826
URL: https://issues.apache.org/jira/browse/SPARK-29826
Project: Spark
Issue
Dong Wang created SPARK-29824:
-
Summary: Missing persist on trainDataset in
ml.classification.GBTClassifier.train()
Key: SPARK-29824
URL: https://issues.apache.org/jira/browse/SPARK-29824
Project: Spark
Dong Wang created SPARK-29823:
-
Summary: Wrong persist strategy in mllib.clustering.KMeans.run()
Key: SPARK-29823
URL: https://issues.apache.org/jira/browse/SPARK-29823
Project: Spark
Issue
Dong Wang created SPARK-29817:
-
Summary: Missing persist on docs in
mllib.clustering.LDAOptimizer.initialize
Key: SPARK-29817
URL: https://issues.apache.org/jira/browse/SPARK-29817
Project: Spark
Dong Wang created SPARK-29816:
-
Summary: Missing persist in
mllib.evaluation.BinaryClassificationMetrics.recallByThreshold()
Key: SPARK-29816
URL: https://issues.apache.org/jira/browse/SPARK-29816
Dong Wang created SPARK-29815:
-
Summary: Missing persist in ml.tuning.CrossValidator.fit()
Key: SPARK-29815
URL: https://issues.apache.org/jira/browse/SPARK-29815
Project: Spark
Issue Type:
Dong Wang created SPARK-29814:
-
Summary: Missing persist on sources in mllib.feature.PCA
Key: SPARK-29814
URL: https://issues.apache.org/jira/browse/SPARK-29814
Project: Spark
Issue Type:
Dong Wang created SPARK-29813:
-
Summary: Missing persist in mllib.PrefixSpan.findFrequentItems()
Key: SPARK-29813
URL: https://issues.apache.org/jira/browse/SPARK-29813
Project: Spark
Issue
Dong Wang created SPARK-29812:
-
Summary: Missing persist on predictionAndLabels in
MulticlassClassificationEvaluator
Key: SPARK-29812
URL: https://issues.apache.org/jira/browse/SPARK-29812
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-29811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29811:
--
Description:
The rdd oldDataset in ml.regression.RandomForestRegressor.train() needs to be
[
https://issues.apache.org/jira/browse/SPARK-29811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-29811:
--
Description:
The rdd oldDataset in ml.regression.RandomForestRegressor.train() needs to be
Dong Wang created SPARK-29811:
-
Summary: Missing persist on oldDataset in
ml.RandomForestRegressor.train()
Key: SPARK-29811
URL: https://issues.apache.org/jira/browse/SPARK-29811
Project: Spark
Dong Wang created SPARK-29810:
-
Summary: Missing persist on retaggedInput in RandomForest.run()
Key: SPARK-29810
URL: https://issues.apache.org/jira/browse/SPARK-29810
Project: Spark
Issue Type:
Dong Wang created SPARK-29809:
-
Summary: Missing persist in Word2Vec.fit()
Key: SPARK-29809
URL: https://issues.apache.org/jira/browse/SPARK-29809
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-28781:
--
Environment: (was: Once update(newData) is called, newData is
persisted. However, only when the
Dong Wang created SPARK-28781:
-
Summary: Unnecessary persist in PeriodicCheckpointer.update()
Key: SPARK-28781
URL: https://issues.apache.org/jira/browse/SPARK-28781
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018729#comment-15018729
]
Dong Wang commented on SPARK-6964:
--
This PR improves the thrift server to accept the cancel command; it does
[
https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang closed SPARK-1682.
Resolution: Later
revisit later
Add gradient descent w/o sampling and RDA L1 updater
[
https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong Wang updated SPARK-1682:
-
Description:
The GradientDescent optimizer does sampling before a gradient step. When input
data is
Dong Wang created SPARK-1682:
Summary: LogisticRegressionWithSGD should support svmlight data
and gradient descent w/o sampling
Key: SPARK-1682
URL: https://issues.apache.org/jira/browse/SPARK-1682
49 matches