Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10691012
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10691105
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileInputFormat.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10691283
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10692570
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/util/BatchFileRecordReader.java ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/166#issuecomment-37930965
In fact, if we set `numInnerIteration = 1`, which is the default
setting, then `GradientDescentWithLocalUpdate` is identical to
`GradientDescent`. However, I
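The equivalence claimed above can be sketched in plain Python (a toy, not the PR's Scala implementation; the function and parameter names are illustrative): each communication round runs several local gradient steps, and with one inner iteration per round the local-update variant degenerates to ordinary gradient descent.

```python
def gradient_descent(w, grad, lr, num_rounds, num_inner_iterations=1):
    """Toy local-update gradient descent: each 'round' runs several local
    steps before (conceptually) synchronizing. With num_inner_iterations=1
    this is plain gradient descent."""
    for _ in range(num_rounds):
        for _ in range(num_inner_iterations):
            w = w - lr * grad(w)
    return w

# Minimizing f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
grad = lambda w: 2.0 * (w - 3.0)

plain = gradient_descent(0.0, grad, lr=0.1, num_rounds=10)
local = gradient_descent(0.0, grad, lr=0.1, num_rounds=10, num_inner_iterations=1)
# With one inner iteration the two runs perform exactly the same updates.
```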
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/166#discussion_r10734823
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescentWithLocalUpdate.scala
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38128624
@mengxr Your advice makes sense. I removed the merge process from
`smallTextFiles()` and rewrote the reading logic in `RecordReader`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38129529
Ah... It seems that Jenkins is causing a problem. The tests for the last two
commits failed due to this error:
Fetching upstream changes from https://github.com/apache
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10786231
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/input/BatchFilesRecordReader.java ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10786751
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/SmallTextFilesSuite.scala ---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/166#discussion_r10787371
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescentWithLocalUpdate.scala
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38245425
@mengxr There are 2 Java files in my PR, and another 2 Scala files:
MLUtils.scala and the test suite. I just found the Scala code style on the
[style
page](https
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10829857
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/WholeTextFileSuite.scala ---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/166#issuecomment-38296681
I use the new method to enlarge the local update. Tests on SVM and
LogisticRegression look as good as the first version, without the worry of
OOM. This method can get better
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/166#issuecomment-38345315
I have tested the original/1-version/2-version LR and SVM; here is the result:
(Note that the original version runs 100 iterations, while the other two run 10
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10871469
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileInputFormat.java
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38519654
@mengxr I talked to @liancheng about the placement of the WholeTextFiles
interface; we had no idea whether it was a commonly used interface or not
at that time, so
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/164#discussion_r10966132
--- Diff:
mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileInputFormat.java
---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38761534
Sure, let me update it.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38765347
Oh... Is that OK? That's strange...
Github user yinxusen closed the pull request at:
https://github.com/apache/spark/pull/164
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/252
[SPARK-1133] Add whole text files reader in MLlib
Here is a pointer to the former
[PR164](https://github.com/apache/spark/pull/164).
I added the pull request for the JIRA issue
[SPARK-1133
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-38773117
It seems that the test process was suddenly aborted. Can we retest it?
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/245#discussion_r11013983
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/RidgeRegression.scala ---
@@ -67,44 +70,50 @@ class RidgeRegressionWithSGD private
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-38871767
I think it is OK, @rxin shall we merge it? :)
On 2014-3-27 4:40 PM, UCB AMPLab notificati...@github.com wrote:
All automated tests passed.
Refer
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/268
[WIP] [SPARK-1328] Add vector statistics
With the new vector system in MLlib, we find it useful to add some
new APIs to process the `RDD[Vector]`. Besides, the former implementation
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39047305
Hi @mateiz , here is my explanation:
* Hadoop has no such input format, but Mahout does. It is called
`org.apache.mahout.text.SequenceFilesFromDirectory
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39110100
@mengxr I am not very sure about the concept of a sparse vector. In your
example, do you mean the column is `Vector(1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0)`
or
`RDD
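For readers with the same question: a sparse vector stores only the non-zero entries as (index, value) pairs. A minimal illustration in plain Python (helper names are hypothetical, not MLlib's actual `SparseVector` class), using the vector from the comment above:

```python
def to_sparse(dense):
    """Represent a dense vector as (size, indices, values), keeping non-zeros."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values

def to_dense(size, indices, values):
    """Expand a sparse representation back into a dense list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

size, idx, vals = to_sparse([1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0])
# idx == [0, 2, 4], vals == [1.0, 2.0, 3.0]
```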
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39122031
@mengxr I added a `sc.hadoopConfiguration.setLong("fs.local.block.size", 32)`
in the test code, which limits the block size to 32 bytes, while the `fileLengths
= Array(10, 100
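The effect of such a tiny block size can be checked with simple arithmetic (a sketch, not the actual test suite; the file lengths beyond the truncated snippet are hypothetical): with a 32-byte block size, a file of n bytes occupies ceil(n / 32) blocks.

```python
import math

def num_blocks(file_length, block_size=32):
    """Number of HDFS-style blocks a file of the given length occupies."""
    return max(1, math.ceil(file_length / block_size))

# Hypothetical file lengths, in the spirit of the truncated test snippet.
lengths = [10, 100, 1000]
blocks = [num_blocks(n) for n in lengths]
# → [1, 4, 32]
```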
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39174557
@mengxr Ah... I totally understand what you mean. Code is on the way.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39208378
Hi @mateiz @mengxr , what do you think about the test? Besides, we could
also judge it from the hadoop-common code of
[`CombineFileInputFormat`](https://github.com
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11161910
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/rdd/VectorRDDFunctionsSuite.scala
---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/286#issuecomment-39281046
Yep, I find that each time I run `sbt clean gen-idea`, `sbt update`, or
even `sbt testOnly xxx`, I can cook a meal, take a shower, and have a rest.
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11190187
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11191344
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39286892
Sorry for the slip just now; I almost deleted the wrong file.
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/252#discussion_r11193752
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -372,6 +373,37 @@ class SparkContext(
}
/**
+ * Read
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39294300
How about textFiles()? @liancheng recommended it just now.
On 2014-4-2 2:40 PM, Patrick Wendell notificati...@github.com wrote:
sc.textFileRecords
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11196822
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/rdd/VectorRDDFunctionsSuite.scala
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11197597
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11210688
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/rdd/VectorRDDFunctionsSuite.scala
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39396075
Yep. I vote for `wholeTextFiles` too. Let me fix these now.
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11238923
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39527499
@mengxr Yes, I think `RowRDDMatrix` is a good place. Just put this
method together with SVD and PCA. Indeed, `RDD[Vector]` is a kind of matrix.
What should I
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/252#issuecomment-39623475
Thanks @mateiz and @mengxr !
I'll take care of the new issue.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39822920
@mengxr Yep, I have replaced the population variance with the sample
variance. See line 97 in VectorRDDStatistics.
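The substitution mentioned above amounts to changing the denominator from n to n − 1; a quick check in plain Python (illustrative helpers, not the MLlib code):

```python
def population_variance(xs):
    """Variance with denominator n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Unbiased variance with denominator n - 1 (Bessel's correction)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

xs = [1.0, 2.0, 3.0, 4.0]
# mean 2.5; squared deviations sum to 5.0
# population variance: 5.0 / 4 = 1.25
# sample variance:     5.0 / 3 ≈ 1.6667
```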
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11382104
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39941843
Sure, I'll do it now.
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11427699
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-39954694
Well, the `git rebase` is very tricky... @mengxr You can have a look.
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11431836
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
---
@@ -28,6 +28,171 @@ import org.apache.spark.rdd.RDD
import
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/376
[SPARK-1415] Hadoop min split for wholeTextFiles()
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-1415).
New Hadoop API of `InputFormat` does not provide the `minSplits
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40043078
@mateiz , I had to modify some APIs in order to add `minSplits`. I am not
sure whether the modification is good or not. Could you have a look at it?
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/376#discussion_r11469988
--- Diff:
core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala ---
@@ -44,4 +47,15 @@ private[spark] class WholeTextFileInputFormat
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/268#discussion_r11470053
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
---
@@ -19,13 +19,144 @@ package
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/376#discussion_r11470152
--- Diff:
core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala ---
@@ -44,4 +47,15 @@ private[spark] class WholeTextFileInputFormat
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/376#discussion_r11470220
--- Diff:
core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala ---
@@ -44,4 +47,15 @@ private[spark] class WholeTextFileInputFormat
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40044469
How about adding a subclass called `WholeTextFileRDD` that extends
`NewHadoopRDD`, and using `setMaxSplitSize` only for this subclass?
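The `setMaxSplitSize` idea can be illustrated with a small packing sketch in plain Python (a hypothetical helper, not the Spark or Hadoop code): `CombineFileInputFormat`-style input formats group many small files into splits no larger than the configured maximum.

```python
def pack_into_splits(file_sizes, max_split_size):
    """Greedy first-fit packing of small files into combined splits,
    mimicking what a CombineFileInputFormat does under setMaxSplitSize.
    A file larger than the limit still gets its own split."""
    splits, current, used = [], [], 0
    for size in file_sizes:
        if current and used + size > max_split_size:
            splits.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        splits.append(current)
    return splits

# Ten 10-byte files with a 32-byte max split size pack into four splits.
splits = pack_into_splits([10] * 10, max_split_size=32)
# → [[10, 10, 10], [10, 10, 10], [10, 10, 10], [10]]
```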
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-40051673
@mateiz I have fixed the issues. You can merge it if it looks good to you.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/268#issuecomment-40158443
Conflict in MLUtils and RowMatrix. I think it is OK now.
On 2014-4-11 5:31 AM, Matei Zaharia notificati...@github.com wrote:
Hey, unfortunately
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40273076
@mateiz I have to admit that I overlooked the importance of providing
`minSplits`. I encountered a problem just now: I have 20,000 files and call
`wholeTextFiles(dir
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40299867
@mateiz Yep, I agree with you. The test failure was caused by
`org.apache.spark.streaming.CheckpointSuite`. Is it an occasional error?
Maybe I should rebase
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/376#issuecomment-40307008
Well... I got these two weird errors. Build timed out.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/166#issuecomment-40668271
I rewrote the 2 versions of `GradientDescent` with `Vector` instead of
`Array`. Lasso is easy to test now, thanks to @mengxr 's refactoring of the code.
I run
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/444#issuecomment-40788214
Yep, I think the Python shell's documentation should be updated at the same
time. `sys.version_info` only became a named tuple in 2.7; to get this to work
in 2.6, it needs to be accessed
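The point above can be shown directly: on Python 2.6 only tuple-index access works, while from 2.7 onward `sys.version_info` is a named tuple and both forms agree.

```python
import sys

# Index access works on every version; attribute access requires 2.7+,
# where sys.version_info became a named tuple.
major_by_index = sys.version_info[0]
major_by_name = sys.version_info.major  # AttributeError on Python 2.6

assert major_by_index == major_by_name
```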
Github user yinxusen closed the pull request at:
https://github.com/apache/spark/pull/166
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/166#issuecomment-40922790
I'd like to close the PR, per the offline discussion with @mengxr . The
code will stay in my GitHub repo for those who are still interested in it.
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/463
fix bugs of dot in python
If there is no `transpose()` on `self.theta`, a
*ValueError: matrices are not aligned*
occurs. The former test case just ignored this situation
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/422#issuecomment-40984061
Several comments:
Code here (Scala code):
http://54.82.240.23:4000/mllib-linear-methods.html#linear-support-vector-machine-svm
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/422#issuecomment-40985126
Appending 2 unsolved problems:
Code here (python code): http://54.82.240.23:4000/mllib-clustering.html
`clusters = KMeans.train(parsedData, 2, maxIterations
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833024
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833060
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833194
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala ---
@@ -87,6 +85,49 @@ class LassoWithSGD private
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833231
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala ---
@@ -189,3 +230,70 @@ object LassoWithSGD {
sc.stop
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833249
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala ---
@@ -189,3 +230,70 @@ object LassoWithSGD {
sc.stop
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833279
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala ---
@@ -189,3 +230,70 @@ object LassoWithSGD {
sc.stop
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r11833298
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/regression/LassoSuite.scala ---
@@ -44,8 +44,11 @@ class LassoSuite extends FunSuite
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/458#issuecomment-41000185
Cool, could you share your data-generator code with me, and let me take care
of the `NaN` problem? Besides, could you provide the total running times of SGD
and ADMM when
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/476
JIRA issue: [SPARK-1405](https://issues.apache.org/jira/browse/SPARK-1405)
Gibbs sampling based Latent Dirichlet Allocation (LDA) for MLlib
(This PR is based on a joint work done with @liancheng
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/481#issuecomment-41012462
@dbtsai , @mengxr is improving the MLlib documentation for Spark 1.0, so the
documents will be ready soon. See here:
https://github.com/apache/spark/pull/422 .
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/458#issuecomment-41148756
@coderxiang I did some experiments on your dataset.
* For MLlib, you should first rewrite your labels {+1, -1} into {+1, 0}.
[Reference
here](http://54.82.240.23:4000
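The label rewrite suggested above is a one-liner; a plain-Python sketch (not the MLlib loader; the helper name is hypothetical):

```python
def relabel(labels):
    """Map {+1, -1} labels to the {1, 0} convention that a logistic-loss
    trainer expects (assumption: -1 becomes 0)."""
    return [1.0 if y > 0 else 0.0 for y in labels]

relabeled = relabel([1.0, -1.0, -1.0, 1.0])
# → [1.0, 0.0, 0.0, 1.0]
```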
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/458#issuecomment-41262172
I preprocessed your data to zero mean and unit norm, but Lasso still
performs poorly, with Infinity results or rising losses.
Since Lasso
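A minimal version of the preprocessing described above, applied to one feature column (plain Python with hypothetical helper names; "unit norm" is taken as L2 norm here):

```python
import math

def zero_mean(xs):
    """Shift a feature column so its mean is zero."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def unit_norm(xs):
    """Scale a column so its L2 norm is one (no-op on an all-zero column)."""
    n = math.sqrt(sum(x * x for x in xs))
    return [x / n for x in xs] if n > 0 else xs

col = unit_norm(zero_mean([2.0, 4.0, 6.0, 8.0]))
# Afterwards the column has mean 0 and L2 norm 1.
```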
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023334
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023346
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023345
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023354
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023367
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023374
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023381
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023406
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/458#discussion_r12023410
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/ADMMLasso.scala ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/476#discussion_r12127051
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/476#discussion_r12127214
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/476#discussion_r12132841
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/expectation/GibbsSampling.scala ---
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/476#issuecomment-41772255
Yep, thanks @jegonzal and @etrain , I'll try to fix these issues and look
forward to the next round of updates and discussion.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/5049#issuecomment-82011218
@mengxr Don't we need an extra unit test? Is the doctest good enough?
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5049#discussion_r26542049
--- Diff: python/pyspark/mllib/common.py ---
@@ -70,8 +70,8 @@ def _py2java(sc, obj):
obj = _to_java_object_rdd(obj)
elif isinstance
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/4951
[SPARK-5986][MLLib] Add save/load for k-means
This PR adds save/load for k-means as described in SPARK-5986. The Python
version will be added in another PR.
You can merge this pull request into a Git
Github user yinxusen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4951#discussion_r26083656
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala ---
@@ -58,4 +66,59 @@ class KMeansModel (val clusterCenters: Array
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/5181
[SPARK-6526][ML] Add Normalizer transformer in ML package
See [SPARK-6526](https://issues.apache.org/jira/browse/SPARK-6526).
@mengxr Should we add a test suite for this transformer