[GitHub] spark pull request: [SPARK-1212, Part II] Support sparse data in M...

2014-04-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/245#issuecomment-39383137 Thanks @mateiz ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-1133] Add whole text files reader in ML...

2014-04-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/252#issuecomment-39393472 `textFiles` is definitely confusing because `textFile` can also read multiple files. I vote for `wholeTextFiles`.

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235724 --- Diff: .gitignore --- @@ -47,3 +47,4 @@ spark-*-bin.tar.gz unit-tests.log /lib/ rat-results.txt +sbt/sbt-launch-0.13.1.jar.part

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235747 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235843 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235878 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235926 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235949 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235956 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11235979 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236003 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236019 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236033 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236050 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236109 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236148 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236156 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236175 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11236213 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-02 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11239093 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11271141 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11271166 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-03 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11271293 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/rdd/VectorRDDFunctionsSuite.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-39501883 @yinxusen Thanks for updating the implementation! One minor question: should we return sample variance instead of population variance? There is no big difference if we
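
The two variance estimators raised in this comment differ only in the divisor: population variance divides the sum of squared deviations by n, sample variance by n - 1. The following is a minimal sketch in plain Python, not the code under review in the pull request; the function names and the input values are made up for illustration.

```python
# Illustrative only: hypothetical helpers, not part of MLlib or the pull request.

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


values = [1.0, 2.0, 3.0, 4.0]  # made-up data
pop_var = population_variance(values)
samp_var = sample_variance(values)
```

The ratio of the two results is n / (n - 1), so the choice matters most for small inputs.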

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-03 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-39502081 And please take a look at https://github.com/apache/spark/pull/296 . I'm not sure where we should put this method. To make life simpler for Java users, we better p

[GitHub] spark pull request: [SPARK-1212, Part II] Support sparse data in M...

2014-04-04 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/245#issuecomment-39613996 @srowen I don't think jblas's DoubleMatrix is exposed in public APIs. But if there are, yes, we should clean them before v1.0. We will mark some APIs

[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39647746 The error message is > Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure in TID 0 on host localhost: java.lang.ClassCastExcept

[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39647753 Jenkins, retest this please.

[GitHub] spark pull request: [WIP] SPARK-1430: Support sparse data in Pytho...

2014-04-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/341#issuecomment-39695377 @mateiz This is great! For your question about `LabeledPoint`. Label is not part of the features. So I don't quite understand > It may get annoying once

[GitHub] spark pull request: [WIP] SPARK-1430: Support sparse data in Pytho...

2014-04-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/341#issuecomment-39697448 Oh, I see your point now. Yes, it is annoying to deal with indices in that way.

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/345 [SPARK-1434] [MLLIB] change labelParser from anonymous function to trait This is a patch to address @mateiz 's comment in https://github.com/apache/spark/pull/245 MLUtils#loadLibSVMData

[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/304#issuecomment-39757828 @sryza Could you put `sc.stop()` at the end of your test or use LocalSparkContext for your test suite? I believe that caused the problem.

[GitHub] spark pull request: [SPARK-1212, Part II] Support sparse data in M...

2014-04-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/245#issuecomment-39758360 @srowen Thanks for taking a closer look! For graphx interfaces, let's ask @rxin and @jegonzal to see whether they want to hide DoubleMatrix from public inter

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39769188 @manishamde Could you take a look at the annotations for decision tree? I marked all classes that users do not need as package private and a few interfaces developer

[GitHub] spark pull request: [SPARK-1357] [MLLIB] Annotate developer and ex...

2014-04-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39769614 @manishamde Could you take a look at the annotations for decision tree? I marked all classes that users do not need as package private and a few interfaces developer

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/345#discussion_r11366002 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LabelParsers.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/345#discussion_r11366016 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LabelParsers.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/345#discussion_r11366012 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LabelParsers.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-07 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39784920 @manishamde Sorry! I sent my comment to the wrong PR. You already found the right one ~ :)

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11368809 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/rdd/RowRDDMatrix.scala --- @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11369501 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/rdd/RowRDDMatrix.scala --- @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11369683 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11369703 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/rdd/RowRDDMatrix.scala --- @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-07 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11369711 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/rdd/RowRDDMatrix.scala --- @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1434] [MLLIB] change labelParser from a...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/345#issuecomment-39818345 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39818904 @mateiz , I added `computeSVD` and `computeGramianMatrix` to `IndexedRowMatrix`. Covariance and PCA computation are affected by empty rows. Maybe we can assume

[GitHub] spark pull request: [SPARK-1357] [MLLIB] Annotate developer and ex...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39820266 @manishamde I overlooked `usage` and `loadLabeledData` because I forgot to switch to object doc. Thanks for catching them!

[GitHub] spark pull request: SPARK-1157: L-BFGS Optimizer based on Breeze's...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-39821081 @dbtsai Did you compare L-BFGS with MLlib's implementation of GD on some real data sets?

[GitHub] spark pull request: SPARK-1157: L-BFGS Optimizer based on Breeze's...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11379843 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1428: MLlib should convert non-float64 N...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/356#issuecomment-39821991 @techaddict Is it easy to add a test to verify that it works?

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-39822204 @yinxusen Yes, let's wait until #296 gets merged. Thanks for being patient!

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-39822477 Btw, can we use sample variance instead of population variance?

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39872433 Yes, maybe a user didn't make any ratings. It is easy to implement covariance by counting the number of active rows, but my concern is whether this is confusing to

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39881873 Detecting empty rows needs a join, which is quite expensive. Also, adding empty rows will hurt performance if there are really many empty rows. I believe in most cases, if
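
The concern in the two comments above is that a statistic changes depending on whether rows with no stored entries are counted. The sketch below shows this for the variance of a single column (one diagonal entry of a covariance matrix). It is plain Python under stated assumptions: the dictionary layout, the helper name `column_variance`, the use of the n - 1 divisor, and the data are all hypothetical and not taken from the pull request.

```python
# Illustrative only: a hypothetical helper, not the RowMatrix or IndexedRowMatrix code.

def column_variance(stored_rows, num_rows):
    """Sample variance of one column; rows absent from `stored_rows` count as zeros.

    `stored_rows` maps a row index to the value in this column.
    `num_rows` is the number of rows the caller decides to count.
    """
    total = sum(stored_rows.values())
    total_sq = sum(v * v for v in stored_rows.values())
    mean = total / num_rows
    return (total_sq - num_rows * mean * mean) / (num_rows - 1)


ratings = {0: 4.0, 2: 2.0}  # made-up data: rows 1 and 3 have no entries

var_active = column_variance(ratings, len(ratings))  # count only the active rows
var_all = column_variance(ratings, 4)                # count the two empty rows as zeros
```

With these numbers the two conventions give clearly different answers, which is why the choice has to be made explicit to users.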

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11409303 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11416244 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11416286 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11417214 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnySVD.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11417278 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11417260 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TallSkinnyPCA.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11417359 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/296#discussion_r11418407 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala --- @@ -0,0 +1,120 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39920979 The failed test is from Bagel. I'll re-run Jenkins.

[GitHub] spark pull request: [SPARK-1390] Refactoring of matrices backed by...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/296#issuecomment-39920987 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1225, 1241] [MLLIB] Add AreaUnderCurve ...

2014-04-08 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/364 [SPARK-1225, 1241] [MLLIB] Add AreaUnderCurve and BinaryClassificationEvaluator This PR implements a generic version of `AreaUnderCurve` using the `RDD.sliding` implementation from https
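
One way an area-under-curve computation can be built on a sliding window is to visit consecutive pairs of curve points and sum the trapezoid between each pair. The sketch below is plain Python and only illustrates that idea. It assumes a window size of 2 and the trapezoidal rule, neither of which is stated in the visible part of the message, and the local `sliding` generator is a stand-in rather than the `RDD.sliding` implementation the pull request refers to.

```python
# Illustrative only: local stand-ins, not the MLlib AreaUnderCurve implementation.

def sliding(items, window):
    """Yield every run of `window` consecutive items."""
    items = list(items)
    for i in range(len(items) - window + 1):
        yield items[i:i + window]


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in sliding(points, 2):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


curve = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]  # made-up curve points
auc = area_under_curve(curve)
```

Because each trapezoid depends only on two neighbouring points, the per-window terms can be computed independently and then summed.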

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11420753 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -32,6 +34,17 @@ import org.apache.spark.mllib.util.MLUtils

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11420771 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -106,4 +119,58 @@ class MLUtilsSuite extends FunSuite with

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11420777 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -18,6 +18,8 @@ package org.apache.spark.mllib.util import

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-39924028 I'm closing this PR since it is now part of the AreaUnderCurve PR. I moved sliding to mllib and marked it private. The only usage now is in AreaUnderCurve with window s

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-04-08 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/136

[GitHub] spark pull request: SPARK-1093: Annotate developer and experimenta...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/274#issuecomment-39929453 @pwendell Shall we use the package name `spark.annotation` instead of `spark.annotations`? We have `util` and `rdd` without `s`.

[GitHub] spark pull request: SPARK-1093: Annotate developer and experimenta...

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/274#issuecomment-39929978 That is the artifact name. Java uses `annotation`: http://docs.oracle.com/javase/7/docs/api/java/lang/annotation/Documented.html

[GitHub] spark pull request: [SPARK-1357] [MLLIB] Annotate developer and ex...

2014-04-09 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39937050 I need to wait until your PR gets merged to use the annotation, but maybe I can try adding annotations blindly.

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-09 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-39941590 @yinxusen #296 was merged. Could you move the method `computeSummaryStatistics` to `RowMatrix`? The return type should be renamed to either `MultivariateStatisticalSummary

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/268#discussion_r11427570 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/VectorRDDFunctions.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1357] [MLLIB] Annotate developer and ex...

2014-04-09 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/298#issuecomment-39943767 Jenkins, retest this please.

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-09 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-40004876 @yinxusen I sent a PR to your repo with updated interface names and tests. Please merge it if it looks good to you. I moved `MultivariateStatisticalSummary` to `mllib.stat

[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

2014-04-09 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/268#issuecomment-40005107 Btw, I used equal in tests because the results should be exact with the numbers there.

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11456751 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -48,6 +48,19 @@ class RandomSamplerSuite extends FunSuite with

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11456771 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -22,8 +22,18 @@ import breeze.linalg.{Vector => BV, DenseVector =>

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11456781 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -22,8 +22,18 @@ import breeze.linalg.{Vector => BV, DenseVector =>

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457042 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -172,6 +182,20 @@ object MLUtils { } /** + * Return a k

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457050 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -172,6 +182,20 @@ object MLUtils { } /** + * Return a k

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457456 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -172,6 +182,20 @@ object MLUtils { } /** + * Return a k

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -172,6 +182,20 @@ object MLUtils { } /** + * Return a k

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457496 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -106,4 +109,40 @@ class MLUtilsSuite extends FunSuite with

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457537 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -106,4 +109,40 @@ class MLUtilsSuite extends FunSuite with

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457634 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -106,4 +109,40 @@ class MLUtilsSuite extends FunSuite with

[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r11457753 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -106,4 +109,40 @@ class MLUtilsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11457830 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11457867 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458037 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11457976 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458103 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458125 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458182 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458258 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458457 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458695 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11458730 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation
