[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 Thanks @jkbradley , I will send a follow-up PR for your comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document mino...

2016-12-29 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/16434 [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change ## What changes were proposed in this pull request? This is a follow-up pr for #15212 to address @jkbradley comments on Document change

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-12-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 hi @jkbradley @yanboliang , I have created a follow up PR for this PR. https://github.com/apache/spark/pull/16434 I have not added FDR test in ML Suite. The main reason is the current data set is

[GitHub] spark pull request #16452: [ML] fix getThresholds logic error

2017-01-02 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/16452 [ML] fix getThresholds logic error ## What changes were proposed in this pull request? The logic of getThresholds in ML LogisticRegression is not right, and it doesn't match wit

[GitHub] spark issue #16452: [ML] fix getThresholds logic error

2017-01-02 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 If both threshold and thresholds are not set, the master will return thresholds.

[GitHub] spark issue #16452: [ML] fix getThresholds logic error

2017-01-02 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16452 @sethah , thanks, I got it wrong. I will close it.

[GitHub] spark pull request #16452: [ML] fix getThresholds logic error

2017-01-02 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/16452

[GitHub] spark issue #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor chang...

2017-01-05 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16434 Hi @jkbradley , I have updated this PR per your comments. Thanks.

[GitHub] spark issue #16434: [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor chang...

2017-01-05 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/16434 Thanks @jkbradley @srowen , I have added a code snippet for verifying with R.

[GitHub] spark pull request #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValu...

2016-10-11 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/15444 [SPARK-17870][MLLIB][ML]Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference ## What changes were proposed in this pull request? For feature selection

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector method...

2016-10-16 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 Hi @yanboliang @srowen , this is the last two feature selection methods based on ChiSquare, which is similar to the method in scikit learn. But there is a bug about SelectFDR in scikit learn. I have
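
For readers unfamiliar with FDR-based feature selection, here is a minimal sketch. It assumes the selector follows the Benjamini-Hochberg procedure, which the truncated comment above does not spell out. The function name `select_fdr` and the sample p-values are hypothetical and are not taken from the PR.

```python
def select_fdr(p_values, alpha):
    """Return indices of features kept by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Sort feature indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Keep every feature ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical chi-squared test p-values, one per feature.
p = [0.001, 0.2, 0.03, 0.04, 0.6]
selected = select_fdr(p, alpha=0.1)
print(selected)
```

Unlike a fixed p-value cutoff, the per-rank threshold grows with the rank, so the number of selected features adapts to how many small p-values are present.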

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector...

2016-10-19 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r84049606 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector...

2016-10-19 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r84049805 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector...

2016-10-20 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r84232802 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -243,6 +245,19 @@ class ChiSqSelector @Since("2.1.0")

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-11-22 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 hi @yanboliang , @srowen @jkbradley , I have updated this PR, thanks.

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML]add feature selector method base...

2016-10-23 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 Hi @yanboliang and @srowen , could you please review whether this PR includes all your comments. Thanks.

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310501 --- Diff: docs/ml-features.md --- @@ -1333,14 +1333,14 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection. It

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310797 --- Diff: docs/ml-features.md --- @@ -1333,14 +1333,14 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection. It

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310862 --- Diff: docs/mllib-feature-extraction.md --- @@ -227,22 +227,19 @@ both speed and statistical learning behavior. [`ChiSqSelector`](api/scala/index.html

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310898 --- Diff: docs/mllib-feature-extraction.md --- @@ -227,22 +227,19 @@ both speed and statistical learning behavior. [`ChiSqSelector`](api/scala/index.html

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311619 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311677 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311930 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,18 +171,19 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18904 ![image](https://user-images.githubusercontent.com/13826327/34948104-2fa1982a-fa47-11e7-9312-f1935cca758b.png) This is one of my test results. Now, I am not working on Spark MLLIB, and don'

[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18904 This is another case. Table 1 shows the improvement of random tree algorithm with sparse expression. We can see that when we use sparse expression, I/O can be reduced by 61% and total run time

[GitHub] spark pull request #18904: [SPARK-21624]optimzie RF communicaiton cost

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/18904 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18904 Because I don't have the environment to continue this work, I will close it.

[GitHub] spark issue #19516: [SPARK-22277][ML]fix the bug of ChiSqSelector on prepari...

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/19516 Because I don't have the environment to continue this work, I will close it. Thanks.

[GitHub] spark issue #19536: [SPARK-6685][ML]Use DSYRK to compute AtA in ALS

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/19536 Because I don't have the environment to continue this work, I will close it. Thanks.

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/19337 Because I don't have the environment to continue this work, I will close it. Thanks.

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 Because I don't have the environment to continue this work, I will close it. Thanks.

[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/17739 Because I don't have the environment to continue this work, I will close it. Thanks.

[GitHub] spark pull request #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/17739

[GitHub] spark pull request #19536: [SPARK-6685][ML]Use DSYRK to compute AtA in ALS

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/19536

[GitHub] spark pull request #19516: [SPARK-22277][ML]fix the bug of ChiSqSelector on ...

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/19516

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/19337

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2018-01-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/18624

[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

2018-01-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18904 Thanks @MLnick, I will be glad if you can continue it.

[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-07-04 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/17742 I find why F2j BLAS is much faster than Native BLAS for xiangrui's method (use GEMM) here. https://issues.apache.org/jira/browse/SPARK-21305

[GitHub] spark pull request #18551: [SPARK-21305][ML][MLLIB]Add options to disable mu...

2017-07-06 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/18551 [SPARK-21305][ML][MLLIB]Add options to disable multi-threading of native BLAS ## What changes were proposed in this pull request? Many ML/MLLIB algorithms use native BLAS (like Intel MKL

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-06 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 Thanks, @srowen . I have updated the doc. I also validated the current option in spark-env.sh, it works. Thanks.

[GitHub] spark pull request #18551: [SPARK-21305][ML][MLLIB]Add options to disable mu...

2017-07-09 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/18551#discussion_r126323336 --- Diff: docs/ml-guide.md --- @@ -61,6 +61,11 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include project and read the

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-09 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 hi @felixcheung , I have tested one case, write a single thread java program, and call native blas. The performance is much better to disable native blas multi-threading (the total program

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-09 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 hi @srowen , I understand Felix's point. I mean if you only have 1 task in C/C++, and 2 CPUs, setting native BLAS to use 2 CPUs will be faster. But in JVM env, even you only have one task, and 2
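
A minimal sketch of the idea discussed in this thread: pinning a native BLAS library to one thread through environment variables. `MKL_NUM_THREADS` and `OPENBLAS_NUM_THREADS` are the variables read by Intel MKL and OpenBLAS respectively; whether these are exactly the options the PR documents is an assumption here, and the helper name `single_threaded_blas_env` is hypothetical.

```python
import os


def single_threaded_blas_env(base_env=None):
    """Return a copy of the environment with native BLAS limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    # Read by Intel MKL when the library is loaded.
    env["MKL_NUM_THREADS"] = "1"
    # Read by OpenBLAS when the library is loaded.
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


# Build an environment for a child process; the base values are placeholders.
executor_env = single_threaded_blas_env({"PATH": "/usr/bin"})
print(executor_env["MKL_NUM_THREADS"], executor_env["OPENBLAS_NUM_THREADS"])
```

Such a dictionary could be passed as `env=` to `subprocess.run`. For a Spark deployment, the equivalent would be exporting the same variables in `spark-env.sh`, which an earlier comment in this thread reports validating.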

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 Hi @srowen , Thanks very much for your review. I will revise the document of this PR to soften the language. According to my profiling data, I guess, when the native BLAS is loaded (or when a

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 retest this please

[GitHub] spark pull request #18551: [SPARK-21305][ML][MLLIB]Add options to disable mu...

2017-07-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/18551#discussion_r126665323 --- Diff: docs/ml-guide.md --- @@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include project and read the

[GitHub] spark issue #18551: [SPARK-21305][ML][MLLIB]Add options to disable multi-thr...

2017-07-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18551 retest this please

[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-07-12 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/17742 I have rewritten recommendForAll with BLAS GEMM, and get about 20%-30% performance improvement. https://issues.apache.org/jira/browse/SPARK-21389

[GitHub] spark pull request #18620: [MINOR][ML][MLLIB] add poll function for BoundedP...

2017-07-13 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/18620 [MINOR][ML][MLLIB] add poll function for BoundedPriorityQueue ## What changes were proposed in this pull request? The most of BoundedPriorityQueue usages in ML/MLLIB are: Get the value of

[GitHub] spark issue #18620: [MINOR][ML][MLLIB] add poll function for BoundedPriority...

2017-07-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Yes, my following PR will use it.

[GitHub] spark issue #18620: [MINOR][ML][MLLIB] add poll function for BoundedPriority...

2017-07-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Ok, thanks @srowen . I will create a JIRA, and show the usage and a performance comparison.

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2017-07-13 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/18624 [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by gemm with about 50% performance improvement ## What changes were proposed in this pull request? In Spark 2.2, we have optimized ALS
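
To illustrate the general idea of scoring with a matrix product and then taking the top k per user, here is a minimal sketch. It is a pure-Python stand-in for a BLAS GEMM call and is not the PR's implementation; the function names and the factor values are hypothetical.

```python
import heapq


def matmul_transposed(users, items):
    """Score every user against every item: result[u][i] = dot(users[u], items[i])."""
    return [[sum(a * b for a, b in zip(u, v)) for v in items] for u in users]


def top_k_per_user(scores, k):
    """For each user row, return the k best (item_index, score) pairs, best first."""
    return [heapq.nlargest(k, enumerate(row), key=lambda p: p[1]) for row in scores]


# Hypothetical rank-2 factors: two users, three items.
user_factors = [[1.0, 0.0], [0.0, 1.0]]
item_factors = [[0.5, 0.1], [0.2, 0.9], [0.8, 0.3]]

scores = matmul_transposed(user_factors, item_factors)
recs = top_k_per_user(scores, k=2)
print(recs)
```

Computing all scores for a block in one product is what lets an optimized GEMM routine replace many separate dot products.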

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2017-07-13 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/18624#discussion_r127214361 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -286,40 +288,124 @@ object

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Hi @srowen , I have added Test Suite for BoundedPriorityQueue. Thanks.

[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-07-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/17742 I have submitted PR for ALS optimization with GEMM. and it is ready for review. The performance is about 50% improvement comparing with the master method. https://github.com/apache/spark/pull

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 retest this please

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 A user block, after the Cartesian product, generates many blocks (as many as there are item blocks), and all of these blocks should be aggregated. Thanks.
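
The aggregation step described above can be sketched as merging per-block candidate lists for one user into a single top-k list. This is an illustration only, not the PR's code; `merge_top_k` and the sample pairs are hypothetical.

```python
import heapq


def merge_top_k(partial_lists, k):
    """Merge per-item-block lists of (item_id, score) into one top-k list, best first."""
    all_candidates = (pair for block in partial_lists for pair in block)
    return heapq.nlargest(k, all_candidates, key=lambda p: p[1])


# Hypothetical candidates for one user from two item blocks.
block_a = [(3, 0.9), (1, 0.4)]
block_b = [(7, 0.7), (8, 0.6)]

merged = merge_top_k([block_a, block_b], k=3)
print(merged)
```

Because each block contributes at most k candidates, the merge only has to look at k times the number of item blocks entries per user.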

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 We need the values to be in order here.

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 Without poll, we have to use toArray.sorted, whose performance is bad.

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-16 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 I have checked the results with the master method, the recommendation results are right. The master TestSuite is too simple, should be updated. I will update it. Thanks.

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2017-07-17 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/18624#discussion_r127641933 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -286,40 +288,120 @@ object

[GitHub] spark pull request #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendFo...

2017-07-17 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/18624#discussion_r127669102 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -286,40 +288,120 @@ object

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Keep it or close it, either is OK with me. We have had much discussion on: https://issues.apache.org/jira/browse/SPARK-21401

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 I have run many tests comparing poll and toArray.sorted. If the queue is mostly ordered (suppose 2000 offers for a queue of size 20), pq.toArray.sorted is faster. If the queue is much disordered
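
The two ways of reading a bounded queue in order can be sketched as follows. This uses Python's `heapq` as a stand-in for Spark's BoundedPriorityQueue, so it shows only that both approaches yield the same ordered result; it is not a benchmark and makes no claim about which is faster. All function names are hypothetical.

```python
import heapq


def bounded_offer(heap, value, capacity):
    """Keep only the `capacity` largest values seen so far in a min-heap."""
    if len(heap) < capacity:
        heapq.heappush(heap, value)
    elif value > heap[0]:
        # Replace the current smallest element with the larger newcomer.
        heapq.heapreplace(heap, value)


def drain_by_poll(heap):
    """Pop repeatedly from a copy; each pop returns the current smallest element."""
    work = list(heap)  # a copy of a heap list is still a valid heap
    return [heapq.heappop(work) for _ in range(len(work))]


def drain_by_sort(heap):
    """Copy the contents and sort them, analogous to toArray.sorted."""
    return sorted(heap)


heap = []
for v in [5, 1, 9, 3, 7, 8, 2]:
    bounded_offer(heap, v, capacity=3)

print(drain_by_poll(heap), drain_by_sort(heap))
```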

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Hi @MLnick , pq.toArray.sorted also used in other places, like word2vector and LDA, how about waiting for my other benchmark results. Then decide to close it or not. Thanks.

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Hi @MLnick , @srowen . My tests show that pq.poll is not significantly faster than pq.toArray.sortBy, but is significantly faster than pq.toArray.sorted. Seems not each pq.toArray.sorted (such as

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 I am also very confused about this. You can change https://github.com/apache/spark/pull/18624 to sorted and test.

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 My micro benchmark (a program that tests only pq.toArray.sorted, pq.toArray.sortBy, and pq.poll) did not find a significant performance difference. Only in the Spark job is there a big difference. Confused

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 I am ok to close this. Thanks @MLnick

[GitHub] spark issue #18620: [SPARK-21401][ML][MLLIB] add poll function for BoundedPr...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18620 Thanks @srowen , my test also said pq.poll is a little faster on some cases. One possible benefit here is if we provide pq.poll, user's first choice may use pq.poll, not pq.toArray.sorted,

[GitHub] spark issue #18624: [SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by...

2017-07-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18624 Hi @srowen @MLnick @jkbradley @mengxr @yanboliang Is this change acceptable? if it is acceptable, I will update ALS ML code following this method. Also update Test Suite, which are too simple

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851296 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -69,21 +73,26 @@ class ChiSqSelectorModel @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851527 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,48 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851763 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75852138 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75856793 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75858450 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14785 [SPARK-17207][MLLIB]fix comparing Vector bug in TestingUtils ## What changes were proposed in this pull request? fix comparing Vector bug in TestingUtils. There is the same bug for

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14785#discussion_r76035442 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- @@ -154,7 +154,11 @@ object TestingUtils { */ def absTol

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14785#discussion_r76037277 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- @@ -154,7 +154,11 @@ object TestingUtils { */ def absTol

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76041373 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76059026 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76059098 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76065118 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76068163 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-24 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Sure, I can update the Python API.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-24 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14785 Sure, I will fix it and add test cases. Thanks, @dbtsai.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-25 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14785 Hi @dbtsai, PR 2294 added Matrix comparison to TestingUtils but did not add any test cases to TestingUtilsSuite. I did not add test cases for Matrix comparison in the PR either. If Matrix

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14824 [ML][MLLIB]The require condition and message doesn't match in SparseMatrix. ## What changes were proposed in this pull request? The require condition and message doesn't matc

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14824#discussion_r76417535 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -455,9 +455,11 @@ class SparseMatrix @Since("2.0.0") (

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14824#discussion_r76421555 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -454,10 +454,15 @@ class SparseMatrix @Since("

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have added the Python API and test cases for ChiSqSelector. Could you kindly review it again? Thanks.

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-29 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76624379 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-01 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77137991 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-04 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77473261 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-05 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77532997 --- Diff: python/pyspark/mllib/feature.py --- @@ -271,29 +271,74 @@ def transform(self, vector): """

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-05 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77536408 --- Diff: python/pyspark/mllib/feature.py --- @@ -305,7 +350,12 @@ def fit(self, data): treated as categorical for each distinct

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-06 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @yanboliang, could you please review the Python code of this PR? Thanks.

[GitHub] spark pull request #15058: [MLLIB]Add setBins for BinaryClassificationMetric...

2016-09-12 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/15058 [MLLIB]Add setBins for BinaryClassificationMetrics ## What changes were proposed in this pull request? Add a setBins method for BinaryClassificationMetrics. BinaryClassificationMetrics
