[jira] [Created] (SPARK-17017) Add a chiSquare Selector based on False Positive Rate (FPR) test

2016-08-11 Thread Peng Meng (JIRA)
Peng Meng created SPARK-17017: - Summary: Add a chiSquare Selector based on False Positive Rate (FPR) test Key: SPARK-17017 URL: https://issues.apache.org/jira/browse/SPARK-17017 Project: Spark

[jira] [Updated] (SPARK-17017) Add a chiSquare Selector based on False Positive Rate (FPR) test

2016-08-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-17017: -- Affects Version/s: (was: 2.0.0) > Add a chiSquare Selector based on False Positive Rate (FPR) test

[jira] [Updated] (SPARK-17017) Add a chiSquare Selector based on False Positive Rate (FPR) test

2016-08-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-17017: -- Target Version/s: (was: 2.1.0) > Add a chiSquare Selector based on False Positive Rate (FPR) test >

[jira] [Created] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-01 Thread Peng Meng (JIRA)
Peng Meng created SPARK-16843: - Summary: Select features according to a percentile of the highest scores of ChiSqSelector Key: SPARK-16843 URL: https://issues.apache.org/jira/browse/SPARK-16843 Project:

[jira] [Updated] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-01 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-16843: -- Affects Version/s: (was: 2.0.0) 2.1.0 > Select features according to a

[jira] [Updated] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-01 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-16843: -- Target Version/s: (was: 2.0.1) > Select features according to a percentile of the highest scores of

[jira] [Updated] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-01 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-16843: -- Priority: Minor (was: Major) > Select features according to a percentile of the highest scores of >

[jira] [Updated] (SPARK-16843) Select features according to a percentile of the highest scores of ChiSqSelector

2016-08-01 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-16843: -- Fix Version/s: (was: 2.0.1) 2.1.0 > Select features according to a percentile

[jira] [Commented] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434527#comment-15434527 ] Peng Meng commented on SPARK-17207: --- Thanks Owen, I am testing the code with an array length check. Will

[jira] [Commented] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436874#comment-15436874 ] Peng Meng commented on SPARK-17207: --- Hi,

[jira] [Commented] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437013#comment-15437013 ] Peng Meng commented on SPARK-17207: --- Ok, thanks. I will fix CountVectorizerSuite test error in this PR.

[jira] [Commented] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436986#comment-15436986 ] Peng Meng commented on SPARK-17207: --- This is the bug information:

[jira] [Commented] (SPARK-17462) Check for places within MLlib which should use VersionUtils to parse Spark version strings

2016-09-09 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477019#comment-15477019 ] Peng Meng commented on SPARK-17462: --- Hi [~josephkb], will you work on this? If not, I can work on it.

[jira] [Commented] (SPARK-17462) Check for places within MLlib which should use VersionUtils to parse Spark version strings

2016-09-12 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15483336#comment-15483336 ] Peng Meng commented on SPARK-17462: --- Hi [~josephkb], I am busy these days; I am glad VinceShieh can help

[jira] [Created] (SPARK-17505) Add setBins for BinaryClassificationMetrics in mllib/evaluation

2016-09-12 Thread Peng Meng (JIRA)
Peng Meng created SPARK-17505: - Summary: Add setBins for BinaryClassificationMetrics in mllib/evaluation Key: SPARK-17505 URL: https://issues.apache.org/jira/browse/SPARK-17505 Project: Spark

[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475769#comment-15475769 ] Peng Meng commented on SPARK-6160: -- Hi [~josephkb], I have some discussion with [~srowen] about keeping

[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475709#comment-15475709 ] Peng Meng commented on SPARK-6160: -- hi Joseph K. Bradley > ChiSqSelector should keep test statistic info

[jira] [Issue Comment Deleted] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-6160: - Comment: was deleted (was: hi Joseph K. Bradley) > ChiSqSelector should keep test statistic info >

[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475860#comment-15475860 ] Peng Meng commented on SPARK-6160: -- Hi [~GayathriMurali], are you still working on this? If not, I can

[jira] [Created] (SPARK-17645) Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE)

2016-09-23 Thread Peng Meng (JIRA)
Peng Meng created SPARK-17645: - Summary: Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE) Key: SPARK-17645 URL: https://issues.apache.org/jira/browse/SPARK-17645

[jira] [Commented] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434241#comment-15434241 ] Peng Meng commented on SPARK-17207: --- This is caused by a problem with zipping two Vectors: def absTol(eps:

[jira] [Created] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-23 Thread Peng Meng (JIRA)
Peng Meng created SPARK-17207: - Summary: Comparing Vector in relative tolerance or absolute tolerance in UnitTests error Key: SPARK-17207 URL: https://issues.apache.org/jira/browse/SPARK-17207 Project:

[jira] [Comment Edited] (SPARK-17207) Comparing Vector in relative tolerance or absolute tolerance in UnitTests error

2016-08-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434241#comment-15434241 ] Peng Meng edited comment on SPARK-17207 at 8/24/16 6:20 AM: This is caused by

[jira] [Commented] (SPARK-18062) ProbabilisticClassificationModel.normalizeToProbabilitiesInPlace should return probabilities when given all-0 vector

2016-10-23 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600926#comment-15600926 ] Peng Meng commented on SPARK-18062: --- This relates to how to understand an all-0 rawPrediction, all classes

[jira] [Commented] (SPARK-18088) ChiSqSelector FPR PR cleanups

2016-10-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605211#comment-15605211 ] Peng Meng commented on SPARK-18088: --- Hi [~josephkb], I do not quite understand "Testing against only

[jira] [Commented] (SPARK-18088) ChiSqSelector FPR PR cleanups

2016-10-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605231#comment-15605231 ] Peng Meng commented on SPARK-18088: --- In the previous implementation, testing against only the

[jira] [Commented] (SPARK-18088) ChiSqSelector FPR PR cleanups

2016-10-26 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608322#comment-15608322 ] Peng Meng commented on SPARK-18088: --- I am neutral on changing the selectorType "KBest" to

[jira] [Updated] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-17870: -- Summary: ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong (was: ML/MLLIB:

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565225#comment-15565225 ] Peng Meng commented on SPARK-17870: --- Yes, the selectKBest and selectPercentile in scikit-learn only use

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565251#comment-15565251 ] Peng Meng commented on SPARK-17870: --- The scikit-learn code is here:

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565315#comment-15565315 ] Peng Meng commented on SPARK-17870: --- https://github.com/apache/spark/pull/1484#issuecomment-51024568 Hi

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565041#comment-15565041 ] Peng Meng commented on SPARK-17870: --- Hi [~srowen], thanks very much for your quick reply. Yes, the

[jira] [Commented] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567170#comment-15567170 ] Peng Meng commented on SPARK-17870: --- Hi [~avulanov], the question here is not whether to use raw chi2 scores or

[jira] [Created] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
Peng Meng created SPARK-17870: - Summary: ML/MLLIB: Statistics.chiSqTest(RDD) is wrong Key: SPARK-17870 URL: https://issues.apache.org/jira/browse/SPARK-17870 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-24 Thread Peng Meng (JIRA)
Peng Meng created SPARK-20443: - Summary: The blockSize of MLLIB ALS should be setting by the User Key: SPARK-20443 URL: https://issues.apache.org/jira/browse/SPARK-20443 Project: Spark Issue

[jira] [Comment Edited] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981148#comment-15981148 ] Peng Meng edited comment on SPARK-20446 at 4/24/17 4:18 PM: Thanks [~mlnick],

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982251#comment-15982251 ] Peng Meng commented on SPARK-20446: --- Yes, I compared with ML ALSModel.recommendAll. The data size is

[jira] [Created] (SPARK-21623) Comments of parentStats on ml/tree/impl/DTStatsAggregator.scala is wrong

2017-08-03 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21623: - Summary: Comments of parentStats on ml/tree/impl/DTStatsAggregator.scala is wrong Key: SPARK-21623 URL: https://issues.apache.org/jira/browse/SPARK-21623 Project: Spark

[jira] [Created] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-03 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21624: - Summary: Optimize communication cost of RF/GBT/DT Key: SPARK-21624 URL: https://issues.apache.org/jira/browse/SPARK-21624 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-03 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112374#comment-16112374 ] Peng Meng commented on SPARK-21624: --- ping [~josephkb] [~srowen] [~yanboliang] [~mlnick] [~yuhaoyan] >

[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-03 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113793#comment-16113793 ] Peng Meng commented on SPARK-21624: --- Thanks [~mlnick], using Vector and compress is reasonable. I will

[jira] [Commented] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114132#comment-16114132 ] Peng Meng commented on SPARK-21638: --- This is because "we not add the node to mutableNodesForGroup, but

[jira] [Commented] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114153#comment-16114153 ] Peng Meng commented on SPARK-21638: --- In the example warning message, the split node should be 2621; >

[jira] [Updated] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21638: -- Description: When training an RF model, there are many warning messages like this: {quote}WARN RandomForest:

[jira] [Commented] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114133#comment-16114133 ] Peng Meng commented on SPARK-21638: --- I am heading home now and will answer your question next week.

[jira] [Commented] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114156#comment-16114156 ] Peng Meng commented on SPARK-21638: --- The first data should - nodeMemUsage > Warning message of RF is

[jira] [Created] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21638: - Summary: Warning message of RF is not accurate Key: SPARK-21638 URL: https://issues.apache.org/jira/browse/SPARK-21638 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-21638) Warning message of RF is not accurate

2017-08-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21638: -- Description: When training an RF model, there are many warning messages like this: {quote}WARN RandomForest:

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120115#comment-16120115 ] Peng Meng commented on SPARK-21680: --- Then we will have two toSparse: toSparse and toSparse(size) Do

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-10 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121175#comment-16121175 ] Peng Meng commented on SPARK-21680: --- I mean if the user calls toSparse(size), but the size is smaller

[jira] [Commented] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-10 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121284#comment-16121284 ] Peng Meng commented on SPARK-21688: --- MKL is just an example of native BLAS; if the user has OpenBLAS,

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-13 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085738#comment-16085738 ] Peng Meng commented on SPARK-21401: --- Yes, SPARK-21389 used pq.poll. pq.poll is just a small part of

[jira] [Updated] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-13 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21401: -- Description: Most BoundedPriorityQueue usages in ML/MLLIB are: Get the value of

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-13 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085874#comment-16085874 ] Peng Meng commented on SPARK-21401: --- Sure, I will add isEmpty and maybe some other functions, and tests

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089648#comment-16089648 ] Peng Meng commented on SPARK-21401: --- Yes, you don't need to do it for the vast majority of elements.

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089656#comment-16089656 ] Peng Meng commented on SPARK-21401: --- Got it, thanks [~srowen] > add poll function for

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089629#comment-16089629 ] Peng Meng commented on SPARK-21401: --- Thanks @srowen. I mean for BoundedPriorityQueue, you also can

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089638#comment-16089638 ] Peng Meng commented on SPARK-21401: --- I mean we totally rewrite the BoundedPriorityQueue, not use Java

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089437#comment-16089437 ] Peng Meng commented on SPARK-21401: --- I benchmarking just change pq.toArray.sorted. and pq.poll. pq.poll

[jira] [Comment Edited] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089586#comment-16089586 ] Peng Meng edited comment on SPARK-21401 at 7/17/17 10:10 AM: - I have tested

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089424#comment-16089424 ] Peng Meng commented on SPARK-21401: --- Hi [~srowen], for ALS optimization, the difference of using poll

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089578#comment-16089578 ] Peng Meng commented on SPARK-21401: --- Hi [~srowen], I got why my original test pq.toArray.sorted is very

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089586#comment-16089586 ] Peng Meng commented on SPARK-21401: --- I have tested poll and toArray.sorted extensively. If the queue is

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089606#comment-16089606 ] Peng Meng commented on SPARK-21401: --- I think the BoundedPriorityQueue should be rewritten. There are

[jira] [Created] (SPARK-21389) ALS recommendForAll optimization uses Native BLAS

2017-07-12 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21389: - Summary: ALS recommendForAll optimization uses Native BLAS Key: SPARK-21389 URL: https://issues.apache.org/jira/browse/SPARK-21389 Project: Spark Issue Type:

[jira] [Updated] (SPARK-21389) ALS recommendForAll optimization uses Native BLAS

2017-07-13 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21389: -- Description: In Spark 2.2, we have optimized ALS recommendForAll, which uses a hand-written matrix

[jira] [Updated] (SPARK-21389) ALS recommendForAll optimization uses Native BLAS

2017-07-13 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21389: -- Description: In Spark 2.2, we have optimized ALS recommendForAll, which uses a hand-written matrix

[jira] [Created] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-13 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21401: - Summary: add poll function for BoundedPriorityQueue Key: SPARK-21401 URL: https://issues.apache.org/jira/browse/SPARK-21401 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-16 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089209#comment-16089209 ] Peng Meng commented on SPARK-21401: --- Hi [~srowen], here we also want to get a fully sorted list by get

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-20 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094266#comment-16094266 ] Peng Meng commented on SPARK-21476: --- It seems transform should use transformImpl but does not? >

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-20 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094603#comment-16094603 ] Peng Meng commented on SPARK-21476: --- I am optimizing RF and GBT these days, if no one works on it. I

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-26 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101476#comment-16101476 ] Peng Meng commented on SPARK-21476: --- Hi [~sagraw], could you please test copy pasted the transform

[jira] [Comment Edited] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-26 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101476#comment-16101476 ] Peng Meng edited comment on SPARK-21476 at 7/26/17 10:06 AM: - Hi [~sagraw],

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-26 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101253#comment-16101253 ] Peng Meng commented on SPARK-21476: --- Not every transform uses broadcast; do you have some experiment

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099593#comment-16099593 ] Peng Meng commented on SPARK-21476: --- Hi @Suarabh, I am profiling RF transform performance. I change

[jira] [Comment Edited] (SPARK-21476) RandomForest classification model not using broadcast in transform

2017-07-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099593#comment-16099593 ] Peng Meng edited comment on SPARK-21476 at 7/25/17 6:55 AM: Hi @Suarabh, I am

[jira] [Commented] (SPARK-2465) Use long as user / item ID for ALS

2017-07-19 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094106#comment-16094106 ] Peng Meng commented on SPARK-2465: -- I think it is time to revisit this now. Some of our customers, such

[jira] [Created] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-04 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21305: - Summary: The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance Key: SPARK-21305 URL: https://issues.apache.org/jira/browse/SPARK-21305

[jira] [Commented] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073801#comment-16073801 ] Peng Meng commented on SPARK-21305: --- yes, I will do that. Because different blas, the method to

[jira] [Commented] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073798#comment-16073798 ] Peng Meng commented on SPARK-21305: --- ping [~mlnick] , [~yanboliang], [~mengxr], [~srowen] > The BKM

[jira] [Comment Edited] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-04 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073801#comment-16073801 ] Peng Meng edited comment on SPARK-21305 at 7/4/17 3:39 PM: --- yes, I will do

[jira] [Commented] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-05 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074320#comment-16074320 ] Peng Meng commented on SPARK-21305: --- Thanks [~srowen] and [~yanboliang] I will disable native BLAS MT

[jira] [Commented] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-06 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076152#comment-16076152 ] Peng Meng commented on SPARK-21305: --- I tested Intel MKL and OpenBLAS with ALS training and prediction. ALS

[jira] [Comment Edited] (SPARK-21305) The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

2017-07-06 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076152#comment-16076152 ] Peng Meng edited comment on SPARK-21305 at 7/6/17 8:22 AM: --- I tested Intel MKL

[jira] [Updated] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-20443: -- Description: The blockSize of MLLIB ALS is very important for ALS performance. In our test, when the

[jira] [Created] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
Peng Meng created SPARK-20446: - Summary: Optimize the process of MLLIB ALS recommendForAll Key: SPARK-20446 URL: https://issues.apache.org/jira/browse/SPARK-20446 Project: Spark Issue Type:

[jira] [Updated] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-20446: -- Description: The recommendForAll of MLLIB ALS is very slow. GC is a key problem of the current method.

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981115#comment-15981115 ] Peng Meng commented on SPARK-20446: --- I think you said: https://github.com/apache/spark/pull/9980 Maybe

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-24 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981148#comment-15981148 ] Peng Meng commented on SPARK-20446: --- Thanks [~mlnick], I also compared DataFrame Version ALS

[jira] [Commented] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983030#comment-15983030 ] Peng Meng commented on SPARK-20446: --- Thanks [~mlnick] , I agree with you. I am ok to close this ticket

[jira] [Comment Edited] (SPARK-20446) Optimize the process of MLLIB ALS recommendForAll

2017-04-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982251#comment-15982251 ] Peng Meng edited comment on SPARK-20446 at 4/25/17 3:06 PM: Yes, I compared

[jira] [Commented] (SPARK-20443) The blockSize of MLLIB ALS should be setting by the User

2017-04-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983059#comment-15983059 ] Peng Meng commented on SPARK-20443: --- Yes, based on my current test, I agree. But if the data size is

[jira] [Commented] (SPARK-11968) ALS recommend all methods spend most of time in GC

2017-04-25 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983074#comment-15983074 ] Peng Meng commented on SPARK-11968: --- Thanks [~mlnick] , I will post more results here. I latest result

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120136#comment-16120136 ] Peng Meng commented on SPARK-21680: --- Ok, thanks, I will submit a PR. > ML/MLLIB Vector compressed

[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-09 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120105#comment-16120105 ] Peng Meng commented on SPARK-21624: --- Hi [~mlnick], what do you think about this:

[jira] [Created] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Thread Peng Meng (JIRA)
Peng Meng created SPARK-21680: - Summary: ML/MLLIB Vector compressed optimization Key: SPARK-21680 URL: https://issues.apache.org/jira/browse/SPARK-21680 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120947#comment-16120947 ] Peng Meng commented on SPARK-21680: --- Hi [~srowen], if add toSparse(size), for secure reason, it is

[jira] [Updated] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-06 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21624: -- Description: {quote}The implementation of RF is bound by either the cost of statistics computation

[jira] [Updated] (SPARK-21638) Warning message of RF is not accurate

2017-08-07 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-21638: -- Description: When training an RF model, there are many warning messages like this: {quote}WARN RandomForest:

[jira] [Commented] (SPARK-20764) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-22 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020662#comment-16020662 ] Peng Meng commented on SPARK-20764: --- I will submit a PR to cover more tests for model summary, thanks.
