[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207199#comment-14207199 ] Doris Xin commented on SPARK-4348: -- I fully support this. It took a lot of hacking just

[jira] [Created] (SPARK-3077) ChiSqTest bugs

2014-08-15 Thread Doris Xin (JIRA)
Doris Xin created SPARK-3077: Summary: ChiSqTest bugs Key: SPARK-3077 URL: https://issues.apache.org/jira/browse/SPARK-3077 Project: Spark Issue Type: Bug Components: MLlib

[jira] [Created] (SPARK-2993) colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python

2014-08-12 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2993: Summary: colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python Key: SPARK-2993 URL: https://issues.apache.org/jira/browse/SPARK-2993

[jira] [Updated] (SPARK-2515) Chi-squared test

2014-08-11 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-2515: - Summary: Chi-squared test (was: Hypothesis testing) Chi-squared test

[jira] [Created] (SPARK-2980) Python support for chi-squared test

2014-08-11 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2980: Summary: Python support for chi-squared test Key: SPARK-2980 URL: https://issues.apache.org/jira/browse/SPARK-2980 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-2937) Separate out sampleByKeyExact in PairRDDFunctions as its own API

2014-08-09 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2937: Summary: Separate out sampleByKeyExact in PairRDDFunctions as its own API Key: SPARK-2937 URL: https://issues.apache.org/jira/browse/SPARK-2937 Project: Spark

[jira] [Resolved] (SPARK-2851) Check API consistency for decision tree

2014-08-08 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin resolved SPARK-2851. -- Resolution: Done Check API consistency for decision tree ---

[jira] [Reopened] (SPARK-2851) Check API consistency for decision tree

2014-08-07 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin reopened SPARK-2851: -- Shouldn't have been auto-closed with the PR. Check API consistency for decision tree

[jira] [Created] (SPARK-2786) Python correlations

2014-08-01 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2786: Summary: Python correlations Key: SPARK-2786 URL: https://issues.apache.org/jira/browse/SPARK-2786 Project: Spark Issue Type: Sub-task Reporter: Doris

[jira] [Created] (SPARK-2782) Spearman correlation computes wrong ranks when numPartitions > RDD size

2014-07-31 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2782: Summary: Spearman correlation computes wrong ranks when numPartitions > RDD size Key: SPARK-2782 URL: https://issues.apache.org/jira/browse/SPARK-2782 Project: Spark

[jira] [Reopened] (SPARK-2512) Stratified sampling

2014-07-28 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin reopened SPARK-2512: -- Stratified sampling --- Key: SPARK-2512 URL:

[jira] [Created] (SPARK-2724) Python version of Random RDD without support for arbitrary distribution

2014-07-28 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2724: Summary: Python version of Random RDD without support for arbitrary distribution Key: SPARK-2724 URL: https://issues.apache.org/jira/browse/SPARK-2724 Project: Spark

[jira] [Commented] (SPARK-2515) Hypothesis testing

2014-07-25 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074802#comment-14074802 ] Doris Xin commented on SPARK-2515: -- A toString method sounds like a really good idea here

[jira] [Created] (SPARK-2679) Ser/De for Double to enable calling Java API from python in MLlib

2014-07-24 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2679: Summary: Ser/De for Double to enable calling Java API from python in MLlib Key: SPARK-2679 URL: https://issues.apache.org/jira/browse/SPARK-2679 Project: Spark

[jira] [Commented] (SPARK-2515) Hypothesis testing

2014-07-24 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073879#comment-14073879 ] Doris Xin commented on SPARK-2515: -- Here's the proposed API for chi-squared tests (lives

[jira] [Created] (SPARK-2656) Python version without support for exact sample size

2014-07-23 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2656: Summary: Python version without support for exact sample size Key: SPARK-2656 URL: https://issues.apache.org/jira/browse/SPARK-2656 Project: Spark Issue Type:

[jira] [Closed] (SPARK-2599) almostEquals mllib.util.TestingUtils does not behave as expected when comparing against 0.0

2014-07-21 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin closed SPARK-2599. Resolution: Duplicate Refer to this issue: https://issues.apache.org/jira/browse/SPARK-2479 almostEquals

[jira] [Created] (SPARK-2599) almostEquals mllib.util.TestingUtils does not behave as expected when comparing against 0.0

2014-07-20 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2599: Summary: almostEquals mllib.util.TestingUtils does not behave as expected when comparing against 0.0 Key: SPARK-2599 URL: https://issues.apache.org/jira/browse/SPARK-2599

[jira] [Commented] (SPARK-2512) Stratified sampling

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067848#comment-14067848 ] Doris Xin commented on SPARK-2512: -- Hey Xiangrui can you close this one since there's

[jira] [Closed] (SPARK-2600) Correlations (Pearson, Spearman)

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin closed SPARK-2600. Resolution: Implemented Correlations (Pearson, Spearman)

[jira] [Updated] (SPARK-2082) Stratified sampling implementation in PairRDDFunctions

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-2082: - Target Version/s: 1.1.0 Stratified sampling implementation in PairRDDFunctions

[jira] [Closed] (SPARK-2512) Stratified sampling

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin closed SPARK-2512. Resolution: Duplicate Stratified sampling --- Key: SPARK-2512

[jira] [Commented] (SPARK-2599) almostEquals mllib.util.TestingUtils does not behave as expected when comparing against 0.0

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068113#comment-14068113 ] Doris Xin commented on SPARK-2599: -- Found this in-depth article discussing the different

[jira] [Comment Edited] (SPARK-2599) almostEquals mllib.util.TestingUtils does not behave as expected when comparing against 0.0

2014-07-20 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068113#comment-14068113 ] Doris Xin edited comment on SPARK-2599 at 7/21/14 2:06 AM: ---

[jira] [Issue Comment Deleted] (SPARK-2308) Add KMeans MiniBatch clustering algorithm to MLlib

2014-07-08 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-2308: - Comment: was deleted (was: Hey guys, Sorry to crash the party. I don't think small clusters are

[jira] [Updated] (SPARK-2359) Supporting common statistical functions in MLlib

2014-07-03 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-2359: - Summary: Supporting common statistical functions in MLlib (was: Supporting common statistical

[jira] [Updated] (SPARK-2182) Scalastyle rule blocking unicode operators

2014-06-18 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-2182: - Attachment: Screen Shot 2014-06-18 at 3.28.44 PM.png How I spotted it in Eclipse Scalastyle rule

[jira] [Created] (SPARK-2145) Add lower bound on sampling rate to guarantee sampling performance

2014-06-14 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2145: Summary: Add lower bound on sampling rate to guarantee sampling performance Key: SPARK-2145 URL: https://issues.apache.org/jira/browse/SPARK-2145 Project: Spark

[jira] [Created] (SPARK-2082) Stratified sampling implementation in PairRDDFunctions

2014-06-09 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2082: Summary: Stratified sampling implementation in PairRDDFunctions Key: SPARK-2082 URL: https://issues.apache.org/jira/browse/SPARK-2082 Project: Spark Issue Type: New

[jira] [Created] (SPARK-2088) NPE in toString when creationSiteInfo is null after deserialization

2014-06-09 Thread Doris Xin (JIRA)
Doris Xin created SPARK-2088: Summary: NPE in toString when creationSiteInfo is null after deserialization Key: SPARK-2088 URL: https://issues.apache.org/jira/browse/SPARK-2088 Project: Spark

[jira] [Updated] (SPARK-1939) Refactor takeSample method in RDD to use ScaSRS

2014-05-29 Thread Doris Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Xin updated SPARK-1939: - Summary: Refactor takeSample method in RDD to use ScaSRS (was: Improve takeSample method in RDD)

[jira] [Created] (SPARK-1939) Improve takeSample method in RDD

2014-05-27 Thread Doris Xin (JIRA)
Doris Xin created SPARK-1939: Summary: Improve takeSample method in RDD Key: SPARK-1939 URL: https://issues.apache.org/jira/browse/SPARK-1939 Project: Spark Issue Type: Improvement