[jira] [Comment Edited] (SPARK-22075) GBTs forgot to unpersist datasets cached by Checkpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172859#comment-16172859 ] zhengruifeng edited comment on SPARK-22075 at 9/20/17 9:48 AM:

[jira] [Updated] (SPARK-22075) GBTs forgot to unpersist datasets cached by Checkpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-22075: - Summary: GBTs forgot to unpersist datasets cached by Checkpointer (was: GBTs/Pregel forgot to un

[jira] [Updated] (SPARK-22075) GBTs forgot to unpersist datasets cached by Checkpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-22075: - Component/s: (was: GraphX) > GBTs forgot to unpersist datasets cached by Checkpointer > -

[jira] [Updated] (SPARK-22075) GBTs/Pregel forgot to unpersist datasets cached by Checkpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-22075: - Summary: GBTs/Pregel forgot to unpersist datasets cached by Checkpointer (was: GBTs forgot to un

[jira] [Updated] (SPARK-22075) GBTs forgot to unpersist datasets cached by PeriodicRDDCheckpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-22075: - Component/s: GraphX > GBTs forgot to unpersist datasets cached by PeriodicRDDCheckpointer > -

[jira] [Commented] (SPARK-22075) GBTs forgot to unpersist datasets cached by PeriodicRDDCheckpointer

2017-09-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172859#comment-16172859 ] zhengruifeng commented on SPARK-22075: -- Same issue seems exist in {{Pregel}}, each c

[jira] [Created] (SPARK-22075) GBTs forgot to unpersist datasets cached by PeriodicRDDCheckpointer

2017-09-19 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-22075: Summary: GBTs forgot to unpersist datasets cached by PeriodicRDDCheckpointer Key: SPARK-22075 URL: https://issues.apache.org/jira/browse/SPARK-22075 Project: Spark

[jira] [Comment Edited] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-19 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171342#comment-16171342 ] zhengruifeng edited comment on SPARK-21972 at 9/19/17 8:54 AM:

[jira] [Commented] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-19 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171342#comment-16171342 ] zhengruifeng commented on SPARK-21972: -- Since persistence handling is very algorithm

[jira] [Updated] (SPARK-22009) Using treeAggregate improve some algs

2017-09-14 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-22009: - Description: I test on a dataset of about 13M instances, and found that using `treeAggregate` gi

[jira] [Created] (SPARK-22009) Using treeAggregate improve some algs

2017-09-14 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-22009: Summary: Using treeAggregate improve some algs Key: SPARK-22009 URL: https://issues.apache.org/jira/browse/SPARK-22009 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-21879) Should Scalers handel NaN values?

2017-08-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-21879: Summary: Should Scalers handel NaN values? Key: SPARK-21879 URL: https://issues.apache.org/jira/browse/SPARK-21879 Project: Spark Issue Type: Question

[jira] [Updated] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for NaN value

2017-08-29 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-20711: - Summary: MultivariateOnlineSummarizer incorrect min/max for NaN value (was: MultivariateOnlineSu

[jira] [Commented] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-08-29 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146480#comment-16146480 ] zhengruifeng commented on SPARK-20711: -- [~WeichenXu123] I notice that you have just

[jira] [Reopened] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-08-29 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reopened SPARK-20711: -- > MultivariateOnlineSummarizer incorrect min/max for identical NaN feature > --

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137773#comment-16137773 ] zhengruifeng commented on SPARK-21742: -- [~srowen] Yes, if we cache the input dataset

[jira] [Closed] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-08-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-20711. Resolution: Not A Problem > MultivariateOnlineSummarizer incorrect min/max for identical NaN featur

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-17 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130045#comment-16130045 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I cache the dataset in that test,

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129818#comment-16129818 ] zhengruifeng commented on SPARK-16473: -- If the {{sparseDataset}} in {{BisectingKMea

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128623#comment-16128623 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I create {{random}} and {{rdd}} tw

[jira] [Updated] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-21742: - Description: I found that {{BisectingKMeans}} will generate different models if the input is cac

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128535#comment-16128535 ] zhengruifeng commented on SPARK-21742: -- [~mlnick] The seed is already fixed. It look

[jira] [Updated] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-21742: - Description: I found that {{BisectingKMeans}} will generate different models if the input is cac

[jira] [Updated] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-21742: - Description: I found that {{BisectingKMeans}} will generate different models if the input is cac

[jira] [Updated] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-21742: - Description: I found that {{BisectingKMeans}} will generate different models if the input is cac

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128509#comment-16128509 ] zhengruifeng commented on SPARK-21742: -- [~srowen] you are right. When I create the s

[jira] [Comment Edited] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128452#comment-16128452 ] zhengruifeng edited comment on SPARK-21742 at 8/16/17 7:40 AM:

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128452#comment-16128452 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I retest it in different spark-she

[jira] [Commented] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128427#comment-16128427 ] zhengruifeng commented on SPARK-21742: -- [~srowen] I set the seed for generate datase

[jira] [Updated] (SPARK-21742) BisectingKMeans generate different models with/without caching

2017-08-15 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-21742: - Summary: BisectingKMeans generate different models with/without caching (was: BisectingKMeans ge

[jira] [Created] (SPARK-21742) BisectingKMeans generate different results with/without caching

2017-08-15 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-21742: Summary: BisectingKMeans generate different results with/without caching Key: SPARK-21742 URL: https://issues.apache.org/jira/browse/SPARK-21742 Project: Spark

[jira] [Commented] (SPARK-21690) one-pass imputer

2017-08-10 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122909#comment-16122909 ] zhengruifeng commented on SPARK-21690: -- The corresponding PR is here https://github.

[jira] [Comment Edited] (SPARK-21690) one-pass imputer

2017-08-10 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122907#comment-16122907 ] zhengruifeng edited comment on SPARK-21690 at 8/11/17 6:06 AM:

[jira] [Commented] (SPARK-21690) one-pass imputer

2017-08-10 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122907#comment-16122907 ] zhengruifeng commented on SPARK-21690: -- [~facai] Thanks, but I already send a PR for

[jira] [Created] (SPARK-21690) one-pass imputer

2017-08-09 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-21690: Summary: one-pass imputer Key: SPARK-21690 URL: https://issues.apache.org/jira/browse/SPARK-21690 Project: Spark Issue Type: Improvement Components

[jira] [Closed] (SPARK-20762) Make String Params Case-Insensitive

2017-08-09 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-20762. Resolution: Not A Problem > Make String Params Case-Insensitive > -

[jira] [Created] (SPARK-21388) GBT inherit from HasStepSize & LInearSVC/Binarizer from HasThreshold

2017-07-12 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-21388: Summary: GBT inherit from HasStepSize & LInearSVC/Binarizer from HasThreshold Key: SPARK-21388 URL: https://issues.apache.org/jira/browse/SPARK-21388 Project: Spark

[jira] [Commented] (SPARK-14174) Implement the Mini-Batch KMeans

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058982#comment-16058982 ] zhengruifeng commented on SPARK-14174: -- [~mlnick] I send a new PR for MiniBatch KMea

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Description: The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to

[jira] [Updated] (SPARK-14174) Implement the Mini-Batch KMeans

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Summary: Implement the Mini-Batch KMeans (was: Accelerate KMeans via Mini-Batch EM) > Implement

[jira] [Updated] (SPARK-14174) Implement the Mini-Batch KMeans

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: MBKM.xlsx > Implement the Mini-Batch KMeans > --- > >

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: (was: MiniBatchKMeans_Performance.pdf) > Accelerate KMeans via Mini-Batch EM > --

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-06-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: (was: MiniBatchKMeans_Performance_II.pdf) > Accelerate KMeans via Mini-Batch EM >

[jira] [Closed] (SPARK-19057) Instance weights must be non-negative

2017-06-07 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-19057. Resolution: Won't Fix > Instance weights must be non-negative > ---

[jira] [Created] (SPARK-20932) CountVectorizer support handle persistence

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20932: Summary: CountVectorizer support handle persistence Key: SPARK-20932 URL: https://issues.apache.org/jira/browse/SPARK-20932 Project: Spark Issue Type: Improv

[jira] [Created] (SPARK-20930) Destroy broadcasted centers after computing cost

2017-05-30 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20930: Summary: Destroy broadcasted centers after computing cost Key: SPARK-20930 URL: https://issues.apache.org/jira/browse/SPARK-20930 Project: Spark Issue Type:

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025870#comment-16025870 ] zhengruifeng commented on SPARK-14174: -- [~mlnick] [~sethah] I am sorry to say that

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024417#comment-16024417 ] zhengruifeng commented on SPARK-14174: -- [~mlnick] OK, I have updated the attached pe

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: MiniBatchKMeans_Performance_II.pdf > Accelerate KMeans via Mini-Batch EM > --

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-23 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020786#comment-16020786 ] zhengruifeng commented on SPARK-14174: -- [~mlnick] I attach the last test result. I f

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-23 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: MiniBatchKMeans_Performance.pdf > Accelerate KMeans via Mini-Batch EM > -

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-23 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: (was: MiniBatchKMeans_Performance.pdf) > Accelerate KMeans via Mini-Batch EM > --

[jira] [Updated] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-23 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14174: - Attachment: MiniBatchKMeans_Performance.pdf > Accelerate KMeans via Mini-Batch EM > -

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020696#comment-16020696 ] zhengruifeng commented on SPARK-14174: -- [~chrisfalter] Agree that it's better to sup

[jira] [Comment Edited] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020693#comment-16020693 ] zhengruifeng edited comment on SPARK-14174 at 5/23/17 6:12 AM:

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-05-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020693#comment-16020693 ] zhengruifeng commented on SPARK-14174: -- [~mlnick] I make some performace experiments

[jira] [Created] (SPARK-20849) Document R DecisionTree

2017-05-22 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20849: Summary: Document R DecisionTree Key: SPARK-20849 URL: https://issues.apache.org/jira/browse/SPARK-20849 Project: Spark Issue Type: Improvement Com

[jira] [Updated] (SPARK-20762) Make String Params Case-Insensitive

2017-05-16 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-20762: - Description: Make String Params (excpet Cols) case-insensitve: {{solver}} {{modelType}} {{initMod

[jira] [Created] (SPARK-20762) Make String Params Case-Insensitive

2017-05-15 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20762: Summary: Make String Params Case-Insensitive Key: SPARK-20762 URL: https://issues.apache.org/jira/browse/SPARK-20762 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-20669) LogisticRegression family should be case insensitive

2017-05-15 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-20669: - Priority: Minor (was: Trivial) > LogisticRegression family should be case insensitive > ---

[jira] [Commented] (SPARK-15767) Decision Tree Regression wrapper in SparkR

2017-05-14 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009948#comment-16009948 ] zhengruifeng commented on SPARK-15767: -- I will have a try, if nobody work on it. >

[jira] [Commented] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-05-12 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007748#comment-16007748 ] zhengruifeng commented on SPARK-20711: -- [~mlnick] It seems that in current implement

[jira] [Created] (SPARK-20711) MultivariateOnlineSummarizer incorrect min/max for identical NaN feature

2017-05-11 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20711: Summary: MultivariateOnlineSummarizer incorrect min/max for identical NaN feature Key: SPARK-20711 URL: https://issues.apache.org/jira/browse/SPARK-20711 Project: Spa

[jira] [Closed] (SPARK-20673) LDA `optimizer` do not really support case insensitive

2017-05-09 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-20673. Resolution: Duplicate > LDA `optimizer` do not really support case insensitive > -

[jira] [Created] (SPARK-20673) LDA `optimizer` do not really support case insensitive

2017-05-08 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20673: Summary: LDA `optimizer` do not really support case insensitive Key: SPARK-20673 URL: https://issues.apache.org/jira/browse/SPARK-20673 Project: Spark Issue

[jira] [Created] (SPARK-20669) LogisticRegression family should be case insensitive

2017-05-08 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20669: Summary: LogisticRegression family should be case insensitive Key: SPARK-20669 URL: https://issues.apache.org/jira/browse/SPARK-20669 Project: Spark Issue T

[jira] [Closed] (SPARK-20056) IsotonicRegression support Numeric features

2017-04-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-20056. Resolution: Not A Problem > IsotonicRegression support Numeric features > -

[jira] [Created] (SPARK-20041) Update docs for NaN handling in approxQuantile

2017-03-20 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-20041: Summary: Update docs for NaN handling in approxQuantile Key: SPARK-20041 URL: https://issues.apache.org/jira/browse/SPARK-20041 Project: Spark Issue Type: Im

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-03-12 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906822#comment-15906822 ] zhengruifeng commented on SPARK-18608: -- [~mlnick][~srowen] [~yuhaoyan] What's your o

[jira] [Commented] (SPARK-19808) About the default blocking arg in unpersist

2017-03-08 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902317#comment-15902317 ] zhengruifeng commented on SPARK-19808: -- [~srowen] Agreed. Changing the default may c

[jira] [Closed] (SPARK-19808) About the default blocking arg in unpersist

2017-03-08 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-19808. Resolution: Not A Problem > About the default blocking arg in unpersist > -

[jira] [Created] (SPARK-19808) About the default blocking arg in unpersist

2017-03-03 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19808: Summary: About the default blocking arg in unpersist Key: SPARK-19808 URL: https://issues.apache.org/jira/browse/SPARK-19808 Project: Spark Issue Type: Quest

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-03-02 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893596#comment-15893596 ] zhengruifeng commented on SPARK-18608: -- [~mlnick] [~yuhaoyan] [~srowen] I think if w

[jira] [Created] (SPARK-19704) AFTSurvivalRegression should support numeric censorCol

2017-02-22 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19704: Summary: AFTSurvivalRegression should support numeric censorCol Key: SPARK-19704 URL: https://issues.apache.org/jira/browse/SPARK-19704 Project: Spark Issue

[jira] [Created] (SPARK-19694) Add missing 'setTopicDistributionCol' for LDAModel

2017-02-21 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19694: Summary: Add missing 'setTopicDistributionCol' for LDAModel Key: SPARK-19694 URL: https://issues.apache.org/jira/browse/SPARK-19694 Project: Spark Issue Type

[jira] [Created] (SPARK-19679) Destroy broadcasted object without blocking

2017-02-21 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19679: Summary: Destroy broadcasted object without blocking Key: SPARK-19679 URL: https://issues.apache.org/jira/browse/SPARK-19679 Project: Spark Issue Type: Impro

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-02-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875468#comment-15875468 ] zhengruifeng commented on SPARK-18608: -- [~mlnick] I will send a PR for this accordin

[jira] [Commented] (SPARK-14174) Accelerate KMeans via Mini-Batch EM

2017-02-20 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875313#comment-15875313 ] zhengruifeng commented on SPARK-14174: -- [~srowen] the {{run}} changes were already m

[jira] [Created] (SPARK-19573) Make NaN/null handling consistent in approxQuantile

2017-02-13 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19573: Summary: Make NaN/null handling consistent in approxQuantile Key: SPARK-19573 URL: https://issues.apache.org/jira/browse/SPARK-19573 Project: Spark Issue Typ

[jira] [Commented] (SPARK-18813) MLlib 2.2 Roadmap

2017-02-03 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851243#comment-15851243 ] zhengruifeng commented on SPARK-18813: -- [~bryanc] +1 > MLlib 2.2 Roadmap >

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-02 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849837#comment-15849837 ] zhengruifeng commented on SPARK-19208: -- [~mlnick] +1 I think we can create a private

[jira] [Comment Edited] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-02-02 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849830#comment-15849830 ] zhengruifeng edited comment on SPARK-18608 at 2/2/17 11:53 AM:

[jira] [Comment Edited] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-02-02 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849830#comment-15849830 ] zhengruifeng edited comment on SPARK-18608 at 2/2/17 11:52 AM:

[jira] [Commented] (SPARK-18608) Spark ML algorithms that check RDD cache level for internal caching double-cache data

2017-02-02 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849830#comment-15849830 ] zhengruifeng commented on SPARK-18608: -- [~yuhaoyan] Agree that it's nice to add an e

[jira] [Created] (SPARK-19436) Add missing tests for approxQuantiles

2017-02-02 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19436: Summary: Add missing tests for approxQuantiles Key: SPARK-19436 URL: https://issues.apache.org/jira/browse/SPARK-19436 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-19422) Cache input data in algorithms

2017-02-01 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849290#comment-15849290 ] zhengruifeng commented on SPARK-19422: -- [~mlnick] Thanks a lot for pointing this out

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-02-01 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848140#comment-15848140 ] zhengruifeng commented on SPARK-19208: -- [~mlnick] What about supporting {{groupBy}}

[jira] [Updated] (SPARK-19422) Cache input data in algorithms

2017-02-01 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-19422: - Description: Now some algorithms cache the input dataset if it was not cached any more {{Storage

[jira] [Created] (SPARK-19422) Cache input data in algorithms

2017-02-01 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19422: Summary: Cache input data in algorithms Key: SPARK-19422 URL: https://issues.apache.org/jira/browse/SPARK-19422 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-19421) Remove numClasses and numFeatures methods in LinearSVC

2017-01-31 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19421: Summary: Remove numClasses and numFeatures methods in LinearSVC Key: SPARK-19421 URL: https://issues.apache.org/jira/browse/SPARK-19421 Project: Spark Issue

[jira] (SPARK-19410) Links to API documentation are broken

2017-01-31 Thread zhengruifeng (JIRA)
zhengruifeng edited a comment on SPARK-19410

[jira] (SPARK-19410) Links to API documentation are broken

2017-01-31 Thread zhengruifeng (JIRA)
zhengruifeng commented on SPARK-19410

[jira] (SPARK-19410) Links to API documentation are broken

2017-01-31 Thread zhengruifeng (JIRA)
zhengruifeng commented on SPARK-19410

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2017-01-27 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843884#comment-15843884 ] zhengruifeng commented on SPARK-19208: -- [~josephkb] I have considered of the analogy

[jira] [Created] (SPARK-19384) forget unpersist input dataset in IsotonicRegression

2017-01-27 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19384: Summary: forget unpersist input dataset in IsotonicRegression Key: SPARK-19384 URL: https://issues.apache.org/jira/browse/SPARK-19384 Project: Spark Issue Ty

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer perfermence optimization

2017-01-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833345#comment-15833345 ] zhengruifeng edited comment on SPARK-19208 at 1/22/17 8:25 AM:

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer perfermence optimization

2017-01-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833345#comment-15833345 ] zhengruifeng edited comment on SPARK-19208 at 1/22/17 8:00 AM:

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer perfermence optimization

2017-01-22 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833345#comment-15833345 ] zhengruifeng edited comment on SPARK-19208 at 1/22/17 8:01 AM:

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer perfermence optimization

2017-01-21 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833345#comment-15833345 ] zhengruifeng commented on SPARK-19208: -- After diving into sparksql's udaf, I design

[jira] [Created] (SPARK-19303) Add evaluate method in clustering models

2017-01-19 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-19303: Summary: Add evaluate method in clustering models Key: SPARK-19303 URL: https://issues.apache.org/jira/browse/SPARK-19303 Project: Spark Issue Type: Improvem
