[jira] [Comment Edited] (SPARK-3383) DecisionTree aggregate size could be smaller

2017-11-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240284#comment-16240284 ] Yan Facai (颜发才) edited comment on SPARK-3383 at 11/6/17 1:28 PM

[jira] [Commented] (SPARK-3383) DecisionTree aggregate size could be smaller

2017-11-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240284#comment-16240284 ] Yan Facai (颜发才) commented on SPARK-3383: [~WeichenXu123] Good work! I'd like to take a look

[jira] [Commented] (SPARK-3165) DecisionTree does not use sparsity in data

2017-09-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181925#comment-16181925 ] Yan Facai (颜发才) commented on SPARK-3165: The PR proposed by me has been closed because another

[jira] [Commented] (SPARK-21748) Migrate the implementation of HashingTF from MLlib to ML

2017-08-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-21748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133947#comment-16133947 ] Yan Facai (颜发才) commented on SPARK-21748: - There seems to be something wrong with CI

[jira] [Comment Edited] (SPARK-21748) Migrate the implementation of HashingTF from MLlib to ML

2017-08-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-21748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133947#comment-16133947 ] Yan Facai (颜发才) edited comment on SPARK-21748 at 8/19/17 4:43 AM

[jira] [Comment Edited] (SPARK-21748) Migrate the implementation of HashingTF from MLlib to ML

2017-08-16 Thread
[ https://issues.apache.org/jira/browse/SPARK-21748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128725#comment-16128725 ] Yan Facai (颜发才) edited comment on SPARK-21748 at 8/16/17 12:38 PM

[jira] [Commented] (SPARK-21748) Migrate the implementation of HashingTF from MLlib to ML

2017-08-16 Thread
[ https://issues.apache.org/jira/browse/SPARK-21748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128725#comment-16128725 ] Yan Facai (颜发才) commented on SPARK-21748: - [~yanboliang] Thanks, yanbo. As discussed on https

[jira] [Created] (SPARK-21748) Migrate the implementation of HashingTF from MLlib to ML

2017-08-16 Thread
Yan Facai (颜发才) created SPARK-21748: --- Summary: Migrate the implementation of HashingTF from MLlib to ML Key: SPARK-21748 URL: https://issues.apache.org/jira/browse/SPARK-21748 Project: Spark

[jira] [Commented] (SPARK-21690) one-pass imputer

2017-08-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122922#comment-16122922 ] Yan Facai (颜发才) commented on SPARK-21690: - Cool! Just go ahead. > one-pass impu

[jira] [Comment Edited] (SPARK-21690) one-pass imputer

2017-08-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122904#comment-16122904 ] Yan Facai (颜发才) edited comment on SPARK-21690 at 8/11/17 6:02 AM: -- We

[jira] [Commented] (SPARK-21690) one-pass imputer

2017-08-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122904#comment-16122904 ] Yan Facai (颜发才) commented on SPARK-21690: - We can use `df.summary("

[jira] [Commented] (SPARK-21341) Spark 2.1.1: I want to be able to serialize wordVectors on Word2VecModel

2017-07-09 Thread
[ https://issues.apache.org/jira/browse/SPARK-21341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079507#comment-16079507 ] Yan Facai (颜发才) commented on SPARK-21341: - Yes, [~sowen] is right. Why not use save and load

[jira] [Comment Edited] (SPARK-21341) Spark 2.1.1: I want to be able to serialize wordVectors on Word2VecModel

2017-07-08 Thread
[ https://issues.apache.org/jira/browse/SPARK-21341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078987#comment-16078987 ] Yan Facai (颜发才) edited comment on SPARK-21341 at 7/8/17 6:28 AM: - Hi

[jira] [Commented] (SPARK-21341) Spark 2.1.1: I want to be able to serialize wordVectors on Word2VecModel

2017-07-08 Thread
[ https://issues.apache.org/jira/browse/SPARK-21341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078987#comment-16078987 ] Yan Facai (颜发才) commented on SPARK-21341: - Hi, [~zsellami]. I guess that since the wordVectors

[jira] [Commented] (SPARK-21331) java.lang.NullPointerException for certain methods in classes of MLlib

2017-07-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-21331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077620#comment-16077620 ] Yan Facai (颜发才) commented on SPARK-21331: - [~anirband] How about using this code? {code} val

[jira] [Comment Edited] (SPARK-21331) java.lang.NullPointerException for certain methods in classes of MLlib

2017-07-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-21331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077605#comment-16077605 ] Yan Facai (颜发才) edited comment on SPARK-21331 at 7/7/17 5:21 AM: - Hi, I

[jira] [Commented] (SPARK-21331) java.lang.NullPointerException for certain methods in classes of MLlib

2017-07-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-21331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077605#comment-16077605 ] Yan Facai (颜发才) commented on SPARK-21331: - Hi, I ran the code in the description on Mac, spark-2.1.1

[jira] [Commented] (SPARK-21306) OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier

2017-07-05 Thread
[ https://issues.apache.org/jira/browse/SPARK-21306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075974#comment-16075974 ] Yan Facai (颜发才) commented on SPARK-21306: - [~cathalgarvey] By the way, since LogisticRegression

[jira] [Comment Edited] (SPARK-21306) OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier

2017-07-05 Thread
[ https://issues.apache.org/jira/browse/SPARK-21306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075970#comment-16075970 ] Yan Facai (颜发才) edited comment on SPARK-21306 at 7/6/17 5:40 AM: - I agree

[jira] [Commented] (SPARK-21306) OneVsRest Conceals Columns That May Be Relevant To Underlying Classifier

2017-07-05 Thread
[ https://issues.apache.org/jira/browse/SPARK-21306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075970#comment-16075970 ] Yan Facai (颜发才) commented on SPARK-21306: - I agree with [~n...@svana.org]. It seems that we

[jira] [Commented] (SPARK-21285) VectorAssembler should report the column name when data type used is not supported

2017-07-03 Thread
[ https://issues.apache.org/jira/browse/SPARK-21285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073125#comment-16073125 ] Yan Facai (颜发才) commented on SPARK-21285: - It seems easy, and I can work

[jira] [Commented] (SPARK-21066) LibSVM load just one input file

2017-06-22 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060381#comment-16060381 ] Yan Facai (颜发才) commented on SPARK-21066: - Downgrade to Trivial since `numFeatures` should work

[jira] [Updated] (SPARK-21066) LibSVM load just one input file

2017-06-22 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Facai (颜发才) updated SPARK-21066: Priority: Trivial (was: Major) > LibSVM load just one input f

[jira] [Comment Edited] (SPARK-21066) LibSVM load just one input file

2017-06-20 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055336#comment-16055336 ] Yan Facai (颜发才) edited comment on SPARK-21066 at 6/20/17 8:27 AM

[jira] [Commented] (SPARK-21066) LibSVM load just one input file

2017-06-20 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055336#comment-16055336 ] Yan Facai (颜发才) commented on SPARK-21066: - [~sowen] I believe that the API has explained well

[jira] [Commented] (SPARK-21066) LibSVM load just one input file

2017-06-20 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055328#comment-16055328 ] Yan Facai (颜发才) commented on SPARK-21066: - Hi, [~darion]. If `numFeatures` is specified

[jira] [Comment Edited] (SPARK-21066) LibSVM load just one input file

2017-06-20 Thread
[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055328#comment-16055328 ] Yan Facai (颜发才) edited comment on SPARK-21066 at 6/20/17 8:12 AM: -- Hi

[jira] [Commented] (SPARK-20787) PySpark can't handle datetimes before 1900

2017-05-29 Thread
[ https://issues.apache.org/jira/browse/SPARK-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028253#comment-16028253 ] Yan Facai (颜发才) commented on SPARK-20787: - Just go ahead, [~RBerenguel]! The issue is beyond my

[jira] [Comment Edited] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-05-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015345#comment-16015345 ] Yan Facai (颜发才) edited comment on SPARK-19581 at 5/26/17 9:02 AM

[jira] [Commented] (SPARK-20498) RandomForestRegressionModel should expose getMaxDepth in PySpark

2017-05-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-20498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025949#comment-16025949 ] Yan Facai (颜发才) commented on SPARK-20498: - [~iamshrek] Hi, Xin Ren. As the task is quite easy

[jira] [Comment Edited] (SPARK-20787) PySpark can't handle datetimes before 1900

2017-05-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025853#comment-16025853 ] Yan Facai (颜发才) edited comment on SPARK-20787 at 5/26/17 6:03 AM

[jira] [Commented] (SPARK-20787) PySpark can't handle datetimes before 1900

2017-05-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025853#comment-16025853 ] Yan Facai (颜发才) commented on SPARK-20787: - It seems that the exception is raised when

[jira] [Comment Edited] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015448#comment-16015448 ] Yan Facai (颜发才) edited comment on SPARK-20768 at 5/18/17 8:59 AM

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015448#comment-16015448 ] Yan Facai (颜发才) commented on SPARK-20768: - It seems easy, I can work on it. > PySpark FPGro

[jira] [Commented] (SPARK-20768) PySpark FPGrowth does not expose numPartitions (expert) param

2017-05-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-20768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015437#comment-16015437 ] Yan Facai (颜发才) commented on SPARK-20768: - Hi, I'm a newbie. `numPartitions` is found in pyspark

[jira] [Commented] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-05-18 Thread
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015345#comment-16015345 ] Yan Facai (颜发才) commented on SPARK-19581: - [~barrybecker4] Hi, Becker. I can't reproduce the bug

[jira] [Commented] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with D rorreGEMV

2017-05-08 Thread
[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001987#comment-16001987 ] Yan Facai (颜发才) commented on SPARK-19581: - [~barrybecker4] Could you give some sample code

[jira] [Commented] (SPARK-20526) Load doesn't work in PCAModel

2017-05-01 Thread
[ https://issues.apache.org/jira/browse/SPARK-20526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992367#comment-15992367 ] Yan Facai (颜发才) commented on SPARK-20526: - Can you give some sample code? > Load doesn't w

[jira] [Comment Edited] (SPARK-16957) Use weighted midpoints for split values.

2017-04-30 Thread
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990195#comment-15990195 ] Yan Facai (颜发才) edited comment on SPARK-16957 at 4/30/17 11:28 AM

[jira] [Commented] (SPARK-16957) Use weighted midpoints for split values.

2017-04-30 Thread
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990195#comment-15990195 ] Yan Facai (颜发才) commented on SPARK-16957: - To match the other libraries, we use mean value

[jira] [Comment Edited] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-26 Thread
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984161#comment-15984161 ] Yan Facai (颜发才) edited comment on SPARK-20199 at 4/26/17 6:11 AM

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-25 Thread
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984161#comment-15984161 ] Yan Facai (颜发才) commented on SPARK-20199: - The work is easy; however, a public method is added

[jira] [Commented] (SPARK-16957) Use weighted midpoints for split values.

2017-04-22 Thread
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980268#comment-15980268 ] Yan Facai (颜发才) commented on SPARK-16957: - [~vlad.feinberg] Hi, I found that R's gbm uses mean

[jira] [Updated] (SPARK-16957) Use weighted midpoints for split values.

2017-04-22 Thread
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Facai (颜发才) updated SPARK-16957: Description: We should be using weighted split points rather than the actual continuous

[jira] [Commented] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-19 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976098#comment-15976098 ] Yan Facai (颜发才) commented on SPARK-20081: - By the way, for StringIndexer, numerical label column

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-14 Thread
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969622#comment-15969622 ] Yan Facai (颜发才) commented on SPARK-20199: - ping [~jkbreuer] [~sethah] [~mengxr]. Which one

[jira] [Comment Edited] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967347#comment-15967347 ] Yan Facai (颜发才) edited comment on SPARK-20081 at 4/13/17 9:42 AM: -- Yes

[jira] [Comment Edited] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967347#comment-15967347 ] Yan Facai (颜发才) edited comment on SPARK-20081 at 4/13/17 9:40 AM: -- Yes

[jira] [Commented] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967347#comment-15967347 ] Yan Facai (颜发才) commented on SPARK-20081: - Yes, you should use `builder.putLong("num

[jira] [Commented] (SPARK-19141) VectorAssembler metadata causing memory issues

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967246#comment-15967246 ] Yan Facai (颜发才) commented on SPARK-19141: - `VectorAssembler` will create attribute (name

[jira] [Comment Edited] (SPARK-19141) VectorAssembler metadata causing memory issues

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967246#comment-15967246 ] Yan Facai (颜发才) edited comment on SPARK-19141 at 4/13/17 7:42 AM

[jira] [Commented] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967204#comment-15967204 ] Yan Facai (颜发才) commented on SPARK-20081: - How about adding a `setNumClass` to shortcut infer

[jira] [Comment Edited] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967196#comment-15967196 ] Yan Facai (颜发才) edited comment on SPARK-20081 at 4/13/17 6:48 AM

[jira] [Updated] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Facai (颜发才) updated SPARK-20081: Component/s: ML > RandomForestClassifier doesn't seem to support more than 100 lab

[jira] [Commented] (SPARK-20081) RandomForestClassifier doesn't seem to support more than 100 labels

2017-04-13 Thread
[ https://issues.apache.org/jira/browse/SPARK-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967196#comment-15967196 ] Yan Facai (颜发才) commented on SPARK-20081: - [~creinig] Christian, RandomForestClassifier uses

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-12 Thread
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966926#comment-15966926 ] Yan Facai (颜发才) commented on SPARK-20199: - It's not hard, and I can work on it. However

[jira] [Commented] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965296#comment-15965296 ] Yan Facai (颜发才) commented on SPARK-20199: - Yes, as [~pralabhkumar] said, DecisionTree hardcodes

[jira] [Comment Edited] (SPARK-3383) DecisionTree aggregate size could be smaller

2017-04-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963741#comment-15963741 ] Yan Facai (颜发才) edited comment on SPARK-3383 at 4/12/17 2:02 AM: - I think

[jira] [Commented] (SPARK-3383) DecisionTree aggregate size could be smaller

2017-04-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965263#comment-15965263 ] Yan Facai (颜发才) commented on SPARK-3383: How about the idea? 1. We use `bin` to represent value

[jira] [Comment Edited] (SPARK-10788) Decision Tree duplicates bins for unordered categorical features

2017-04-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965244#comment-15965244 ] Yan Facai (颜发才) edited comment on SPARK-10788 at 4/12/17 1:35 AM

[jira] [Commented] (SPARK-10788) Decision Tree duplicates bins for unordered categorical features

2017-04-11 Thread
[ https://issues.apache.org/jira/browse/SPARK-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965244#comment-15965244 ] Yan Facai (颜发才) commented on SPARK-10788: - [~josephkb] As categories A, B and C are independent

[jira] [Commented] (SPARK-3383) DecisionTree aggregate size could be smaller

2017-04-10 Thread
[ https://issues.apache.org/jira/browse/SPARK-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963741#comment-15963741 ] Yan Facai (颜发才) commented on SPARK-3383: I think the task contains two subtasks: 1. separate

[jira] [Commented] (SPARK-16957) Use weighted midpoints for split values.

2017-04-06 Thread
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960177#comment-15960177 ] Yan Facai (颜发才) commented on SPARK-16957: - I think that it is helpful for small datasets, while

[jira] [Commented] (SPARK-3159) Check for reducible DecisionTree

2017-03-30 Thread
[ https://issues.apache.org/jira/browse/SPARK-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948754#comment-15948754 ] Yan Facai (颜发才) commented on SPARK-3159: [~josephkb] Hi, is the jira still needed? I'd like

[jira] [Comment Edited] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accept

2017-03-23 Thread
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939571#comment-15939571 ] Yan Facai (颜发才) edited comment on SPARK-20043 at 3/24/17 2:15 AM

[jira] [Commented] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-23 Thread
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939571#comment-15939571 ] Yan Facai (颜发才) commented on SPARK-20043: - The bug can be reproduced by: ```scala test("

[jira] [Issue Comment Deleted] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are

2017-03-23 Thread
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Facai (颜发才) updated SPARK-20043: Comment: was deleted (was: [~zsellami] could you give an example of your code? I try

[jira] [Commented] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-23 Thread
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939560#comment-15939560 ] Yan Facai (颜发才) commented on SPARK-20043: - [~zsellami] could you give an example of your code? I

[jira] [Commented] (SPARK-20043) CrossValidatorModel loader does not recognize impurity "Gini" and "Entropy" on ML random forest and decision. Only "gini" and "entropy" (in lower case) are accepted

2017-03-23 Thread
[ https://issues.apache.org/jira/browse/SPARK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939533#comment-15939533 ] Yan Facai (颜发才) commented on SPARK-20043: - Perhaps it's better to convert impurity Type after

[jira] [Commented] (SPARK-3728) RandomForest: Learn models too large to store in memory

2017-03-22 Thread
[ https://issues.apache.org/jira/browse/SPARK-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936208#comment-15936208 ] Yan Facai (颜发才) commented on SPARK-3728: RandomForest already uses a stack to save node