[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-08 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Attachment: Reproducible example of Spark bug - no 2.pdf > Pyspark undertakes pruning of decision

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-08 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359321#comment-17359321 ] Julian King commented on SPARK-34591: To address any concerns about the example above being

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358956#comment-17358956 ] Julian King commented on SPARK-34591: The fact that there's no signal isn't the issue. The issue is

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358953#comment-17358953 ] Julian King commented on SPARK-34591: Here is a reproducible example of this bug which demonstrates

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Attachment: Reproducible example of Spark bug.pdf > Pyspark undertakes pruning of decision trees

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Attachment: Reproducible example of Spark bug.pdf > Pyspark undertakes pruning of decision trees

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Attachment: (was: Reproducible example of Spark bug.pdf) > Pyspark undertakes pruning of

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Priority: Major (was: Minor) > Pyspark undertakes pruning of decision trees and random forests

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-03-10 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Affects Version/s: 2.4.4 > Pyspark undertakes pruning of decision trees and random forests

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-03-10 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Priority: Critical (was: Major) > Pyspark undertakes pruning of decision trees and random

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-03-10 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Affects Version/s: 3.1.1 > Pyspark undertakes pruning of decision trees and random forests

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-03-02 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian King updated SPARK-34591: Description: *History of the issue* SPARK-3159 implemented a method designed to reduce the

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-03-02 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293484#comment-17293484 ] Julian King commented on SPARK-34591: FYI [~asolimando] > Pyspark undertakes pruning of decision

[jira] [Created] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-03-02 Thread Julian King (Jira)
Julian King created SPARK-34591: Summary: Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to diagnose and impossible to correct

[jira] [Commented] (SPARK-3159) Check for reducible DecisionTree

2021-03-01 Thread Julian King (Jira)
[ https://issues.apache.org/jira/browse/SPARK-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293259#comment-17293259 ] Julian King commented on SPARK-3159: I also need the probability estimates for the tree, not the

[jira] [Commented] (SPARK-9478) Add sample weights to Random Forest

2018-08-06 Thread Julian King (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569855#comment-16569855 ] Julian King commented on SPARK-9478: Has there been any progress on this in recent times? It looks

[jira] [Created] (SPARK-23730) Save and expose "in bag" tracking for random forest model

2018-03-18 Thread Julian King (JIRA)
Julian King created SPARK-23730: Summary: Save and expose "in bag" tracking for random forest model Key: SPARK-23730 URL: https://issues.apache.org/jira/browse/SPARK-23730 Project: Spark

[jira] [Created] (SPARK-23704) PySpark access of individual trees in random forest is slow

2018-03-15 Thread Julian King (JIRA)
Julian King created SPARK-23704: Summary: PySpark access of individual trees in random forest is slow Key: SPARK-23704 URL: https://issues.apache.org/jira/browse/SPARK-23704 Project: Spark