[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2018-03-15 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401071#comment-16401071 ]

Joseph K. Bradley commented on SPARK-7131:
--

CCing people watching this JIRA about https://github.com/apache/spark/pull/20786
In that PR, we want to make LeafNode and InternalNode into traits (not classes) 
in order to split Regression from Classification nodes (for stronger typing).  
Will this break anyone's code outside of org.apache.spark.ml?  I doubt it, 
since the node constructors are still private, but I wanted to CC people.  
Thanks!
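In Scala a class can extend only one other class but can mix in any number of traits, which is presumably why splitting nodes along two axes (leaf vs. internal, classification vs. regression) calls for traits. A minimal sketch, assuming hypothetical names (`ClassificationLeafNode`, `RegressionInternalNode`, and so on) rather than Spark's actual classes or the code in that PR:

```scala
// Hypothetical sketch: names are illustrative, not Spark's actual API.

// Behavior shared by every node.
sealed trait Node {
  def prediction: Double
  def predict(features: Array[Double]): Double
}

// Structural roles are traits, so each can be mixed with a task trait below.
sealed trait LeafNode extends Node {
  override def predict(features: Array[Double]): Double = prediction
}

sealed trait InternalNode extends Node {
  def featureIndex: Int
  def threshold: Double
  def leftChild: Node
  def rightChild: Node
  // Continuous split: go left when the feature value is <= threshold.
  override def predict(features: Array[Double]): Double =
    if (features(featureIndex) <= threshold) leftChild.predict(features)
    else rightChild.predict(features)
}

// Task roles: a classification tree should contain only classification nodes.
sealed trait ClassificationNode extends Node
sealed trait RegressionNode extends Node

final case class ClassificationLeafNode(prediction: Double, classCounts: Vector[Double])
    extends ClassificationNode with LeafNode

final case class ClassificationInternalNode(
    prediction: Double,
    featureIndex: Int,
    threshold: Double,
    leftChild: ClassificationNode, // narrower than Node: regression children do not compile
    rightChild: ClassificationNode)
    extends ClassificationNode with InternalNode

final case class RegressionLeafNode(prediction: Double, variance: Double)
    extends RegressionNode with LeafNode

final case class RegressionInternalNode(
    prediction: Double,
    featureIndex: Int,
    threshold: Double,
    leftChild: RegressionNode,
    rightChild: RegressionNode)
    extends RegressionNode with InternalNode

// Usage: a depth-1 classification tree splitting on feature 0.
val tree: ClassificationNode = ClassificationInternalNode(
  prediction = 0.0, featureIndex = 0, threshold = 0.5,
  leftChild = ClassificationLeafNode(0.0, Vector(3.0, 1.0)),
  rightChild = ClassificationLeafNode(1.0, Vector(0.0, 4.0)))

println(tree.predict(Array(0.2))) // 0.0
println(tree.predict(Array(0.9))) // 1.0
```

Because `ClassificationInternalNode` types its children as `ClassificationNode`, passing a `RegressionLeafNode` as a child is rejected at compile time, which is one form of the stronger typing mentioned above.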

> Move tree,forest implementation from spark.mllib to spark.ml
> 
>
> Key: SPARK-7131
> URL: https://issues.apache.org/jira/browse/SPARK-7131
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Major
> Fix For: 1.5.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We want to change and improve the spark.ml API for trees and ensembles, but 
> we cannot change the old API in spark.mllib.  To support the changes we want 
> to make, we should move the implementation from spark.mllib to spark.ml.  We 
> will generalize and modify it, but will also ensure that we do not change the 
> behavior of the old API.
> There are several steps to this:
> 1. Copy the implementation over to spark.ml and change the spark.ml classes 
> to use that implementation, rather than calling the spark.mllib 
> implementation.  The current spark.ml tests will ensure that the 2 
> implementations learn exactly the same models.  Note: This should include 
> performance testing to make sure the updated code does not have any 
> regressions. --> *UPDATE*: I have run tests using spark-perf, and there were 
> no regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation.  The spark.ml tests will again 
> ensure that we do not change any behavior.
> 3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.
> This JIRA is now for step 1 only.  Steps 2 and 3 will be in separate JIRAs.
> After these updates, we can more safely generalize and improve the spark.ml 
> implementation.
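Steps 2 and 3 amount to a delegation pattern plus an equivalence test. A minimal Scala sketch, assuming hypothetical names throughout: `NewImpl` stands in for the implementation moved into spark.ml (here only a one-split regression stump, not Spark's tree-growing algorithm), and `OldApi` stands in for a spark.mllib entry point that keeps its old signature but delegates all learning to `NewImpl`. The final comparison mirrors the kind of check step 3 describes: train through both entry points on the same data and require identical models.

```scala
// Hypothetical sketch of steps 2 and 3; names are illustrative only.

// Stand-in for the single implementation that now lives in the new package.
object NewImpl {
  final case class Model(threshold: Double, leftValue: Double, rightValue: Double) {
    def predict(x: Double): Double = if (x <= threshold) leftValue else rightValue
  }

  private def mean(v: Seq[Double]): Double = v.sum / v.length
  private def sse(v: Seq[Double]): Double = {
    val m = mean(v)
    v.map(y => (y - m) * (y - m)).sum
  }

  // Fits a one-split regression stump by minimizing total squared error.
  def train(xs: Seq[Double], ys: Seq[Double]): Model = {
    require(xs.nonEmpty && xs.length == ys.length, "inputs must be non-empty and aligned")
    val data = xs.zip(ys).sortBy(_._1).toVector
    val labels = data.map(_._2)
    // Baseline: no split, predict the overall mean everywhere.
    var best = Model(Double.PositiveInfinity, mean(labels), mean(labels))
    var bestErr = sse(labels)
    // Only split between distinct feature values so the threshold separates the halves.
    for (i <- 1 until data.length if data(i - 1)._1 < data(i)._1) {
      val (left, right) = data.splitAt(i)
      val err = sse(left.map(_._2)) + sse(right.map(_._2))
      if (err < bestErr) {
        bestErr = err
        best = Model(left.last._1, mean(left.map(_._2)), mean(right.map(_._2)))
      }
    }
    best
  }
}

// Stand-in for the old public API: same signature as before, no learning code of its own.
object OldApi {
  final case class OldModel(threshold: Double, leftValue: Double, rightValue: Double)

  def train(points: Seq[(Double, Double)]): OldModel = {
    val m = NewImpl.train(points.map(_._1), points.map(_._2))
    OldModel(m.threshold, m.leftValue, m.rightValue)
  }
}

// Step-3-style check: both entry points must learn exactly the same model.
def sameModel(a: OldApi.OldModel, b: NewImpl.Model): Boolean =
  a.threshold == b.threshold && a.leftValue == b.leftValue && a.rightValue == b.rightValue

val points = Seq((1.0, 0.0), (2.0, 0.0), (3.0, 10.0), (4.0, 10.0))
val oldModel = OldApi.train(points)
val newModel = NewImpl.train(points.map(_._1), points.map(_._2))
println(sameModel(oldModel, newModel)) // true
println(newModel)                      // Model(2.0,0.0,10.0)
```

Keeping the old entry point as a thin adapter means there is only one learning code path, so any behavior change would show up in the equivalence check rather than silently diverging between the two APIs.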



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-12-11 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054000#comment-15054000 ]

Joseph K. Bradley commented on SPARK-7131:
--

Yes, I'm sorry about how long this has taken, but I now have enough confidence 
in the API to proceed.  I've created a JIRA for doing this in the next release: 
[SPARK-12301], though I may not be able to look at this issue until January.  
Please post your thoughts there, and ping in early January if there is no 
activity.  Thank you!




[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-12-09 Thread Peter Rudenko (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049368#comment-15049368 ]

Peter Rudenko commented on SPARK-7131:
--

Please make the RF and GBM model classes in the ml package non-final.  I want 
to extend them, set some parameters, and reimplement some functionality (e.g., 
probabilistic models for GBC).
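The Scala detail behind this request: a class declared `final` cannot be subclassed, so user code cannot override or extend its methods. A minimal sketch, assuming hypothetical classes (`EnsembleModel`, `ProbabilisticEnsembleModel`) rather than Spark's actual model types, of the kind of extension that becomes possible once the modifier is dropped. The sigmoid mapping from raw score to probability is an illustrative assumption, not how any Spark model computes probabilities.

```scala
// Hypothetical model class; declaring it `final class` would make the subclass below fail to compile.
class EnsembleModel(val trees: Seq[Double => Double], val weights: Seq[Double]) {
  require(trees.length == weights.length, "one weight per tree")

  // Weighted sum of the individual tree outputs.
  def predictRaw(x: Double): Double =
    trees.zip(weights).map { case (tree, w) => w * tree(x) }.sum

  def predict(x: Double): Double = predictRaw(x)
}

// User-defined subclass adding probabilistic output for binary classification.
class ProbabilisticEnsembleModel(trees: Seq[Double => Double], weights: Seq[Double])
    extends EnsembleModel(trees, weights) {
  // Illustrative choice: logistic sigmoid of the raw ensemble score.
  def probability(x: Double): Double = 1.0 / (1.0 + math.exp(-predictRaw(x)))

  // Predict class 1.0 when the estimated probability exceeds 0.5.
  override def predict(x: Double): Double = if (probability(x) > 0.5) 1.0 else 0.0
}

// Callers holding a base-class reference still get the overridden behavior.
val model: EnsembleModel = new ProbabilisticEnsembleModel(
  trees = Seq[Double => Double](x => x, _ => 1.0),
  weights = Seq(1.0, 0.5))

println(model.predict(0.0))  // 1.0 (raw score 0.5)
println(model.predict(-2.0)) // 0.0 (raw score -1.5)
```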




[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-12-07 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045546#comment-15045546 ]

Joseph K. Bradley commented on SPARK-7131:
--

Thanks for pinging!  I think I lost track of this JIRA since it was closed 
before steps 2 and 3 were done.  I'll make JIRAs.




[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-12-07 Thread Seth Hendrickson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045559#comment-15045559 ]

Seth Hendrickson commented on SPARK-7131:
-

Great. I'd love to take a crack at step two if that's alright.




[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-12-01 Thread Seth Hendrickson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034859#comment-15034859 ]

Seth Hendrickson commented on SPARK-7131:
-

[~josephkb] Steps 2 and 3 in the issue description are still outstanding.  
Shall we create JIRAs for them and begin work?

> Move tree,forest implementation from spark.mllib to spark.ml
> We want to change and improve the spark.ml API for trees and ensembles, but 
> we cannot change the old API in spark.mllib.  To support the changes we want 
> to make, we should move the implementation from spark.mllib to spark.ml.  We 
> will generalize and modify it, but will also ensure that we do not change the 
> behavior of the old API.
> This JIRA should be done in several PRs, in this order:
> 1. Copy the implementation over to spark.ml and change the spark.ml classes 
> to use that implementation, rather than calling the spark.mllib 
> implementation.  The current spark.ml tests will ensure that the 2 
> implementations learn exactly the same models.  Note: This should include 
> performance testing to make sure the updated code does not have any 
> regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs 
> wrappers around the spark.ml implementation.  The spark.ml tests will again 
> ensure that we do not change any behavior.
> 3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to 
> verify model equivalence.
> After these updates, we can more safely generalize and improve the spark.ml 
> implementation.






[jira] [Commented] (SPARK-7131) Move tree,forest implementation from spark.mllib to spark.ml

2015-07-08 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619166#comment-14619166 ]

Apache Spark commented on SPARK-7131:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/7294
