[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-29 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18813:
--
Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.
* Vote for & watch issues which are important to you.
** MLlib, sorted by: [Votes | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
 or [Watchers | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
** SparkR, sorted by: [Votes | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
 or [Watchers | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
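
The vote and watcher links above are ordinary JIRA searches with the JQL query percent-encoded into the URL.  As a minimal sketch of how such a link could be built, the helper below (the name {{open_issues_url}} is hypothetical, not part of any Spark or JIRA tooling) encodes a query with Python's standard library and keeps parentheses literal, as the links above do.

```python
from urllib.parse import quote

# Base of the JIRA search links used in this roadmap.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def open_issues_url(components, order_by):
    """Build a search URL for open SPARK issues in the given components.

    Hypothetical helper for illustration only.
    """
    jql = (
        "project = SPARK "
        'AND status in (Open, "In Progress", Reopened) '
        f"AND component in ({', '.join(components)}) "
        f"ORDER BY {order_by} DESC"
    )
    # Leave parentheses unescaped; spaces, '=', ',' and '"' are
    # percent-encoded (%20, %3D, %2C, %22).
    return JIRA_SEARCH + quote(jql, safe="()")


if __name__ == "__main__":
    print(open_issues_url(["ML", "MLlib"], "votes"))
    print(open_issues_url(["SparkR"], "Watchers"))
```

Changing the component list or the sort field gives the other variants, for example sorting SparkR issues by Watchers.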

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| [1|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Blocker | *must* | *must* | *must* |
| [2|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Critical | *must* | yes, unless small | *best effort* |
| [3|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Major | *must* | optional | *best effort* |
| [4|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Minor | optional | no | maybe |
| [5|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Trivial | optional | no | maybe |
| [6|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC] | (empty) | (any) | yes | no | maybe |
| [7|https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(EMPTY)%20AND%20Shepherd%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC] | (empty) | (any) | 

[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-14 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18813:
--
Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.
* Vote for & watch issues which are important to you.
** MLlib, sorted by: [Votes | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
 or [Watchers | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
** SparkR, sorted by: [Votes | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
 or [Watchers | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| 1 | next release | Blocker | *must* | *must* | *must* |
| 2 | next release | Critical | *must* | yes, unless small | *best effort* |
| 3 | next release | Major | *must* | optional | *best effort* |
| 4 | next release | Minor | optional | no | maybe |
| 5 | next release | Trivial | optional | no | maybe |
| 6 | (empty) | (any) | yes | no | maybe |
| 7 | (empty) | (any) | no | no | maybe |

The *Category* in the table above has the following meaning:

1. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.
2-3. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.  The issue may slip to the 
following release if development is slower than expected.
4-5. A committer has promised interest in this issue.  Contributions *will* 
receive attention.  The issue may slip to another release.
6. A committer has promised interest in this issue and should respond, but no 
promises are made about priorities or releases.
7. This issue is open for discussion, but it needs a committer to promise 
interest to proceed.
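
The table and the category meanings above amount to a small decision rule.  The sketch below is one possible reading of it, not an official tool: the function name {{roadmap_category}} and its arguments are hypothetical, and it assumes that any non-empty Target Version means "next release".

```python
from typing import Optional

# Priority -> category for issues targeted at the next release (rows 1-5).
NEXT_RELEASE_CATEGORY = {
    "Blocker": 1,
    "Critical": 2,
    "Major": 3,
    "Minor": 4,
    "Trivial": 5,
}


def roadmap_category(target_version: Optional[str],
                     priority: str,
                     has_shepherd: bool) -> int:
    """Return the roadmap category (1-7) for an issue.

    Hypothetical helper; assumes a non-empty target_version means
    the issue is targeted at the next release.
    """
    if target_version is None:
        # Rows 6 and 7: no Target Version, any Priority.
        return 6 if has_shepherd else 7
    category = NEXT_RELEASE_CATEGORY[priority]
    # Categories 1-3 list the Shepherd as *must*.
    if category <= 3 and not has_shepherd:
        raise ValueError(
            f"{priority} issues targeted at a release must have a Shepherd"
        )
    return category


if __name__ == "__main__":
    print(roadmap_category("2.2.0", "Blocker", True))
    print(roadmap_category(None, "Major", False))
```

Raising an error for an unshepherded Blocker, Critical, or Major issue is a design choice of this sketch; the process itself only states that such issues must have a Shepherd.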

h1. Instructions

h2. For contributors

Getting started
* Please read http://spark.apache.org/contributing.html carefully. Code style, 
documentation, and unit tests are important.
* If you are a first-time contributor, please always start with a small 
[starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
than a larger feature.

Coordinating on JIRA
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start work. This is to avoid duplicate work. For small patches, you do not 
need to get the JIRA assigned to you to begin work.
* For medium/large features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released for other contributors.
* Do not claim more than 3 JIRAs at the same time. Try to finish them one 
after another.
* Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
Committers should set those.

Writing and reviewing PRs
* Remember to add the `@Since("VERSION")` annotation to new public APIs.
* *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
review greatly helps to improve others' code as well as yours.*


[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-13 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18813:
--
Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| 1 | next release | Blocker | *must* | *must* | *must* |
| 2 | next release | Critical | *must* | yes, unless small | *best effort* |
| 3 | next release | Major | *must* | optional | *best effort* |
| 4 | next release | Minor | optional | no | maybe |
| 5 | next release | Trivial | optional | no | maybe |
| 6 | (empty) | (any) | yes | no | maybe |
| 7 | (empty) | (any) | no | no | maybe |

The *Category* in the table above has the following meaning:

1. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.
2-3. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.  The issue may slip to the 
following release if development is slower than expected.
4-5. A committer has promised interest in this issue.  Contributions *will* 
receive attention.  The issue may slip to another release.
6. A committer has promised interest in this issue and should respond, but no 
promises are made about priorities or releases.
7. This issue is open for discussion, but it needs a committer to promise 
interest to proceed.

h1. Instructions

h2. For contributors

Getting started
* Please read http://spark.apache.org/contributing.html carefully. Code style, 
documentation, and unit tests are important.
* If you are a first-time contributor, please always start with a small 
[starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
than a larger feature.

Coordinating on JIRA
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start work. This is to avoid duplicate work. For small patches, you do not 
need to get the JIRA assigned to you to begin work.
* For medium/large features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released for other contributors.
* Do not claim more than 3 JIRAs at the same time. Try to finish them one 
after another.
* Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
Committers should set those.

Writing and reviewing PRs
* Remember to add the `@Since("VERSION")` annotation to new public APIs.
* *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
review greatly helps to improve others' code as well as yours.*

h2. For Committers

Adding to this roadmap
* You can update the roadmap by (a) adding issues to this list and (b) setting 
Target Versions.  Only Committers may make these changes.
* *If you add an issue to this roadmap or set a Target Version, you _must_ 
assign yourself or another Committer as Shepherd.*
* This list should be actively managed during the release.
* If you target a significant item for the next release, please list the item 
on this roadmap.
* If you commit to shepherding a new public API, you implicitly commit to 
shepherding the follow-up issues as well (Python/R APIs, docs).

Creating JIRA issues
* Try to break down big features into small and specific JIRA tasks and link 
them properly.
* Add a "starter" label to starter tasks.
* Put a rough time estimate for medium/big features and track the progress.
* Set Priority carefully.  Priority should not be confused with the size of 
the implementation effort.

Managing JIRA issues 

[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-12 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-18813:

Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| 1 | next release | Blocker | *must* | *must* | *must* |
| 2 | next release | Critical | *must* | yes, unless small | *best effort* |
| 3 | next release | Major | *must* | optional | *best effort* |
| 4 | next release | Minor | optional | no | maybe |
| 5 | next release | Trivial | optional | no | maybe |
| 6 | (empty) | (any) | yes | no | maybe |
| 7 | (empty) | (any) | no | no | maybe |

The *Category* in the table above has the following meaning:

1. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.
2-3. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.  The issue may slip to the 
following release if development is slower than expected.
4-5. A committer has promised interest in this issue.  Contributions *will* 
receive attention.  The issue may slip to another release.
6. A committer has promised interest in this issue and should respond, but no 
promises are made about priorities or releases.
7. This issue is open for discussion, but it needs a committer to promise 
interest to proceed.

h1. Instructions

h2. For contributors

Getting started
* Please read http://spark.apache.org/contributing.html carefully. Code style, 
documentation, and unit tests are important.
* If you are a first-time contributor, please always start with a small 
[starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
than a larger feature.

Coordinating on JIRA
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start work. This is to avoid duplicate work. For small patches, you do not 
need to get the JIRA assigned to you to begin work.
* For medium/large features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released for other contributors.
* Do not claim more than 3 JIRAs at the same time. Try to finish them one 
after another.
* Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
Committers should set those.

Writing and reviewing PRs
* Remember to add the `@Since("VERSION")` annotation to new public APIs.
* *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
review greatly helps to improve others' code as well as yours.*

h2. For Committers

Adding to this roadmap
* You can update the roadmap by (a) adding issues to this list and (b) setting 
Target Versions.  Only Committers may make these changes.
* *If you add an issue to this roadmap or set a Target Version, you _must_ 
assign yourself or another Committer as Shepherd.*
* This list should be actively managed during the release.
* If you target a significant item for the next release, please list the item 
on this roadmap.
* If you commit to shepherding a new public API, you implicitly commit to 
shepherding the follow-up issues as well (Python/R APIs, docs).

Creating JIRA issues
* Try to break down big features into small and specific JIRA tasks and link 
them properly.
* Add a "starter" label to starter tasks.
* Put a rough time estimate for medium/big features and track the progress.
* Set Priority carefully.  Priority should not be confused with the size of 
the implementation effort.

Managing JIRA issues and PRs
* 

[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-12 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-18813:

Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| 1 | next release | Blocker | *must* | *must* | *must* |
| 2 | next release | Critical | *must* | yes, unless small | *best effort* |
| 3 | next release | Major | *must* | optional | *best effort* |
| 4 | next release | Minor | optional | no | maybe |
| 5 | next release | Trivial | optional | no | maybe |
| 6 | (empty) | (any) | yes | no | maybe |
| 7 | (empty) | (any) | no | no | maybe |

The *Category* in the table above has the following meaning:

1. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.
2-3. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.  The issue may slip to the 
following release if development is slower than expected.
4-5. A committer has promised interest in this issue.  Contributions *will* 
receive attention.  The issue may slip to another release.
6. A committer has promised interest in this issue and should respond, but no 
promises are made about priorities or releases.
7. This issue is open for discussion, but it needs a committer to promise 
interest to proceed.

h1. Instructions

h2. For contributors

Getting started
* Please read http://spark.apache.org/contributing.html. Code style, 
documentation, and unit tests are important.
* If you are a first-time contributor, please always start with a small 
[starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
than a larger feature.

Coordinating on JIRA
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start work. This is to avoid duplicate work. For small patches, you do not 
need to get the JIRA assigned to you to begin work.
* For medium/large features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released for other contributors.
* Do not claim more than 3 JIRAs at the same time. Try to finish them one 
after another.
* Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
Committers should set those.

Writing and reviewing PRs
* Remember to add the `@Since("VERSION")` annotation to new public APIs.
* *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
review greatly helps to improve others' code as well as yours.*

h2. For Committers

Adding to this roadmap
* You can update the roadmap by (a) adding issues to this list and (b) setting 
Target Versions.  Only Committers may make these changes.
* *If you add an issue to this roadmap or set a Target Version, you _must_ 
assign yourself or another Committer as Shepherd.*
* This list should be actively managed during the release.
* If you target a significant item for the next release, please list the item 
on this roadmap.
* If you commit to shepherding a new public API, you implicitly commit to 
shepherding the follow-up issues as well (Python/R APIs, docs).

Creating JIRA issues
* Try to break down big features into small and specific JIRA tasks and link 
them properly.
* Add a "starter" label to starter tasks.
* Put a rough time estimate for medium/big features and track the progress.
* Set Priority carefully.  Priority should not be confused with the size of 
the implementation effort.

Managing JIRA issues and PRs
* Please add 

[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap

2016-12-09 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18813:
--
Description: 
*PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
The roadmap process described below is significantly updated since the 2.1 
roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
the basis for this proposal, and comment in this JIRA if you have suggestions 
for improvements.

h1. Roadmap process

This roadmap is a master list for MLlib improvements we are working on during 
this release.  This includes ML-related changes in PySpark and SparkR.

*What is planned for the next release?*
* This roadmap lists issues which at least one Committer has prioritized.  See 
details below in "Instructions for committers."
* This roadmap only lists larger or more critical issues.

*How can contributors influence this roadmap?*
* If you believe an issue should be in this roadmap, please discuss the issue 
on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
least one must agree to shepherd the issue.
* For general discussions, use this JIRA or the dev mailing list.  For specific 
issues, please comment on those issues or the mailing list.

h2. Target Version and Priority

This section describes the meaning of Target Version and Priority.  _These 
meanings have been updated in this proposal for the 2.2 process._

|| Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
| 1 | next release | Blocker | *must* | *must* | *must* |
| 2 | next release | Critical | *must* | yes, unless small | *best effort* |
| 3 | next release | Major | *must* | optional | *best effort* |
| 4 | next release | Minor | optional | no | maybe |
| 5 | next release | Trivial | optional | no | maybe |
| 6 | (empty) | (any) | yes | no | maybe |
| 7 | (empty) | (any) | no | no | maybe |

The *Category* in the table above has the following meaning:

1. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.
2-3. A committer has promised to see this issue to completion for the next 
release.  Contributions *will* receive attention.  The issue may slip to the 
following release if development is slower than expected.
4-5. A committer has promised interest in this issue.  Contributions *will* 
receive attention.  The issue may slip to another release.
6. A committer has promised interest in this issue and should respond, but no 
promises are made about priorities or releases.
7. This issue is open for discussion, but it needs a committer to promise 
interest to proceed.

h1. Instructions

h2. For contributors

Getting started
* Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark 
carefully. Code style, documentation, and unit tests are important.
* If you are a first-time contributor, please always start with a small 
[starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
than a larger feature.

Coordinating on JIRA
* Never work silently. Let everyone know on the corresponding JIRA page when 
you start work. This is to avoid duplicate work. For small patches, you do not 
need to get the JIRA assigned to you to begin work.
* For medium/large features or features with dependencies, please get assigned 
first before coding and keep the ETA updated on the JIRA. If there is no 
activity on the JIRA page for a certain amount of time, the JIRA should be 
released for other contributors.
* Do not claim more than 3 JIRAs at the same time. Try to finish them one 
after another.
* Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
Committers should set those.

Writing and reviewing PRs
* Remember to add the `@Since("VERSION")` annotation to new public APIs.
* *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
review greatly helps to improve others' code as well as yours.*

h2. For Committers

Adding to this roadmap
* You can update the roadmap by (a) adding issues to this list and (b) setting 
Target Versions.  Only Committers may make these changes.
* *If you add an issue to this roadmap or set a Target Version, you _must_ 
assign yourself or another Committer as Shepherd.*
* This list should be actively managed during the release.
* If you target a significant item for the next release, please list the item 
on this roadmap.
* If you commit to shepherding a new public API, you implicitly commit to 
shepherding the follow-up issues as well (Python/R APIs, docs).

Creating JIRA issues
* Try to break down big features into small and specific JIRA tasks and link 
them properly.
* Add a "starter" label to starter tasks.
* Put a rough time estimate for medium/big features and track the progress.
* Set Priority carefully.  Priority should not be mixed with size of effort for