GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/13619

    [SPARK-15892][ML] Incorrectly merged AFTAggregator with zero total count

    ## What changes were proposed in this pull request?
    
    Currently, `AFTAggregator` is not merged correctly. For example, if the
    data contains even a single empty partition, an `AFTAggregator` with a zero
    total count is created, which causes the exception below:
    
    ```
    IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'
    ```
    
    Please see 
[AFTSurvivalRegression.scala#L573-L575](https://github.com/apache/spark/blob/6ecedf39b44c9acd58cdddf1a31cf11e8e24428c/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L573-L575)
 as well.
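
To illustrate how a zero total count can survive a merge, here is a minimal plain-Python sketch. It is not the Spark code: `Agg`, `merge_buggy`, `merge_fixed`, `aggregate`, and `require_instances` are hypothetical names, and the shape of the bug (a guard that checks the wrong operand's count) is an assumption based only on the commit title "Fix incorrect comparison".

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Agg:
    """Hypothetical stand-in for an aggregator; not Spark's AFTAggregator."""
    count: int = 0
    loss_sum: float = 0.0

    def add(self, loss):
        # Fold one row's loss into this aggregator.
        self.count += 1
        self.loss_sum += loss
        return self

    def merge_buggy(self, other):
        # Assumed bug: the guard inspects this aggregator's own count, so an
        # empty left-hand side silently drops everything held by `other`.
        if self.count != 0:
            self.count += other.count
            self.loss_sum += other.loss_sum
        return self

    def merge_fixed(self, other):
        # Guard on the incoming aggregator instead: skipping an empty `other`
        # is harmless, and a non-empty one is always absorbed.
        if other.count != 0:
            self.count += other.count
            self.loss_sum += other.loss_sum
        return self


def aggregate(partitions, merge):
    # Build one aggregator per partition, then merge them left to right.
    per_partition = [reduce(Agg.add, rows, Agg()) for rows in partitions]
    return reduce(merge, per_partition)


def require_instances(agg):
    # Mirrors the kind of check that produces the exception quoted above.
    if not agg.count > 0:
        raise ValueError(
            "requirement failed: The number of instances should be "
            "greater than 0.0, but got %d." % agg.count
        )


# The first partition is empty, so it ends up on the left of every merge.
parts = [[], [1.0], [2.0, 3.0]]
buggy = aggregate(parts, Agg.merge_buggy)
fixed = aggregate(parts, Agg.merge_fixed)
```

Under these assumptions, `buggy.count` stays at 0 and `require_instances(buggy)` raises, while `fixed.count` is 3. Whether the real failure depends on the position of the empty partition in the same way is not something this sketch establishes.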
    
    
    To be clear, the Python example `aft_survival_regression.py` appears to use
    only 5 rows. So, if there are more than 5 partitions, it throws the
    exception above, because some of the partitions are empty and this results
    in an incorrectly merged `AFTAggregator`.
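
The row-count argument can be checked with a small plain-Python sketch. It assumes 5 rows, as described above, and uses a simple contiguous split; `split_evenly` and `count_empty` are hypothetical helpers and the slicing is illustrative, not necessarily how Spark assigns rows to partitions. Whatever the slicing rule, 5 rows can fill at most 5 partitions, so any partition count above 5 leaves some empty.

```python
def split_evenly(rows, num_partitions):
    """Slice rows into num_partitions contiguous chunks (illustrative only)."""
    n = len(rows)
    return [
        rows[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def count_empty(rows, num_partitions):
    """Return how many of the resulting partitions hold no rows."""
    return sum(1 for part in split_evenly(rows, num_partitions) if not part)


rows = list(range(5))  # stand-in for the example's 5 rows

for num_partitions in (4, 5, 8):
    sizes = [len(part) for part in split_evenly(rows, num_partitions)]
    print(num_partitions, "partitions -> sizes", sizes,
          "empty:", count_empty(rows, num_partitions))
```

With 4 or 5 partitions every partition receives at least one row; with 8 partitions, at least 3 must be empty.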
    
    Running `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`
    on a machine with more than 5 CPUs fails, because some of the tasks it
    creates are given empty partitions.
    
    ## How was this patch tested?
    
    Manually tested by running
    `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-15892

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13619
    
----
commit fb16b71a96ef55541207b77c9bb9bc49d0a85243
Author: hyukjinkwon <[email protected]>
Date:   2016-06-11T16:40:04Z

    Fix incorrect comparison

----


