GitHub user j4munoz opened a pull request:

    https://github.com/apache/spark/pull/13895

    [MLlib] org.apache.spark.mllib.util.SVMDataGenerator throws an ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

    ## What changes were proposed in this pull request?
    
    Adjust the size of the array allocated on line 58 so that it no longer causes an ArrayIndexOutOfBoundsException on line 66.
    
    ## How was this patch tested?
    
    Manual tests. I recompiled the entire project with the fix; it builds successfully, and running the generator now produces correct results.
    
    Line 66:

        val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1

    crashes because trueWeights has length nfeatures + 1 while x has length nfeatures; ddot reads trueWeights.length elements from both arrays, so the two must have the same length.
    
    To fix this, make trueWeights the same length as x.
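    Concretely, the change on line 58 amounts to the following one-liner (a sketch reconstructed from the description above, not the exact diff; the fill expression and the generator's rnd instance are assumed from context):

        // Before: trueWeights has one element more than each feature vector x
        val trueWeights = Array.fill[Double](nfeatures + 1)(rnd.nextDouble() * 2.0 - 1.0)

        // After: trueWeights matches the length of x, so ddot stays in bounds
        val trueWeights = Array.fill[Double](nfeatures)(rnd.nextDouble() * 2.0 - 1.0)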
    
    I recompiled the project with the change, and it works now:

        [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

    It now generates the data successfully in the specified folder.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/j4munoz/spark patch-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13895
    
----
commit a5ebe405a4457570b494935550b5fbf0804f6e95
Author: José Antonio <[email protected]>
Date:   2016-06-24T18:58:08Z

    [MLlib] org.apache.spark.mllib.util.SVMDataGenerator throws an ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

    Line 66:

        val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1

    crashes because trueWeights has length nfeatures + 1 while x has length nfeatures; ddot reads trueWeights.length elements from both arrays, so the two must have the same length.

    To fix this, make trueWeights the same length as x.

    I recompiled the project with the change, and it works now:

        [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

    It now generates the data successfully in the specified folder.

----

