[GitHub] spark pull request: [SPARK-14843][ML] Fix encoding error in LibSVM...

viirya Fri, 22 Apr 2016 06:51:06 -0700

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/12611


    [SPARK-14843][ML] Fix encoding error in LibSVMRelation

    ## What changes were proposed in this pull request?
    
    We use `RowEncoder` in the libsvm data source to serialize the label and 
features read from libsvm files. However, the schema passed to this encoder is 
not correct. As a result, we can't correctly select the `features` column from 
the DataFrame. We should use the full data schema instead of `requiredSchema` to 
serialize the data read in, and then apply a projection to select the required columns.
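
As an illustration only, the mismatch can be modeled in plain Python without Spark. Everything in this sketch (`FULL_SCHEMA`, `encode`, `project`, `read_buggy`, `read_fixed`) is a hypothetical stand-in rather than a Spark API, and the way the real failure surfaces in `LibSVMRelation` may differ. It assumes the parser always emits `(label, features)` pairs, and shows why binding those values to a pruned schema goes wrong while encoding with the full schema and projecting afterwards does not.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for the full data schema of a libsvm file.
FULL_SCHEMA: List[str] = ["label", "features"]


def encode(row: Sequence[Any], schema: Sequence[str]) -> Dict[str, Any]:
    """Bind values to field names by position, as a schema-driven encoder would."""
    return dict(zip(schema, row))


def project(record: Dict[str, Any], required: Sequence[str]) -> Tuple[Any, ...]:
    """Keep only the requested columns, in the requested order."""
    return tuple(record[name] for name in required)


def read_buggy(rows: Sequence[Sequence[Any]], required: Sequence[str]) -> List[Tuple[Any, ...]]:
    # Bug pattern: rows hold (label, features) but are encoded with the
    # pruned schema, so values are bound to the wrong field names.
    return [project(encode(row, required), required) for row in rows]


def read_fixed(rows: Sequence[Sequence[Any]], required: Sequence[str]) -> List[Tuple[Any, ...]]:
    # Fix pattern: encode with the full schema first, then project.
    return [project(encode(row, FULL_SCHEMA), required) for row in rows]


# Two parsed rows: a label and a dense feature list each (made-up values).
parsed = [(1.0, [0.5, 0.0, 2.0]), (0.0, [0.0, 1.5, 0.0])]
required = ["features"]

buggy = read_buggy(parsed, required)
fixed = read_fixed(parsed, required)

print("buggy:", buggy)  # the label ends up in the `features` slot
print("fixed:", fixed)  # the feature lists are selected as intended
```

In this toy model the buggy path returns the label where the features were requested, because positional binding pairs the first value with the first field of the pruned schema.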
    
    ## How was this patch tested?
    `LibSVMRelationSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-libsvm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12611.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12611
    
----
commit 1ceed49861e992693f5812cc1f14270a17a9694e
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-04-22T13:38:44Z

    Use correct schema for RowEncoder.

commit 5777ee5b6bd1016d652e55394a387fc728accba0
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-04-22T13:48:45Z

    Add test.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

