GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/12370
[SPARK-14599][ML] BaggedPoint should support sample weights.
## What changes were proposed in this pull request?
This PR changes `BaggedPoint` to store both the integer subsample counts and the
sample weight of each `Datum`. Specifically:
* `subsampleWeights: Array[Double]` is changed to `subsampleCounts: Array[Int]`
* A `sampleWeight: Double` field is added to the `BaggedPoint` constructor
* A function to extract the sample weight from `datum` is added to
`convertToBaggedPointRDD`. This will be helpful when we add weights to decision
trees, because it lets us extract the instance weight from the `RDD[Instance]`.
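
The shape of the change listed above can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: it uses a plain `Seq` in place of an RDD, `Instance` is a minimal stand-in for Spark's class, and `convertToBaggedPoints` and `effectiveWeight` are hypothetical names. Multiplying the count by the sample weight is an assumption about how the two fields might later be combined, and the counts are fixed at 1 here rather than drawn by sampling.

```scala
// Simplified stand-in for Spark's Instance (assumption: label, weight, features).
final case class Instance(label: Double, weight: Double, features: Vector[Double])

// Sketch of the reworked class: integer counts per subsample plus one
// sample weight per datum, instead of Array[Double] subsample weights.
final case class BaggedPoint[Datum](
    datum: Datum,
    subsampleCounts: Array[Int],
    sampleWeight: Double)

object BaggedPointSketch {
  // Hypothetical local analogue of convertToBaggedPointRDD. Every datum is
  // given a count of 1 in each subsample; real bagging would sample the counts.
  def convertToBaggedPoints[Datum](
      input: Seq[Datum],
      numSubsamples: Int,
      extractSampleWeight: Datum => Double): Seq[BaggedPoint[Datum]] =
    input.map { datum =>
      BaggedPoint(datum, Array.fill(numSubsamples)(1), extractSampleWeight(datum))
    }

  // Assumed combination rule: count in the subsample times the sample weight.
  def effectiveWeight[Datum](point: BaggedPoint[Datum], subsampleIndex: Int): Double =
    point.subsampleCounts(subsampleIndex) * point.sampleWeight
}
```

With this layout, a caller holding weighted instances would pass `(i: Instance) => i.weight` as the extractor, while an unweighted caller could pass a function that always returns `1.0`.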
## How was this patch tested?
This PR does not introduce any functional changes, so no tests were
added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sethah/spark SPARK-14599
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12370.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12370
----
commit 383956abadb7f34f131d98109cdb09122f4e30a4
Author: sethah <[email protected]>
Date: 2016-04-13T18:26:58Z
bagged point supports sample weight
commit 6f6c2a178de2c29d6ddfa8e3d57e6684d2401179
Author: sethah <[email protected]>
Date: 2016-04-13T21:42:51Z
added a test
commit a673658c4d2f6121b7854be80ade481ae2c18ddb
Author: sethah <[email protected]>
Date: 2016-04-13T21:49:20Z
removing test and style
----