GitHub user smurching opened a pull request:
https://github.com/apache/spark/pull/13881
[SPARK-3723] [MLlib] Adding instrumentation to random forests
## What changes were proposed in this pull request?
In RandomForest.run(), added instrumentation for the number of node groups,
along with the min, max, and average number of nodes per group.
Also fixed a typo in BaggedPoint.scala documentation.
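
As a rough illustration of the statistics described above, the following minimal sketch computes the number of node groups and the min, max, and average number of nodes per group from a list of group sizes. It is not the code from this patch: `GroupStats`, `NodeGroupStats`, and `summarize` are hypothetical names, and it assumes the size of each node group has already been collected into a `Seq[Int]` during training.

```scala
// Hypothetical helper for illustration only; not taken from the pull request.
final case class GroupStats(
    numGroups: Int,
    minNodes: Int,
    maxNodes: Int,
    avgNodes: Double)

object NodeGroupStats {
  /**
   * Summarize the sizes of the node groups processed during training.
   * Returns None when no groups were recorded, so callers never take
   * the min or max of an empty sequence.
   */
  def summarize(nodesPerGroup: Seq[Int]): Option[GroupStats] = {
    if (nodesPerGroup.isEmpty) {
      None
    } else {
      Some(GroupStats(
        numGroups = nodesPerGroup.length,
        minNodes = nodesPerGroup.min,
        maxNodes = nodesPerGroup.max,
        // Convert to Double before dividing to avoid integer truncation.
        avgNodes = nodesPerGroup.sum.toDouble / nodesPerGroup.length))
    }
  }
}
```

For example, `NodeGroupStats.summarize(Seq(1, 2, 4, 5))` yields 4 groups with a minimum of 1, a maximum of 5, and an average of 3.0 nodes per group.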
## How was this patch tested?
Tested by running RandomForestClassifierSuite, checking the test output
manually to make sure instrumentation information was present and reasonable.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/smurching/spark random-forest-instrumentation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13881.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13881
----
commit 8f45533b9a5f7c3c1f46d0d15a9f1815fa6227d5
Author: Siddharth Murching
Date: 2016-06-23T23:40:26Z
Fix typo in BaggedPoint.scala, add simple instrumentation to Random Forests
commit bd7d24d4f5a79eca6ff9629706c254beba74bc45
Author: Siddharth Murching
Date: 2016-06-24T00:40:02Z
Reorder instrumentation logging statements to look nicer
----