GitHub user mhmoudr opened a pull request: https://github.com/apache/spark/pull/13588

    SPARK-15858: Fix calculating error by tree stack over flow problem an…

    ## What changes were proposed in this pull request?

    Improves the evaluateEachIteration function in mllib, which fails when calculating the error by tree for a model that has more than 500 trees.

    ## How was this patch tested?

    The patch was tested on a production data set (2K rows x 2K features) by training a gradient boosted model without validation, with maxIteration set to 1000, and then computing the error by tree. With the new patch the calculation completed within 30 seconds; previously it would run for hours and then fail.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mhmoudr/spark SPARK-15858.1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13588

----
commit 4726937bacd6ee43dd12b27e1746bc708e99c6da
Author: Mahmoud Rawas <mhmo...@gmail.com>
Date:   2016-06-10T01:27:21Z

    SPARK-15858: Fix calculating error by tree stack over flow problem and over memory allocation issue for a model that have 2000+ trees.

----
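
For readers unfamiliar with "error by tree": it is the ensemble's error after each successive tree is added. The following is a minimal plain-Python sketch of that quantity, not the Scala code in this patch. It assumes squared-error loss, and `error_by_tree`, `Tree`, and all parameter names are hypothetical. It keeps one running prediction per row and updates it in a flat loop, so the work grows linearly with the number of trees and nothing is nested per tree. Whether the patch takes the same approach is not stated in the description above.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a fitted regression tree: maps a feature row to a prediction.
Tree = Callable[[Sequence[float]], float]


def error_by_tree(
    trees: List[Tree],
    tree_weights: List[float],
    rows: List[Sequence[float]],
    labels: List[float],
) -> List[float]:
    """Return the mean squared error of the ensemble after each tree is added."""
    running = [0.0] * len(rows)  # cumulative weighted prediction per row
    errors: List[float] = []
    for tree, weight in zip(trees, tree_weights):
        # Add only the newest tree's contribution; earlier trees are not re-evaluated.
        for i, row in enumerate(rows):
            running[i] += weight * tree(row)
        mse = sum((p - y) ** 2 for p, y in zip(running, labels)) / len(rows)
        errors.append(mse)
    return errors
```

Calling `error_by_tree(trees, tree_weights, rows, labels)` returns one error value per tree, in the order the trees were added.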