Hi All,

Does anyone else have experience building a Gradient Boosted Trees model
with Spark MLlib? When building decent-size models, I have noticed that the
process slows down over time. We observe that the time to build tree n is
approximately a constant amount longer than the time to build tree n-1, i.e.
t(n) = t(n-1) + const. The implication is that the total build time grows
roughly as N^2, where N is the total number of trees. I would expect the
algorithm to be approximately linear in total time (i.e. each boosting
iteration takes roughly the same time to complete).
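
To make the arithmetic concrete, here is a minimal sketch in plain Python (no
Spark involved). The function names and the example values for the first-tree
time and the per-tree increment are hypothetical, chosen only for
illustration. It sums t(n) = t(n-1) + const directly and compares the result
with the closed form N*t(1) + const*N*(N-1)/2, which is where the N^2 term
comes from.

```python
def total_build_time(num_trees, first_tree_time, increment):
    """Sum t(n) for n = 1..num_trees, where t(n) = t(n-1) + increment."""
    total = 0.0
    t = first_tree_time
    for _ in range(num_trees):
        total += t
        t += increment  # each tree takes a constant amount longer than the last
    return total


def closed_form(num_trees, first_tree_time, increment):
    """Arithmetic-series total: N*t(1) + increment * N*(N-1)/2."""
    return num_trees * first_tree_time + increment * num_trees * (num_trees - 1) / 2


if __name__ == "__main__":
    # Hypothetical values: first tree takes 1.0 s, each later tree 0.5 s more.
    for n in (10, 100, 1000):
        print(n, total_build_time(n, 1.0, 0.5), closed_form(n, 1.0, 0.5))
```

If each iteration took constant time instead (increment = 0), the total would
reduce to N*t(1), which is the linear behaviour I was expecting.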

So I have a couple of questions:
1. Is this behaviour expected, or consistent with what others are seeing?
2. Does anyone know if there are tuning parameters (e.g. in the boosting 
strategy, or tree strategy) that may be impacting this?

All aspects of the build seem to slow down as I go. Here's a random example 
culled from the logs, from the beginning and end of the model build:

15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count at 
DecisionTreeMetadata.scala:111, took 0.077957 s
....
15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished: count at 
DecisionTreeMetadata.scala:111, took 5.495166 s
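
For anyone who wants to track this across a whole log, a rough sketch of
pulling the per-job durations out of DAGScheduler lines like the two above
follows. The regular expression and the helper name are my own assumptions
based only on the format of these two lines (joined back onto single lines
here), so they may need adjusting for other log layouts.

```python
import re

# The two log lines quoted above, with the mail client's line wrapping undone.
LOG_LINES = [
    "15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: "
    "count at DecisionTreeMetadata.scala:111, took 0.077957 s",
    "15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished: "
    "count at DecisionTreeMetadata.scala:111, took 5.495166 s",
]

# Assumed pattern: "Job <id> finished: <call site>, took <seconds> s"
JOB_FINISHED = re.compile(r"Job (\d+) finished: (.+), took ([\d.]+) s")


def parse_job_durations(lines):
    """Return (job_id, call_site, seconds) for every matching line."""
    results = []
    for line in lines:
        match = JOB_FINISHED.search(line)
        if match:
            job_id, call_site, seconds = match.groups()
            results.append((int(job_id), call_site, float(seconds)))
    return results


durations = parse_job_durations(LOG_LINES)
# Ratio of the last observed duration to the first for the same call site.
slowdown = durations[-1][2] / durations[0][2]

if __name__ == "__main__":
    for job_id, call_site, seconds in durations:
        print(job_id, call_site, seconds)
    print("slowdown factor:", round(slowdown, 1))
```

Applied to the two lines above, this puts the same count at roughly 70 times
slower by the end of the build than at the start.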

Any thoughts or advice, or even suggestions on where to dig for more info would 
be welcome.

thanks
chris

Christopher Thom

QUANTIUM
