Hi All,

I wonder if anyone else has experience building a Gradient Boosted Trees model using spark/mllib? I have noticed when building decent-size models that the process slows down over time. We observe that the time to build tree n is approximately a constant amount longer than the time to build tree n-1, i.e. t(n) = t(n-1) + const. The implication is that the total build time grows roughly as N^2, where N is the total number of trees. I would expect the algorithm to be approximately linear in total time (i.e. each boosting iteration takes roughly the same time to complete).
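To make the scaling argument concrete, here is a minimal Python sketch of the arithmetic behind t(n) = t(n-1) + const. The function names and the sample numbers are purely illustrative assumptions, not measurements from our build. Summing the per-tree times gives N*t(1) + const*N*(N-1)/2, so the quadratic term dominates once N is large.

```python
def total_build_time(n_trees, t_first, delta):
    """Sum per-tree times when each tree takes `delta` longer than the previous one."""
    total = 0.0
    t = t_first
    for _ in range(n_trees):
        total += t
        t += delta  # t(n) = t(n-1) + const
    return total


def closed_form(n_trees, t_first, delta):
    """Closed form of the same sum: N*t(1) + const*N*(N-1)/2."""
    return n_trees * t_first + delta * n_trees * (n_trees - 1) / 2


if __name__ == "__main__":
    # Hypothetical values: first tree takes 1.0 s, each later tree 0.5 s more.
    for n in (10, 100, 1000):
        print(n, total_build_time(n, 1.0, 0.5), closed_form(n, 1.0, 0.5))
```

With delta set to 0 the total is simply N*t(1), which is the linear behaviour I would have expected.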
So I have a couple of questions:

1. Is this behaviour expected, or consistent with what others are seeing?
2. Does anyone know if there are tuning parameters (e.g. in the boosting strategy or tree strategy) that may be impacting this?

All aspects of the build seem to slow down as I go. Here's a random example culled from the logs, from the beginning and end of the model build:

15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count at DecisionTreeMetadata.scala:111, took 0.077957 s
....
15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished: count at DecisionTreeMetadata.scala:111, took 5.495166 s

Any thoughts or advice, or even suggestions on where to dig for more info, would be welcome.

Thanks,
Chris

Christopher Thom
QUANTIUM