Could you check the Spark UI and see whether any RDDs are being
kicked out (evicted) during the computation? We cache the residual RDD after
each iteration. If there isn't enough memory/disk, it gets
recomputed from its lineage, which results in something like
`t(n) = t(n-1) + const`. We might cache the features multiple times,
which could be improved.
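To make the eviction argument concrete, here is a toy cost model. It is a hypothetical illustration, not MLlib code. It assumes residual n is derived from residual n-1 by one step. If the cached copy of residual n-1 is gone, Spark has to replay the lineage back to the input. The function name and unit costs are made up for the sketch.

```python
def iteration_costs(num_trees, cached, tree_cost=1.0, residual_step_cost=1.0):
    """Hypothetical per-iteration cost of a boosting loop in which
    residual n is computed from residual n-1 with one map step."""
    costs = []
    for n in range(1, num_trees + 1):
        if cached:
            # Residual n-1 is still cached: only one new step is needed.
            residual_cost = residual_step_cost
        else:
            # Residual n-1 was evicted: walk the lineage back to the input
            # and recompute all n steps.
            residual_cost = n * residual_step_cost
        costs.append(tree_cost + residual_cost)
    return costs


if __name__ == "__main__":
    print(iteration_costs(5, cached=True))   # every iteration costs the same
    print(iteration_costs(5, cached=False))  # each iteration costs one step more
```

With caching intact every iteration has the same cost. Without it, iteration n costs exactly one residual step more than iteration n-1, which is the `t(n) = t(n-1) + const` pattern.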
-Xiangrui

On Sun, Feb 8, 2015 at 5:32 PM, Christopher Thom
<christopher.t...@quantium.com.au> wrote:
> Hi All,
>
> I wonder if anyone else has some experience building a Gradient Boosted Trees
> model using spark/mllib? I have noticed when building decent-size models that
> the process slows down over time. We observe that the time to build tree n is
> approximately a constant time longer than the time to build tree n-1, i.e.,
> t(n) = t(n-1) + const. The implication is that the total build time scales
> roughly as N^2, where N is the total number of trees. I would expect the
> algorithm to be approximately linear in total time (i.e., each
> boosting iteration takes roughly the same time to complete).
>
> So I have a couple of questions:
> 1. Is this behaviour expected, or consistent with what others are seeing?
> 2. Does anyone know if there are any tuning parameters (e.g. in the boosting
> strategy, or tree strategy) that may be impacting this?
>
> All aspects of the build seem to slow down as I go. Here's a random example 
> culled from the logs, from the beginning and end of the model build:
>
> 15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count at DecisionTreeMetadata.scala:111, took 0.077957 s
> ....
> 15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished: count at DecisionTreeMetadata.scala:111, took 5.495166 s
>
> Any thoughts or advice, or even suggestions on where to dig for more info 
> would be welcome.
>
> thanks
> chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
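The scaling claim in the quoted message can be checked with a short sketch. If t(n) = t(n-1) + c with t(1) = t1, then t(n) = t1 + (n-1)c. The total for N trees is N*t1 + c*N(N-1)/2, which is quadratic in N whenever c > 0. The function names and sample values below are hypothetical.

```python
def per_tree_times(t1, c, num_trees):
    """Build time of each tree under t(n) = t(n-1) + c, with t(1) = t1."""
    return [t1 + (n - 1) * c for n in range(1, num_trees + 1)]


def total_time_closed_form(t1, c, num_trees):
    """Closed-form sum of t(1)..t(N): N*t1 + c*N*(N-1)/2."""
    return num_trees * t1 + c * num_trees * (num_trees - 1) / 2


if __name__ == "__main__":
    # Hypothetical numbers: 2 s for the first tree, +0.5 s per extra tree.
    times = per_tree_times(2.0, 0.5, 100)
    print(sum(times), total_time_closed_form(2.0, 0.5, 100))  # 2675.0 2675.0
```

When the constant term dominates, doubling the number of trees roughly quadruples the total build time rather than doubling it.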