Re: [Scikit-learn-general] Distributed RandomForests

2013-04-27 Thread Andreas Mueller
Hi Youssef. I would strongly advise you to use an image-specific random forest implementation. There is a very good implementation by some other MSRC people: http://research.microsoft.com/en-us/downloads/03e0ca05-8aa9-49f6-801f-bb23846dc147/ It implements a much more complicated model, decision

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-26 Thread Youssef Barhomi
Thank you Peter, I found that the feature extraction was taking a lot of extra memory and that was not related to wiseRF, so you were right. Actually, from top it seems the training part was taking only about 20% more memory than the size of the dataset itself, which is pretty impressive. So at

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Gilles Louppe
Hi Youssef, Regarding memory usage, you should know that it'll basically blow up if you increase the number of jobs. With the current implementation, you'll need O(n_jobs * |X| * 2) in memory space (where |X| is the size of X, in bytes). That issue stems from the use of joblib, which basically
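A back-of-envelope sketch of the memory growth Gilles describes (illustrative only, not an sklearn API; the factor-of-two and per-worker copy follow his O(n_jobs * |X| * 2) estimate):

```python
import numpy as np

# With the joblib-based parallelism described above, each worker process
# receives its own copy of the training matrix X, so peak memory grows
# roughly as n_jobs * |X| * 2 (|X| = X.nbytes).
X = np.zeros((1_000_000, 10), dtype=np.float64)  # 1e6 samples, 10 features: ~80 MB
n_jobs = 4

estimated_peak_bytes = n_jobs * X.nbytes * 2
print(estimated_peak_bytes / 1e9)  # ~0.64 GB for this toy shape
```

So even a modest n_jobs multiplies the footprint of a dataset that already fits comfortably in RAM when trained serially.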

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Peter Prettenhofer
Hi Youssef, please make sure that you use the latest version of sklearn (>= 0.13) - we did some enhancements to the sub-sampling procedure lately. Looking at the RandomForest code, it seems that n_jobs=-1 should not be the issue for the parallel training of the trees, since ``n_jobs =

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
Thank you very much Peter, you are right about the n_jobs; something was going wrong with that. When n_jobs = -1, for a larger dataset (1e6 samples in this case), no CPU was being used and the process was hanging for a while. Setting n_jobs = 1 made everything work. Yes, I will look into the IPython
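A minimal sketch of the work-around Youssef describes (synthetic data and small sizes for illustration; the thread's dataset was around 1e6 samples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Work-around from the thread: train serially with n_jobs=1 instead of
# n_jobs=-1, which hung on large inputs (and, per Gilles, multiplies
# memory usage by the number of workers).
clf = RandomForestClassifier(n_estimators=10, n_jobs=1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

This trades training speed for predictable memory use and avoids the hang observed with the parallel code path.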

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
Ohh, makes total sense now!! Thank you Gilles!! Y On Thu, Apr 25, 2013 at 2:38 AM, Gilles Louppe g.lou...@gmail.com wrote: Hi Youssef, Regarding memory usage, you should know that it'll basically blow up if you increase the number of jobs. With the current implementation, you'll need

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
Hi Brian, thanks for your feedback. Were you able to reproduce their results? How big was the dataset that you have processed so far with an RF? The MS people used a distributed RF, so yes, I am guessing the features were being computed in parallel on all these cores. Though, I am still new

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Ronnie Ghose
I've tried larger data sets. It wasn't pretty; much fewer features, though. On Apr 25, 2013 4:03 AM, Peter Prettenhofer peter.prettenho...@gmail.com wrote: Hi Youssef, please make sure that you use the latest version of sklearn (>= 0.13) - we did some enhancements to the sub-sampling procedure

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-24 Thread Brian Holt
Hi Youssef, You're trying to do exactly what I did. The first thing to note is that the Microsoft guys don't precompute the features; rather, they compute them on the fly. That means they only need enough memory to store the depth images, and since they have a 1000-core cluster, computing the
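A hypothetical sketch of the "compute features on the fly" idea Brian mentions: keep only the raw depth images in memory and evaluate a feature for a given pixel on demand, instead of materialising a full n_samples x n_features matrix. The depth-difference feature below is in the style of the Kinect body-part work (offsets scaled by the depth at the reference pixel); the function name and signature are my own illustration, not the MSRC code.

```python
import numpy as np

def depth_difference_feature(depth, y, x, u, v):
    """Depth-invariant offset comparison: probe two offsets around (y, x),
    each scaled by the reference depth, and return the depth difference.
    Out-of-bounds probes are treated as 'very far' (infinite depth)."""
    d = depth[y, x]
    uy, ux = int(y + u[0] / d), int(x + u[1] / d)
    vy, vx = int(y + v[0] / d), int(x + v[1] / d)
    h, w = depth.shape
    du = depth[uy, ux] if 0 <= uy < h and 0 <= ux < w else np.inf
    dv = depth[vy, vx] if 0 <= vy < h and 0 <= vx < w else np.inf
    return du - dv

# On a constant-depth image every offset comparison is zero.
depth = np.ones((64, 64)) * 2.0
print(depth_difference_feature(depth, 32, 32, (10.0, 0.0), (0.0, 10.0)))  # 0.0
```

The point is the memory profile: a tree node only ever asks for individual (pixel, feature) values, so nothing beyond the depth images themselves needs to be stored, which is what makes the approach feasible on a large cluster.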