No, each subset contains all the features, but only a subset of data rows.

On Thu, Jul 5, 2012 at 7:03 AM, Nowal, Akshay <[email protected]> wrote:
> Hey, thanks for the quick response.
>
> If I am understanding you properly, "every tree grown is trained on the
> whole dataset" means that all the features/variables are used for building
> the trees, whereas in partial we take a subset of the features/variables?
> Kindly correct me if I am wrong.
>
> Thanks again
>
> Regards,
> Akshay Nowal
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:[email protected]]
> Sent: Thursday, July 05, 2012 11:23 AM
> To: [email protected]
> Subject: Re: Difference when we don't use partial implementation
>
> Hi Akshay,
>
> when you don't use the "-p" parameter, the builder loads the whole dataset
> in memory in every computing node, so every tree grown is trained on the
> whole dataset (of course using bagging to select a subset of it). When
> using "-p", every computing node loads a part of the dataset (thus the name
> "partial") so the trees are trained on parts of the dataset. The training
> algorithm is the same in both implementations, and the partial
> implementation is used when the dataset is too big to fit in memory.
>
> On Thu, Jul 5, 2012 at 4:38 AM, Nowal, Akshay <[email protected]> wrote:
>
> > Hi All,
> >
> > I am running Decision Forest in Mahout. Below are the commands that I
> > have used to implement the algorithm:
> >
> > Info file:
> >
> > mahout org.apache.mahout.df.tools.Describe -p
> > /user/an32665/KDD/KDDTrain+.arff -f /user/an32665/KDD/KDDTrain+.info -d
> > N 3 C 2 N C 4 N C 8 N 2 C 19 N L
> >
> > Building Forest:
> >
> > mahout org.apache.mahout.df.mapreduce.BuildForest
> > -Dmapred.max.split.size=1874231 -oob -d /user/an32665/KDD/KDDTrain+.arff
> > -ds /user/an32665/KDD/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
> >
> > Testing Forest:
> >
> > mahout org.apache.mahout.df.mapreduce.TestForest -i
> > /user/an32665/KDD/KDDTest+.arff -ds /user/an32665/KDD/KDDTrain+.info -m
> > nsl-forest -a -mr -o predictions
> >
> > So while building the forest we use "-p" to select the partial
> > implementation. I just wanted to know the difference in the algorithm
> > when we use "-p" and when we don't use "-p".
> >
> > Regards,
> > Akshay Nowal
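
To make the rows-versus-features point concrete, here is a minimal Python sketch. It is not Mahout's code: the function names, the toy dataset, and the way trees are assigned to partitions are all assumptions made for illustration. It only shows that in both modes every sampled row keeps all of its columns, and that the difference is which rows a tree can draw from.

```python
import random


def bootstrap(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def in_memory_samples(rows, n_trees, seed=0):
    """Illustrative 'no -p' mode: each tree samples from the whole dataset."""
    rng = random.Random(seed)
    return [bootstrap(rows, rng) for _ in range(n_trees)]


def partial_samples(rows, n_partitions, trees_per_partition, seed=0):
    """Illustrative '-p' mode: each tree samples from one partition of rows.

    Assumption: partitions are contiguous row ranges and each one grows
    the same number of trees.
    """
    rng = random.Random(seed)
    size = -(-len(rows) // n_partitions)  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [
        bootstrap(part, rng)
        for part in partitions
        for _ in range(trees_per_partition)
    ]


# Toy dataset (hypothetical): 8 rows, 3 features plus a label column.
rows = [(i, i * 2, i % 3, "L%d" % (i % 2)) for i in range(8)]

full = in_memory_samples(rows, n_trees=4)
part = partial_samples(rows, n_partitions=2, trees_per_partition=2)

# Every sampled row still has all 4 columns in both modes.
print(all(len(r) == 4 for sample in full + part for r in sample))
```

In this sketch the first two partial samples can only contain rows 0 to 3 and the last two only rows 4 to 7, while each in-memory sample may contain any of the 8 rows.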
