Re: Difference when we don't use partial implementation

deneche abdelhakim Wed, 04 Jul 2012 23:20:03 -0700

No, each subset contains all the features, but only a subset of data rows.
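
To make the distinction concrete, here is a small Python sketch (not Mahout code). The toy dataset and the helper names `split_rows` and `bootstrap_sample` are made up for illustration, and the contiguous row split is only an assumption standing in for however Hadoop actually cuts the input into splits. It shows that every partition keeps all the columns and only the rows are divided, and that bagging is applied either to the whole dataset (no "-p") or to each partition ("-p").

```python
import random


def split_rows(rows, num_partitions):
    """Split rows into contiguous chunks; each row keeps all of its columns."""
    size = -(-len(rows) // num_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement from the given rows."""
    return [rng.choice(rows) for _ in rows]


# Hypothetical toy data: three features plus a label in the last column.
dataset = [
    (0.1, "tcp", 5, "normal"),
    (0.7, "udp", 1, "attack"),
    (0.3, "tcp", 9, "normal"),
    (0.9, "icmp", 2, "attack"),
    (0.2, "udp", 4, "normal"),
    (0.8, "tcp", 7, "attack"),
]

rng = random.Random(42)

# Without "-p": a tree's training sample is bagged from the whole dataset.
in_memory_sample = bootstrap_sample(dataset, rng)

# With "-p": each node only sees its own partition of rows,
# so a tree's training sample is bagged from that partition alone.
partitions = split_rows(dataset, 3)
partial_samples = [bootstrap_sample(part, rng) for part in partitions]
```

In both cases a training row still carries every feature; the difference is only which rows a given tree can be trained on.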

On Thu, Jul 5, 2012 at 7:03 AM, Nowal, Akshay <[email protected]> wrote:


> Hey, thanks for the quick response.
>
> If I understand you properly, "every tree grown is trained on the
> whole dataset" means that all the features/variables are used for building
> the trees, whereas in partial we take a subset of the features/variables?
> Kindly correct me if I'm wrong.
>
> Thanks again
>
> Regards,
> Akshay Nowal
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:[email protected]]
> Sent: Thursday, July 05, 2012 11:23 AM
> To: [email protected]
> Subject: Re: Difference when we don't use partial implementation
>
> Hi Akshay,
>
> when you don't use the "-p" parameter, the builder loads the whole dataset
> into memory on every computing node, so every tree grown is trained on the
> whole dataset (of course using bagging to select a subset of it). When
> using "-p", every computing node loads only a part of the dataset (thus the
> name "partial"), so the trees are trained on parts of the dataset. The
> training algorithm is the same in both implementations, and the partial
> implementation is used when the dataset is too big to fit in memory.
>
> On Thu, Jul 5, 2012 at 4:38 AM, Nowal, Akshay <[email protected]> wrote:
>
> > Hi All,
> >
> >
> >
> > I am running Decision Forest in Mahout. Below are the commands that I
> > have used to run the algorithm:
> >
> >
> >
> > Info file:
> >
> > mahout org.apache.mahout.df.tools.Describe -p
> > /user/an32665/KDD/KDDTrain+.arff -f /user/an32665/KDD/KDDTrain+.info -d
> > N 3 C 2 N C 4 N C 8 N 2 C 19 N L
> >
> > Building Forest:
> >
> > mahout org.apache.mahout.df.mapreduce.BuildForest
> > -Dmapred.max.split.size=1874231 -oob -d /user/an32665/KDD/KDDTrain+.arff
> > -ds /user/an32665/KDD/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
> >
> > Testing Forest:
> >
> > mahout org.apache.mahout.df.mapreduce.TestForest -i
> > /user/an32665/KDD/KDDTest+.arff -ds /user/an32665/KDD/KDDTrain+.info -m
> > nsl-forest -a -mr -o predictions
> >
> >
> >
> > So while building the forest we use "-p" to select the partial
> > implementation. I just wanted to know the difference in the algorithm
> > when we use "-p" and when we don't use "-p".
> >
> >
> >
> >
> >
> > Regards,
> >
> > Akshay Nowal
> >
> >
> >
> >
>
