Glad it worked.

On Tue, Jan 22, 2013 at 7:20 AM, Ranjitha Chandrashekar <[email protected]> wrote:

> Hi Deneche,
>
> The patch works perfectly! The classifier now outputs the label string
> values instead of the numerical codes.
>
> Thank you for fixing the issue.
>
> Regards
> Ranjitha
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:[email protected]]
> Sent: 18 January 2013 21:14
> To: [email protected]
> Subject: Re: Issue with Partial Implementation Problem
>
> I submitted a patch, you can give it a try and let me know if it fixes the
> problem.
>
> https://issues.apache.org/jira/browse/MAHOUT-1143
>
> The classifier should now output the label string values instead of the
> numerical codes.
>
>
> On Fri, Jan 18, 2013 at 4:26 PM, deneche abdelhakim <[email protected]> wrote:
>
> > Hi Ranjitha,
> >
> > I created a JIRA issue to fix this, and should submit a patch soon.
> >
> >
> > On Fri, Jan 18, 2013 at 10:29 AM, Ranjitha Chandrashekar <
> > [email protected]> wrote:
> >
> >> Hi Deneche,
> >>
> >> Thanks. As suggested, I replaced the label value with "normal" in the
> >> KDDTest dataset and tested the forest without the -a option.
> >> It generates a binary file (.out file) with values 0 and 1.
> >>
> >> To interpret this I went through the code. As I understand it, the MR job
> >> (Classifier.CMapper) generates a file with Key -> Correct Label and
> >> Value -> Prediction. It then creates a new file with the .out extension
> >> that contains only the Values, i.e. the Predictions (0 or 1 in my case),
> >> and deletes the file generated by the MR job. Hence I do not have access
> >> to the file generated by the MR job, which contains the Correct Label and
> >> Prediction for each input test record.
> >>
> >> After looking at these predictions I am not sure what 0 and 1 actually
> >> mean. Does 1 mean the record is classified correctly, i.e. "normal" in
> >> this case, and 0 mean the classification is wrong and should be "anomaly"?
> >>
> >> Please suggest.
> >>
> >> Regards
> >> Ranjitha
> >>
> >> -----Original Message-----
> >> From: deneche abdelhakim [mailto:[email protected]]
> >> Sent: 18 January 2013 12:21
> >> To: [email protected]
> >> Subject: Re: Issue with Partial Implementation Problem
> >>
> >> My mistake. You should put any label value available in the training set.
> >> In the previous example, putting "normal" in all test records should be
> >> fine.
> >>
> >>
> >> On Fri, Jan 18, 2013 at 7:26 AM, Ranjitha Chandrashekar <[email protected]> wrote:
> >>
> >> > Hi Deneche
> >> >
> >> > Thank you for your quick response.
> >> >
> >> > I tried using the numerical value in the label attribute in the test data.
> >> >
> >> > Original record in KDDTest:
> >> >
> >> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,normal
> >> >
> >> > Replaced record:
> >> >
> >> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,1
> >> >
> >> > (normal class replaced with numerical value 1)
> >> >
> >> > I ran TestForest on the KDDTest dataset and got the following error.
> >> > Sequential and MapReduce classification give the same error.
> >> >
> >> > Command --> hadoop jar
> >> > /usr/lib/mahout-0.5/mahout-examples-0.5-cdh3u5-job.jar
> >> > org.apache.mahout.df.mapreduce.TestForest -i
> >> > /user/ranjitha/input/KDDTest+.arff.txt_withnum -ds
> >> > /user/ranjitha/input/KDDTrain+.info -m /user/ranjitha/KDDForest -o
> >> > /user/ranjitha/KDDResult
> >> >
> >> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Loading the forest...
> >> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Sequential classification...
> >> > 13/01/18 11:29:24 ERROR data.DataConverter: label token: 1 dataset.labels: [normal, anomaly]
> >> > Exception in thread "main" java.lang.IllegalStateException: Label value (1) not known
> >> >         at org.apache.mahout.df.data.DataConverter.convert(DataConverter.java:71)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.testFile(TestForest.java:256)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.sequential(TestForest.java:216)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:172)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:142)
> >> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:275)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >         at java.lang.reflect.Method.invoke(Method.java:616)
> >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >> >
> >> > Looking forward to your reply
> >> >
> >> > Thanks
> >> > Ranjitha.
> >> >
> >> > -----Original Message-----
> >> > From: deneche abdelhakim [mailto:[email protected]]
> >> > Sent: 17 January 2013 18:20
> >> > To: [email protected]
> >> > Subject: Re: Issue with Partial Implementation Problem
> >> >
> >> > Hi Ranjitha,
> >> >
> >> > Just put any numerical value in the label attribute. You should be able
> >> > to classify the data, but you won't be able to compute the confusion
> >> > matrix or the accuracy.
> >> >
> >> >
> >> > On Thu, Jan 17, 2013 at 12:15 PM, Ranjitha Chandrashekar <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > I am using Partial Implementation for Random Forest classification.
> >> > >
> >> > > I have a training dataset with labels class0, class 1, class 2. The
> >> > > decision forest is built on this training dataset. The classification
> >> > > for the test dataset is computed using the same data descriptor
> >> > > generated for the training dataset. I am able to generate the confusion
> >> > > matrix and accuracy details when the test data set includes the class
> >> > > variable.
> >> > >
> >> > > However, I also need to classify test data that may not have the class
> >> > > variable, or whose class values are not known. For example, assume the
> >> > > test data consists of future data points, for which the class values
> >> > > will only be determined in the future.
> >> > >
> >> > >
> >> > > *         How is it possible to classify a test data set where the
> >> > > class label is not defined or not known? I have tried using default
> >> > > labels like "unknown" and "NO_LABEL". It doesn't seem to work.
> >> > >
> >> > >
> >> > > *         How do I set the class label as "unknown" in the testing
> >> > > dataset?
> >> > >
> >> > > Looking forward to your reply,
> >> > >
> >> > > Thanks
> >> > > Ranjitha.
> >> > >
> >
> >
>
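
For readers following the thread, here is a minimal sketch of the two manual steps discussed above: writing a known training label into every unlabeled test record, and turning numeric prediction codes back into label strings on builds without the MAHOUT-1143 patch. It is not Mahout code. The function names are hypothetical, the label is assumed to be the last field of each record (as in the KDD records quoted above), and the codes are assumed to be 0-based positions in the dataset.labels list printed in the log ([normal, anomaly]). The thread does not confirm that ordering, so check it against your own data descriptor.

```python
import csv
import io


def fill_placeholder_label(rows, placeholder):
    """Return copies of the records with the last field set to `placeholder`.

    Assumption: the label is the last attribute of every record.
    The placeholder must be a label value that exists in the training set.
    """
    return [row[:-1] + [placeholder] for row in rows]


def decode_predictions(codes, labels):
    """Map numeric prediction codes to label strings by list position.

    Assumption: each code is a 0-based index into `labels`.
    """
    return [labels[int(float(code))] for code in codes]


if __name__ == "__main__":
    # Label order copied from the "dataset.labels" log line in the thread.
    labels = ["normal", "anomaly"]

    # A shortened, made-up record whose label field holds an unknown value.
    sample = "13,tcp,telnet,SF,0.30,1\n"
    rows = list(csv.reader(io.StringIO(sample)))
    print(fill_placeholder_label(rows, "normal"))

    # Made-up prediction codes standing in for the contents of a .out file.
    print(decode_predictions(["0", "1", "0"], labels))
```

With the patch applied, the second step should no longer be needed, since the classifier writes the label strings itself.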
