Hi Deneche,

Thanks. As suggested, I replaced the label value with "normal" in the KDDTest dataset 
and tested the forest without the -a option.
It generates a binary file (.out file) with values 0 and 1.

To interpret this I went through the code, and I understand 
that the MR job (Classifier.CMapper) generates a file with Key -> Correct Label and 
Value -> Prediction. It then creates a new file with the .out extension that contains 
only the Values, i.e. the Predictions (0 or 1 in my case), and deletes the 
file generated by the MR job. Hence I do not have access to the MR job's 
file, which contains the Correct Label and Prediction for each input 
test record.

After looking at these predictions I am not sure what 0 and 1 actually mean. 
Does 1 mean the record is classified correctly ("normal" in this case), and does 0 mean 
the classification is wrong and should be "anomaly"?
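
One possible reading, which I have not confirmed, is that each value is an index into the label list printed in the error log quoted further down (dataset.labels: [normal, anomaly]), so that 0 would stand for "normal" and 1 for "anomaly". A minimal sketch of that decoding, assuming the predictions can be read as one number per line; the function name and the sample values are hypothetical:

```python
# Assumption: label order copied from the "dataset.labels: [normal, anomaly]"
# line in the error log quoted later in this thread.
LABELS = ["normal", "anomaly"]


def decode_predictions(lines, labels=LABELS):
    """Map each numeric prediction (one per line) to a label name."""
    decoded = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        index = int(float(text))  # accepts "1" as well as "1.0"
        decoded.append(labels[index])
    return decoded


if __name__ == "__main__":
    # Hypothetical sample values, not real TestForest output.
    sample = ["0", "1", "1", "0"]
    print(decode_predictions(sample))
```

If this reading is right, the values say which class was predicted, not whether the prediction was correct.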

Please suggest.

Regards
Ranjitha

-----Original Message-----
From: deneche abdelhakim [mailto:[email protected]] 
Sent: 18 January 2013 12:21
To: [email protected]
Subject: Re: Issue with Partial Implementation Problem

My mistake. You should put any label value available in the training set.
In the previous example, putting "normal" in all test records should be fine.
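
A minimal sketch of that substitution, assuming the label is the last comma-separated field of each record, as in the KDD records quoted below. The function name and the shortened sample record are hypothetical:

```python
def set_placeholder_label(record, label="normal"):
    """Replace the last comma-separated field of a record with `label`."""
    fields = record.rstrip("\n").split(",")
    fields[-1] = label  # the label is assumed to be the final attribute
    return ",".join(fields)


if __name__ == "__main__":
    # Shortened, hypothetical record; real KDD records have many more fields.
    record = "13,tcp,telnet,SF,118,2425,0.30,1"
    print(set_placeholder_label(record))
```

The placeholder only needs to be a label the data descriptor already knows; with it in place, any accuracy figures computed against it are meaningless.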


On Fri, Jan 18, 2013 at 7:26 AM, Ranjitha Chandrashekar <[email protected]> wrote:

> Hi Deneche
>
> Thank you for your quick response.
>
> I tried using the numerical value in the label attribute in the test data.
>
> Original Record in KDDTest :
> 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,normal
>
> Replaced Record :
>
> 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,1
>
> (normal class replaced with numerical value 1)
>
> Ran TestForest on the KDDTest dataset. This is the error that I get.
> Sequential and map reduce classification gives the same error.
>
> Command --> hadoop jar
> /usr/lib/mahout-0.5/mahout-examples-0.5-cdh3u5-job.jar
> org.apache.mahout.df.mapreduce.TestForest -i
> /user/ranjitha/input/KDDTest+.arff.txt_withnum -ds
> /user/ranjitha/input/KDDTrain+.info -m /user/ranjitha/KDDForest -o
> /user/ranjitha/KDDResult
>
> 13/01/18 11:29:24 INFO mapreduce.TestForest: Loading the forest...
> 13/01/18 11:29:24 INFO mapreduce.TestForest: Sequential classification...
> 13/01/18 11:29:24 ERROR data.DataConverter: label token: 1 dataset.labels: [normal, anomaly]
> Exception in thread "main" java.lang.IllegalStateException: Label value (1) not known
>         at
> org.apache.mahout.df.data.DataConverter.convert(DataConverter.java:71)
>         at
> org.apache.mahout.df.mapreduce.TestForest.testFile(TestForest.java:256)
>         at
> org.apache.mahout.df.mapreduce.TestForest.sequential(TestForest.java:216)
>         at
> org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:172)
>         at
> org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:142)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at
> org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:275)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Looking forward to your reply
>
> Thanks
> Ranjitha.
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:[email protected]]
> Sent: 17 January 2013 18:20
> To: [email protected]
> Subject: Re: Issue with Partial Implementation Problem
>
> Hi Ranjitha,
>
> Just put any numerical value in the label attribute. You should be able to
> classify the data, but you won't be able to compute the confusion matrix or
> the accuracy.
>
>
> On Thu, Jan 17, 2013 at 12:15 PM, Ranjitha Chandrashekar <
> [email protected]> wrote:
>
> > Hi
> >
> > I am using Partial Implementation for Random Forest classification.
> >
> > I have a training dataset with labels class 0, class 1, class 2.  The
> > decision forest is built on this training dataset.  The classification for
> > the test dataset is computed using the same data descriptor generated for
> > the training dataset.  I am able to generate the confusion matrix and accuracy
> > details for a test data set that includes the class variable.
> >
> > However I also need to make a classification for a scenario, where test
> > data may not have the class variable or class values are not known.  For
> > ex, assume test data is about future data points, for which class values
> > will have to be computed only in the future.
> >
> >
> > *         How is it possible to classify a test data set where the
> > class label is not defined or not known? I have tried using default labels
> > like "unknown" and "NO_LABEL". It doesn't seem to work.
> >
> >
> > *         How do I set the class label as "unknown" in the testing dataset?
> >
> > Looking forward to your reply,
> >
> > Thanks
> > Ranjitha.
> >
>
