I submitted a patch. Please give it a try and let me know if it fixes the
problem.

https://issues.apache.org/jira/browse/MAHOUT-1143

The classifier should now output the label string values instead of the
numerical codes.
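
For output generated before the patch, the codes can be mapped back to labels by hand. Below is a minimal Python sketch, not part of the patch. It assumes that each numerical code is the position of the label in the dataset's label list, in the order printed by the "dataset.labels: [normal, anomaly]" log line further down the thread, and that the codes have already been read from the .out file into a list of integers. The function name and the placeholder string are made up for illustration.

```python
from typing import Iterable, List, Sequence

# Assumption: label order copied from the "dataset.labels: [normal, anomaly]"
# log line; code 0 would then be "normal" and code 1 "anomaly".
LABELS = ["normal", "anomaly"]


def decode_predictions(codes: Iterable[int],
                       labels: Sequence[str] = LABELS) -> List[str]:
    """Map numerical prediction codes to label strings by list position."""
    decoded = []
    for code in codes:
        if 0 <= code < len(labels):
            decoded.append(labels[code])
        else:
            # Hypothetical placeholder for any code outside the label list.
            decoded.append("<no label for code %d>" % code)
    return decoded


# Example with hand-written codes (not real classifier output).
print(decode_predictions([0, 1, 1, 0]))
```

If the assumption about the ordering holds, a 0 in the output would read as "normal" and a 1 as "anomaly", i.e. the values are predicted classes and not correct/incorrect flags.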


On Fri, Jan 18, 2013 at 4:26 PM, deneche abdelhakim <[email protected]> wrote:

> Hi Ranjitha,
>
> I created a JIRA issue to fix this, and should submit a patch soon.
>
>
> On Fri, Jan 18, 2013 at 10:29 AM, Ranjitha Chandrashekar <
> [email protected]> wrote:
>
>> Hi Deneche,
>>
>> Thanks. As suggested, I replaced the label value with "normal" in the
>> KDDTest dataset and tested the forest without the -a option.
>> It generates a binary file (.out file) with values 0 and 1.
>>
>> To interpret this I went through the code. As I understand it, the MR job
>> (Classifier.CMapper) generates a file with Key -> Correct Label and
>> Value -> Prediction. It then creates a new file with the .out extension
>> that contains only the Values, i.e. the Predictions (0 or 1 in my case),
>> and deletes the file generated by the MR job. Hence I do not have access
>> to the file that contains the Correct Label and Prediction for each input
>> test record.
>>
>> Looking at these predictions, I am not sure what 0 and 1 actually mean.
>> Does 1 mean the record is classified correctly ("normal" in this case),
>> and does 0 mean the classification is wrong and should be "anomaly"?
>>
>> Please suggest.
>>
>> Regards
>> Ranjitha
>>
>> -----Original Message-----
>> From: deneche abdelhakim [mailto:[email protected]]
>> Sent: 18 January 2013 12:21
>> To: [email protected]
>> Subject: Re: Issue with Partial Implementation Problem
>>
>> My mistake. You should put any label value available in the training set.
>> In the previous example, putting "normal" in all test records should be
>> fine.
>>
>>
>> On Fri, Jan 18, 2013 at 7:26 AM, Ranjitha Chandrashekar <
>> [email protected]> wrote:
>>
>> > Hi Deneche
>> >
>> > Thank you for your quick response.
>> >
>> > I tried using the numerical value in the label attribute in the test
>> > data.
>> >
>> > Original Record in KDDTest :
>> >
>> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,normal
>> >
>> > Replaced Record :
>> >
>> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,1
>> >
>> > (normal class replaced with numerical value 1)
>> >
>> > Ran TestForest on the KDDTest dataset. Following is the error that I get.
>> > Sequential and map reduce classification give the same error.
>> >
>> > Command --> hadoop jar
>> > /usr/lib/mahout-0.5/mahout-examples-0.5-cdh3u5-job.jar
>> > org.apache.mahout.df.mapreduce.TestForest -i
>> > /user/ranjitha/input/KDDTest+.arff.txt_withnum -ds
>> > /user/ranjitha/input/KDDTrain+.info -m /user/ranjitha/KDDForest -o
>> > /user/ranjitha/KDDResult
>> >
>> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Loading the forest...
>> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Sequential classification...
>> > 13/01/18 11:29:24 ERROR data.DataConverter: label token: 1 dataset.labels: [normal, anomaly]
>> > Exception in thread "main" java.lang.IllegalStateException: Label value (1) not known
>> >         at org.apache.mahout.df.data.DataConverter.convert(DataConverter.java:71)
>> >         at org.apache.mahout.df.mapreduce.TestForest.testFile(TestForest.java:256)
>> >         at org.apache.mahout.df.mapreduce.TestForest.sequential(TestForest.java:216)
>> >         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:172)
>> >         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:142)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:275)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:616)
>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> >
>> > Looking forward to your reply
>> >
>> > Thanks
>> > Ranjitha.
>> >
>> > -----Original Message-----
>> > From: deneche abdelhakim [mailto:[email protected]]
>> > Sent: 17 January 2013 18:20
>> > To: [email protected]
>> > Subject: Re: Issue with Partial Implementation Problem
>> >
>> > Hi Ranjitha,
>> >
>> > Just put any numerical value in the label attribute. You should be able
>> > to classify the data, but you won't be able to compute the confusion
>> > matrix or the accuracy.
>> >
>> >
>> > On Thu, Jan 17, 2013 at 12:15 PM, Ranjitha Chandrashekar <
>> > [email protected]> wrote:
>> >
>> > > Hi
>> > >
>> > > I am using the Partial Implementation for Random Forest classification.
>> > >
>> > > I have a training dataset with labels class 0, class 1, class 2. The
>> > > decision forest is built on this training dataset. The classification
>> > > for the test dataset is computed using the same data descriptor
>> > > generated for the training dataset. I am able to generate the confusion
>> > > matrix and accuracy details for a test dataset that has the class
>> > > variable.
>> > >
>> > > However, I also need to classify test data that may not have the class
>> > > variable, or whose class values are not known. For example, assume the
>> > > test data is about future data points, for which class values will
>> > > have to be computed only in the future.
>> > >
>> > >
>> > > * How is it possible to classify a test dataset where the class label
>> > > is not defined or not known? I have tried using default labels like
>> > > "unknown" and "NO_LABEL". It doesn't seem to work.
>> > >
>> > >
>> > > * How do I set the class label to "unknown" in the testing dataset?
>> > >
>> > > Looking forward to your reply,
>> > >
>> > > Thanks
>> > > Ranjitha.
>> > >
>>
>
>
