Glad it worked.
On Tue, Jan 22, 2013 at 7:20 AM, Ranjitha Chandrashekar <[email protected]> wrote:

> Hi Deneche,
>
> The patch is working perfectly! The classifier now outputs the label string
> values instead of the numerical codes.
>
> Thank you for fixing the issue.
>
> Regards
> Ranjitha
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:[email protected]]
> Sent: 18 January 2013 21:14
> To: [email protected]
> Subject: Re: Issue with Partial Implementation Problem
>
> I submitted a patch, you can give it a try and let me know if it fixes the
> problem.
>
> https://issues.apache.org/jira/browse/MAHOUT-1143
>
> The classifier should now output the label string values instead of the
> numerical codes.
>
>
> On Fri, Jan 18, 2013 at 4:26 PM, deneche abdelhakim <[email protected]> wrote:
>
> > Hi Ranjitha,
> >
> > I created a JIRA issue to fix this, and should submit a patch soon.
> >
> >
> > On Fri, Jan 18, 2013 at 10:29 AM, Ranjitha Chandrashekar <[email protected]> wrote:
> >
> >> Hi Deneche,
> >>
> >> Thanks. As suggested, I replaced the label value with "normal" in the
> >> KDDTest dataset and tested the forest without the -a option.
> >> It generates a binary file (.out file) with values 0 and 1.
> >>
> >> To interpret this I went through the code. As I understand it, the MR job
> >> (Classifier.CMapper) generates a file with Key -> Correct Label and
> >> Value -> Prediction. It then creates a new file with the .out extension
> >> which contains only the Values, i.e. the Prediction (0 or 1 in my case),
> >> and deletes the file generated by the MR job. Hence I do not have access
> >> to the file generated by the MR job, which contains the Correct Label and
> >> Prediction for each input test record.
> >>
> >> After looking at these predictions I am not sure what 0 and 1 actually
> >> mean. Does 1 mean it's classified correctly ("normal" in this case), and
> >> 0 mean the classification is wrong and should be "anomaly"?
> >>
> >> Please suggest.
> >>
> >> Regards
> >> Ranjitha
> >>
> >> -----Original Message-----
> >> From: deneche abdelhakim [mailto:[email protected]]
> >> Sent: 18 January 2013 12:21
> >> To: [email protected]
> >> Subject: Re: Issue with Partial Implementation Problem
> >>
> >> My mistake. You should put any label value available in the training set.
> >> In the previous example, putting "normal" in all test records should be
> >> fine.
> >>
> >>
> >> On Fri, Jan 18, 2013 at 7:26 AM, Ranjitha Chandrashekar <[email protected]> wrote:
> >>
> >> > Hi Deneche
> >> >
> >> > Thank you for your quick response.
> >> >
> >> > I tried using the numerical value in the label attribute in the test data.
> >> >
> >> > Original Record in KDDTest :
> >> >
> >> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,normal
> >> >
> >> > Replaced Record :
> >> >
> >> > 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,1
> >> >
> >> > (normal class replaced with numerical value 1)
> >> >
> >> > Ran TestForest on the KDDTest dataset. Following is the error that I get.
> >> > Sequential and map reduce classification give the same error.
> >> >
> >> > Command --> hadoop jar /usr/lib/mahout-0.5/mahout-examples-0.5-cdh3u5-job.jar org.apache.mahout.df.mapreduce.TestForest -i /user/ranjitha/input/KDDTest+.arff.txt_withnum -ds /user/ranjitha/input/KDDTrain+.info -m /user/ranjitha/KDDForest -o /user/ranjitha/KDDResult
> >> >
> >> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Loading the forest...
> >> > 13/01/18 11:29:24 INFO mapreduce.TestForest: Sequential classification...
> >> > 13/01/18 11:29:24 ERROR data.DataConverter: label token: 1 dataset.labels: [normal, anomaly]
> >> > Exception in thread "main" java.lang.IllegalStateException: Label value (1) not known
> >> >         at org.apache.mahout.df.data.DataConverter.convert(DataConverter.java:71)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.testFile(TestForest.java:256)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.sequential(TestForest.java:216)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:172)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:142)
> >> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:275)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >         at java.lang.reflect.Method.invoke(Method.java:616)
> >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >> >
> >> > Looking forward to your reply
> >> >
> >> > Thanks
> >> > Ranjitha.
> >> >
> >> > -----Original Message-----
> >> > From: deneche abdelhakim [mailto:[email protected]]
> >> > Sent: 17 January 2013 18:20
> >> > To: [email protected]
> >> > Subject: Re: Issue with Partial Implementation Problem
> >> >
> >> > Hi Ranjitha,
> >> >
> >> > just put any numerical value in the label attribute. You should be able to
> >> > classify the data, but you won't be able to compute the confusion matrix or
> >> > the accuracy.
> >> >
> >> >
> >> > On Thu, Jan 17, 2013 at 12:15 PM, Ranjitha Chandrashekar <[email protected]> wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > I am using Partial Implementation for Random Forest classification.
> >> > >
> >> > > I have a training dataset with labels class0, class 1, class 2. The
> >> > > decision forest is built on this training dataset. The classification for
> >> > > the test dataset is computed using the same data descriptor generated for
> >> > > the training dataset. I am able to generate the confusion matrix and
> >> > > accuracy details when the test data set has the class variable.
> >> > >
> >> > > However I also need to make a classification for a scenario where the test
> >> > > data may not have the class variable, or the class values are not known.
> >> > > For example, assume the test data is about future data points, for which
> >> > > class values will have to be computed only in the future.
> >> > >
> >> > > * How is it possible to classify the test data set, where the class label
> >> > >   is not defined or not known? I have tried using default labels like
> >> > >   "unknown", "NO_LABEL". It doesn't seem to work.
> >> > >
> >> > > * How to set the class label as "unknown" in the testing dataset?
> >> > >
> >> > > Looking forward to your reply,
> >> > >
> >> > > Thanks
> >> > > Ranjitha.
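
The workaround suggested in the thread (put a label value that exists in the training set into every test record) can be done as a small pre-processing step before running TestForest. What follows is only a minimal Python sketch of that idea, not part of Mahout. It assumes the label is the last comma-separated attribute, as in the KDDTest record quoted in the thread, and that the input contains data records only. The function name `fill_placeholder_label` and the shortened sample record are hypothetical.

```python
def fill_placeholder_label(lines, placeholder="normal"):
    """Overwrite the last comma-separated field of each record.

    Assumptions (not confirmed by Mahout documentation):
    - the label is the last attribute of every record;
    - `placeholder` is a label value that occurs in the training set,
      e.g. "normal" for the KDD data discussed in the thread.
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        fields = line.split(",")
        fields[-1] = placeholder  # replace the unknown label
        result.append(",".join(fields))
    return result


# Hypothetical, shortened record whose label is not known.
records = ["13,tcp,telnet,SF,unknown\n"]
print(fill_placeholder_label(records))
```

Because the label written this way is only a placeholder, any accuracy or confusion matrix computed against it carries no meaning. Only the predictions themselves are useful.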
