Hi Stuti,

Yes, they are in HDFS.

I think I have almost nailed down the problem. I checked my dataset, and there is only one record with the "EMI" label. When I split the dataset into test and training sets, I think that record did not end up in the training set. I suspect this is the cause of the error. To confirm, I am removing this record from my dataset and running the Mahout commands again.

Regards,
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Wednesday, May 15, 2013 10:56 AM
To: [email protected]
Subject: RE: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb

Hi ChandraMohan,

Yes, I am also looking at text classification using Mahout. I have also tried this link and it worked for me. Just a basic question: I hope you have your files in HDFS and not on the local file system. That was the first mistake I made when running the 20 Newsgroups example.

Thanks
Stuti Awasthi

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:[email protected]]
Sent: Wednesday, May 15, 2013 9:37 AM
To: [email protected]
Subject: RE: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb

Hi Stuti,

Thanks for your response. The labels are present. I ran seqdumper as you suggested and I could see the labels. I see that you are working on a similar text classification effort. I am following this link:
http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

Do you have any other links or references?

Regards,
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Tuesday, May 14, 2013 12:23 PM
To: [email protected]
Subject: RE: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb

Hi Chandra,

I think that your label is not created correctly. Check the file using seqdumper to see whether the labels are present in it.

Thanks
Stuti Awasthi

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:[email protected]]
Sent: Tuesday, May 14, 2013 10:10 AM
To: [email protected]
Subject: java.lang.IllegalArgumentException: Label not found: EMI whi running mahout testnb

Hi,

I am running a Hadoop 1.0.2 cluster in pseudo-distributed mode, and my Mahout version is 0.7. I am trying to do text classification using the Mahout naïve Bayes commands. I created sequence files using my custom Java program and uploaded the sequence file to HDFS. I am running the following Mahout commands:

    mahout seq2sparse -i my-seq -o my-vectors
    mahout split -i my-vectors/tfidf-vectors --trainingOutput train-vectors --testOutput test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
    mahout trainnb -i train-vectors -el -li labelindex -o model -ow -c
    mahout testnb -i train-vectors -m model -l labelindex -ow -o my-testing -c
    mahout testnb -i test-vectors -m model -l labelindex -ow -o tweets-testing -c

When I run the final command, I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: EMI
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:126)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:94)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:71)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:158)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:124)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)

Can someone help me in fixing this error?

Regards,
Anand.C
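
The diagnosis in the latest reply of this thread (a label that appears in the test split but never in the training split) can be checked by comparing the labels on each side of the split. The Python below is only a rough sketch under stated assumptions: it assumes the sequence-file keys have already been exported to plain lists of strings, and that each key has the form /<label>/<doc-id>. The custom Java program mentioned in the thread may write keys differently. The function names and the sample keys are hypothetical and exist only for illustration.

```python
from collections import Counter


def label_of(key):
    """Return the label from a key shaped like '/<label>/<doc-id>' (assumed layout)."""
    parts = key.split("/")
    if len(parts) < 3 or not parts[1]:
        raise ValueError(f"unexpected key layout: {key!r}")
    return parts[1]


def labels_missing_from_training(train_keys, test_keys):
    """Map each label seen only in the test keys to its number of test records."""
    train_labels = {label_of(k) for k in train_keys}
    test_counts = Counter(label_of(k) for k in test_keys)
    return {label: n for label, n in test_counts.items() if label not in train_labels}


# Hypothetical keys, made up for illustration; not taken from the real dataset.
train_keys = ["/LOAN/doc1", "/LOAN/doc2", "/CARD/doc3"]
test_keys = ["/LOAN/doc4", "/EMI/doc5"]

missing = labels_missing_from_training(train_keys, test_keys)
print(missing)
```

A non-empty result lists labels the model never saw during training. For such labels, the options discussed in this thread are to remove the affected records before splitting, or to collect enough records per label that a random split is likely to place some in the training set.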
