Did the training run use both machines? How large is the input for the test run?
Is it contained in a single file?

On Sat, Nov 30, 2013 at 11:22 AM, Fernando Santos <[email protected]> wrote:
> Hello everyone,
>
> I'm trying to do a text classification task. My dataset is not that big; I
> have around 700,000 small comments.
>
> Following the 20newsgroups example, I created the vectors from the text,
> split them and trained the model. Now I'm trying to test it, but it is
> really slow and I also cannot get it to run on the cluster. Whatever I do,
> it always runs on just one machine. And I think the testnb algorithm is
> supposed to run using MapReduce, right?
>
> I also tried this example here (
> http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/
> ), but again the other box in the cluster is not executing any task. In
> fact, when I execute testnb or the MapReduceClassifier proposed in the
> tutorial above, I get one job executing one task, and this task runs really
> slowly (about 6 minutes to reach 0.13% of the task).
>
> I think I must be doing something wrong, so the cluster is not working the
> way it is supposed to.
>
> I have a cluster with 2 boxes configured with Hadoop 0.20.205.0, using
> Mahout 0.8.
>
> I also tried Mahout versions 0.7 and 0.6, but nothing changed.
>
> Any help would be appreciated.
>
> The logs I have from this task:
>
> *stdout logs*
>
> Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
>
> *syslog logs*
>
> 2013-11-30 17:09:19,191 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2013-11-30 17:09:19,400 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2013-11-30 17:09:19,472 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2013-11-30 17:09:19,474 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5810d963
> 2013-11-30 17:09:19,543 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
> 2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
> 2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>
> --
> Fernando Santos
> +55 61 8129 8505
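Some context for the single-file question: Hadoop creates map tasks per input split, not per machine. In FileInputFormat, a splittable file is cut into pieces of roughly one HDFS block (64 MB by default in Hadoop 0.20), so test input that fits in one block yields a single map task no matter how many nodes are idle. The sketch below is a simplified model of that split arithmetic, not Hadoop's actual code: the class `SplitEstimate` and its methods are hypothetical, the formula follows the shape of the new-API `computeSplitSize`, and data locality and the old `mapred` API's goal-size hint are ignored.

```java
public final class SplitEstimate {

    // FileInputFormat lets the final split run up to 10% past splitSize instead of
    // emitting a tiny trailing split.
    private static final double SPLIT_SLOP = 1.1;

    private SplitEstimate() {}

    // Same shape as computeSplitSize in the new-API FileInputFormat:
    // the block size, clamped between the configured min and max split sizes.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (one map task each) for a single non-empty input file.
    public static int numSplits(long fileLength, long blockSize,
                                long minSize, long maxSize, boolean splittable) {
        if (fileLength <= 0) {
            throw new IllegalArgumentException("model assumes a non-empty file");
        }
        if (!splittable) {
            return 1; // e.g. a gzip-compressed text file is read by one mapper
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        // Carve off full-size splits while the remainder exceeds 1.1 x splitSize.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        return remaining > 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // HDFS default dfs.block.size in Hadoop 0.20
        // Hypothetical 50 MB test-vector file with default split limits: 1 map task.
        System.out.println(numSplits(50 * mb, blockSize, 1, Long.MAX_VALUE, true));
        // Same file with the maximum split size capped at 10 MB: 5 map tasks.
        System.out.println(numSplits(50 * mb, blockSize, 1, 10 * mb, true));
    }
}
```

If the test vectors do fit in one block, a lower maximum split size (`mapred.max.split.size` for the new API in Hadoop 0.20) or input spread over several files would give the scheduler more than one task to place; whether testnb picks up that setting is an assumption worth checking.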
