Hello everyone, I'm trying to do a text classification task. My dataset is not that big: I have around 700,000 small comments.
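For context, the pipeline I'm running follows the 20newsgroups example script roughly like this (the ${WORK_DIR} paths below are placeholders for illustration, not my real HDFS layout):

```shell
# Placeholder working directory on HDFS; adjust to your own layout.
export WORK_DIR=/user/fernando/comments

# 1. Convert the raw text to SequenceFiles, then to TF-IDF vectors.
mahout seqdirectory -i ${WORK_DIR}/raw -o ${WORK_DIR}/seq
mahout seq2sparse -i ${WORK_DIR}/seq -o ${WORK_DIR}/vectors -lnorm -nv -wt tfidf

# 2. Split the vectors into training and test sets (here 80/20).
mahout split -i ${WORK_DIR}/vectors/tfidf-vectors \
  --trainingOutput ${WORK_DIR}/train-vectors \
  --testOutput ${WORK_DIR}/test-vectors \
  --randomSelectionPct 20 --overwrite --sequenceFiles -xm sequential

# 3. Train the Naive Bayes model, then test it with testnb.
mahout trainnb -i ${WORK_DIR}/train-vectors -el \
  -o ${WORK_DIR}/model -li ${WORK_DIR}/labelindex -ow
mahout testnb -i ${WORK_DIR}/test-vectors -m ${WORK_DIR}/model \
  -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/results
```

The slow step for me is the last one, testnb.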
Following the 20newsgroups example, I created the vectors from the text, split them, and trained the model. Now I'm trying to test it, but it is really slow, and I cannot get it to run on the cluster. Whatever I do, it always runs on just one machine, and I think the testnb job is supposed to run using MapReduce, right? I also tried the example here (http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/), but again the other box in the cluster does not execute any tasks. In fact, when I run testnb, or the MapReduceClassifier proposed in the tutorial above, I get one job executing one task, and that task runs really slowly (around 6 minutes to reach 0.13% progress). I think I must be doing something wrong, because the cluster is not working the way it is supposed to.

I have a cluster with two boxes configured with Hadoop 0.20.205.0, and I am using Mahout 0.8. I also tried Mahout 0.7 and 0.6, but nothing changed. Any help would be appreciated.

The logs I have from this task:

*stdout logs*
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

*syslog logs*
2013-11-30 17:09:19,191 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-30 17:09:19,400 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-11-30 17:09:19,472 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-30 17:09:19,474 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5810d963
2013-11-30 17:09:19,543 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680

-- 
Fernando Santos
+55 61 8129 8505
