Update (for anyone ending up here after googling this issue):

In the end, running an M/R job to bulk import data in pseudo-distributed mode is feasible (for testing purposes).

The error concerning TotalOrderPartitioner turned out to be a trivial bug in the keys I emitted from my mappers.
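
The details of my bug aren't interesting, but for reference, a bulk-load mapper for HFileOutputFormat generally looks like the sketch below; the map output key must wrap the same row key bytes used in the KeyValue, since that key is what TotalOrderPartitioner partitions on. (The column family "cf", qualifier "q", and tab-separated input are assumptions for illustration, not my actual schema.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class BulkImportMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

      private static final byte[] FAMILY = Bytes.toBytes("cf");
      private static final byte[] QUALIFIER = Bytes.toBytes("q");

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Assumed input format: <rowkey>\t<value>
        String[] fields = line.toString().split("\t", 2);
        byte[] rowKey = Bytes.toBytes(fields[0]);
        // The output key and the KeyValue must agree on the row key bytes,
        // otherwise the partitioning/sorting into HFiles breaks.
        context.write(new ImmutableBytesWritable(rowKey),
            new KeyValue(rowKey, FAMILY, QUALIFIER, Bytes.toBytes(fields[1])));
      }
    }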

The thing is that you need to add guava-r09.jar (or, I suppose, whatever recent Guava version ships under the lib folder of your HBase setup path) to the lib folder of your Hadoop setup path. I suppose that to run the same job on a truly distributed cluster, one has to pass -libjars /path/to/guava.jar to the hadoop jar command instead.
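
Concretely, something like the following (the paths, jar name, and driver class are placeholders, not my actual setup; you may need to restart the Hadoop daemons after copying the jar, and -libjars is only honored when the driver runs through ToolRunner):

    cp /path/to/hbase/lib/guava-r09.jar /path/to/hadoop/lib/

    hadoop jar mybulkload.jar com.example.BulkLoadDriver \
        -libjars /path/to/hbase/lib/guava-r09.jar \
        <input> <output>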

On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
> Status : FAILED
> java.lang.IllegalArgumentException: *Can't read partitions file*
>      at
> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> ...
>
> While googling for a solution I followed this link:
> http://hbase.apache.org/book/trouble.mapreduce.html
> which implies a misconfiguration concerning a fully distributed
> environment.
>
> I would therefore like to ask whether it is even possible to bulk import
> data in pseudo-distributed mode and, if so, does anyone have a guess
> about this error?
AFAIK you just can't use the local job tracker for this, so you do
need to start one.

J-D
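
For completeness, J-D's point means jobs must go to a real JobTracker rather than the local runner, i.e. mapred.job.tracker must not be "local". In a pseudo-distributed Hadoop 1.x setup, mapred-site.xml would contain roughly the following (localhost:9001 is the conventional value, an assumption about your setup):

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>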
