Update (for anyone ending up here after googling this issue):

In the end, running an M/R job to bulk import data in pseudo-distributed mode is feasible (for testing purposes).

The error concerning TotalOrderPartitioner turned out to be a trivial bug in the keys I emitted from my mappers.
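
The details of my bug aren't interesting, but for reference, a bulk-load mapper for HFileOutputFormat generally looks like the sketch below; the map output key must wrap the same row key bytes used in the KeyValue, since that key is what TotalOrderPartitioner partitions on. (The column family "cf", qualifier "q", and tab-separated input are assumptions for illustration, not my actual schema.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class BulkImportMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

      private static final byte[] FAMILY = Bytes.toBytes("cf");
      private static final byte[] QUALIFIER = Bytes.toBytes("q");

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Assumed input format: <rowkey>\t<value>
        String[] fields = line.toString().split("\t", 2);
        byte[] rowKey = Bytes.toBytes(fields[0]);
        // The output key and the KeyValue must agree on the row key bytes,
        // otherwise the partitioning/sorting into HFiles breaks.
        context.write(new ImmutableBytesWritable(rowKey),
            new KeyValue(rowKey, FAMILY, QUALIFIER, Bytes.toBytes(fields[1])));
      }
    }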

The thing is that you need to add guava-r09.jar (or, I suppose, whatever recent Guava version ships under the lib folder of your HBase setup path) to the lib folder of your Hadoop setup path. I suppose that to run the same job on a truly distributed cluster, one has to pass -libjars /path/to/guava.jar to the hadoop jar command instead.
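
Concretely, something like the following (the paths, jar name, and driver class are placeholders, not my actual setup; you may need to restart the Hadoop daemons after copying the jar, and -libjars is only honored when the driver runs through ToolRunner):

    cp /path/to/hbase/lib/guava-r09.jar /path/to/hadoop/lib/

    hadoop jar mybulkload.jar com.example.BulkLoadDriver \
        -libjars /path/to/hbase/lib/guava-r09.jar \
        <input> <output>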

On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
> Status : FAILED
> java.lang.IllegalArgumentException: *Can't read partitions file*
>      at
> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> ...
>
> While googling for a solution I followed this link:
> http://hbase.apache.org/book/trouble.mapreduce.html
> which implies a misconfiguration concerning a fully distributed
> environment.
>
> I would therefore like to ask whether it is even possible to bulk import
> data in pseudo-distributed mode and, if so, does anyone have a guess
> about this error?
AFAIK you just can't use the local job tracker for this, so you do
need to start one.

J-D
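
For completeness, J-D's point means jobs must go to a real JobTracker rather than the local runner, i.e. mapred.job.tracker must not be "local". In a pseudo-distributed Hadoop 1.x setup, mapred-site.xml would contain roughly the following (localhost:9001 is the conventional value, an assumption about your setup):

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>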
