Thank you very much for responding :-)

I also found this one: http://www.deerwalk.com/bulk_importing_data, which seems very informative.

The thing is, I created a simple (custom) bulk loading job and tried to run it locally (in pseudo-distributed mode), and the following error occurs:

... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111) ...

While googling for a solution I came across this link: http://hbase.apache.org/book/trouble.mapreduce.html — it suggests a misconfiguration related to a fully distributed environment.

I would therefore like to ask whether it is even possible to bulk import data in pseudo-distributed mode, and if so, does anyone have a guess about this error?
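In case it helps, here is roughly how the driver is set up — a minimal sketch assuming the 0.94-era HBase API; `MyKeyValueMapper`, the table name, and the paths are placeholders, not the actual job code:

```java
// Hypothetical driver sketch for an HBase bulk-load M/R job (0.94-era API).
// "mytable", MyKeyValueMapper, and the path arguments are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        // Mapper must emit (ImmutableBytesWritable rowKey, KeyValue cell) pairs.
        job.setMapperClass(MyKeyValueMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        HTable table = new HTable(conf, "mytable");
        // configureIncrementalLoad wires in TotalOrderPartitioner and writes the
        // partitions file based on the table's region boundaries. That file has
        // to live on a filesystem the map tasks can read (i.e. HDFS, not the
        // local FS) — a mismatch there, e.g. a wrong fs.default.name in a
        // pseudo-distributed setup, is a plausible source of the
        // "Can't read partitions file" error above.
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After the job writes its HFiles, they would still need to be handed to the table (e.g. via the completebulkload tool or LoadIncrementalHFiles).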

Thanks in advance!
IP


On 07/23/2012 07:40 AM, Sonal Goyal wrote:
Hi,

You can check the bulk loading section at

http://hbase.apache.org/book/arch.bulk.load.html

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>





On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <[email protected]> wrote:

Hi,

Is there any efficient way (beyond the trivial use of TableMapReduceUtil /
TableOutputFormat) to perform faster read and write operations on tables?
Could anyone provide some example code?

As for faster importing into a table, I am aware of tools such as
completebulkload, but I would prefer to trigger such a process through M/R
code, since I would like a whole table to be read and updated through
iterations of M/R jobs.

Thanks in advance!
IP

