Thank you very much for responding :-)
I also found this one: http://www.deerwalk.com/bulk_importing_data ,
which seems very informative.
The thing is, when I created and ran a simple (custom) bulk loading job
locally (in pseudo-distributed mode), the following error occurred:
... INFO mapred.JobClient: Task Id :
attempt_201207232344_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
at
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
...
While googling for a solution, I came across this link:
http://hbase.apache.org/book/trouble.mapreduce.html
which implies a misconfiguration: the job seems to expect a fully
distributed environment.
I would therefore like to ask: is it even possible to bulk import data
in pseudo-distributed mode, and if so, does anyone have a guess as to
what causes this error?
Thanks in advance!
IP
On 07/23/2012 07:40 AM, Sonal Goyal wrote:
Hi,
You can check the bulk loading section at
http://hbase.apache.org/book/arch.bulk.load.html
Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <[email protected]> wrote:
Hi,
Is there any efficient way (beyond the trivial approach of using
TableMapReduceUtil / TableOutputFormat) to perform faster read and write
operations on tables? Could anyone provide some example code?
As for faster importing into a table, I am aware of tools such as
completebulkload, but I would prefer to trigger such a process through M/R
code, as I would like a whole table to be read and updated over
iterations of M/R jobs.
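To be concrete, the kind of driver I have in mind is roughly the following sketch. The table name, paths, column family and the toy mapper are placeholders I made up, and it assumes a running HBase cluster; as far as I understand, HFileOutputFormat.configureIncrementalLoad sets up the TotalOrderPartitioner and the reducer automatically:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

  // Toy mapper: parses "rowkey,value" lines into KeyValues for family "cf".
  static class LineToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-sketch");
    job.setJarByClass(BulkLoadSketch.class);
    job.setMapperClass(LineToKeyValueMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/hfiles"));

    // Sets the output format, the reducer and the TotalOrderPartitioner,
    // and writes the partitions file from the table's region boundaries.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Move the generated HFiles into the table's regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/hfiles"), table);
    }
  }
}
```

This is only how I understand the flow from the docs; I am not sure whether doBulkLoad needs anything more in pseudo-distributed mode.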
Thanks in advance!
IP