Hi Bejoy,
Thank you for the answer; I will try it. But I still have a question:
how should I manage connections to HBase inside the job?

Should I open a new connection in each job? How can I set up a
connection pool inside a job?
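
On the connection question, one common pattern is to open the HBase handle
once per map task and reuse it for every record, rather than opening one per
record or building a pool. Hadoop's Mapper has setup() and cleanup() hooks
that run once per task, which is where the open and close would normally go.
The sketch below mirrors that lifecycle in plain Java so it runs without
Hadoop or HBase on the classpath. TableClient, InMemoryTable and UpdateTask
are hypothetical stand-ins, not real HBase classes, and the "update" simply
appends a marker byte.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an HBase table handle; NOT the real HBase API.
interface TableClient extends AutoCloseable {
    byte[] get(String rowKey);
    void put(String rowKey, byte[] value);
    @Override
    void close();
}

// In-memory fake used only to make the sketch runnable.
final class InMemoryTable implements TableClient {
    final Map<String, byte[]> rows = new HashMap<>();
    boolean closed = false;

    public byte[] get(String rowKey) { return rows.getOrDefault(rowKey, new byte[0]); }
    public void put(String rowKey, byte[] value) { rows.put(rowKey, value); }
    public void close() { closed = true; }
}

// Mirrors the setup/map/cleanup lifecycle of a map task (assumed shape).
final class UpdateTask {
    private final Supplier<TableClient> connector;
    private TableClient table;

    UpdateTask(Supplier<TableClient> connector) { this.connector = connector; }

    // Runs once per task: open the handle here, not once per record.
    void setup() { table = connector.get(); }

    // Runs once per input record: get, modify, put, reusing the same handle.
    void map(String rowKey) {
        byte[] old = table.get(rowKey);
        byte[] updated = Arrays.copyOf(old, old.length + 1);
        updated[old.length] = 1; // placeholder for the real column update
        table.put(rowKey, updated);
    }

    // Runs once per task: release the handle even if a record failed.
    void cleanup() {
        if (table != null) {
            table.close();
            table = null;
        }
    }

    // Drives the lifecycle the way the framework would for one task.
    void run(List<String> rowKeys) {
        setup();
        try {
            for (String key : rowKeys) {
                map(key);
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        InMemoryTable fake = new InMemoryTable();
        new UpdateTask(() -> fake).run(List.of("row-1", "row-2"));
        System.out.println("rows written: " + fake.rows.size());
        System.out.println("handle closed: " + fake.closed);
    }
}
```

With this shape the number of open handles is bounded by the number of
concurrently running map tasks, so a separate pool inside the job is usually
not needed.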

Thank you,
Pablo

From: Bejoy Ks [mailto:[email protected]]
Sent: Thursday, September 27, 2012 14:55
To: [email protected]
Subject: Re: Two map inputs (file & HBase). "Join" file data and Hbase data into a map reduce.

Hi Pablo

>I could read the file and do a get followed by a put, but this would not be a 
>MR job
>and would be very slow if there are a lot of entries in the file.

If you have a large file, MapReduce lets you parallelize the HBase gets and
puts. Configure the split size so that there are enough mappers to give
sufficient parallelism and good performance.
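
As a rough illustration of the split-size advice: for a plain file input, the
number of map tasks is approximately the file size divided by the split size,
rounded up. The sketch below only does that arithmetic. SplitMath and its
methods are made-up names for illustration, not part of Hadoop, and a real
job would set the split size through the input format's configuration rather
than compute mapper counts by hand.

```java
// Illustrative arithmetic only; SplitMath is a hypothetical helper, not a Hadoop class.
final class SplitMath {

    /** Approximate number of map tasks for one file: ceil(fileBytes / splitBytes). */
    static long mapperCount(long fileBytes, long splitBytes) {
        if (fileBytes < 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    /** Split size (in bytes) that yields roughly the requested number of mappers. */
    static long splitSizeFor(long fileBytes, int wantedMappers) {
        if (fileBytes < 0 || wantedMappers <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long size = (fileBytes + wantedMappers - 1) / wantedMappers;
        return Math.max(size, 1L); // never return a zero-byte split
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long sixtyFourMiB = 64L * 1024 * 1024;
        System.out.println(mapperCount(oneGiB, sixtyFourMiB)); // prints 16
        System.out.println(splitSizeFor(oneGiB, 32));          // prints 33554432 (32 MiB)
    }
}
```
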

On Thu, Sep 27, 2012 at 11:14 PM, Pablo Musa <[email protected]> wrote:
Hey guys,
I am not sure if this is the correct list (it could also be HBase), but I
think my question is more related to MR than to HBase itself.

I am trying to update some columns of a family in my HBase database using an
MR job. In each column I have a byte array with different information
concatenated.

So far, so easy: an MR job with the table as input and a scan restricted to
the family (Scan.addFamily). My problem is that the rows I want to update are
listed in a file. In other words, I have a big file containing all the rows
that should be updated.

So I have to read the row:column content so I can update it and then write it
back. But I also have to read the file in order to know which rows I should
update.

I could read the file and do a get followed by a put, but this would not be
an MR job and would be very slow if there are a lot of entries in the file.

Another possibility I thought of, but don't know how to implement, is to use
the table as input and have a Map with the rows that should be updated. The
problem is that I don't know how to distribute this Map, or how to distribute
the file so that every map task can read it.
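
For this second idea (table as input, with the list of rows available to
every mapper), the usual approach when the key file fits in memory is to ship
it to each task, for example through Hadoop's DistributedCache, and load it
into a set in the mapper's setup step. The plain-Java sketch below shows only
the load-and-lookup part. RowKeyFilter is a hypothetical name, the
one-key-per-line file format is an assumption, and in a real job the reader
would wrap the locally cached file. If the file is too large to hold in
memory, using the file as the job input and doing a get and put per line is
the better fit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: holds the row keys that should be updated.
final class RowKeyFilter {
    private final Set<String> keys = new HashSet<>();

    /** Reads one row key per line (assumed format); blank lines are skipped. */
    static RowKeyFilter load(Reader source) {
        RowKeyFilter filter = new RowKeyFilter();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String key = line.trim();
                if (!key.isEmpty()) {
                    filter.keys.add(key);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filter;
    }

    /** Called per table row in the map step: true if this row is listed in the file. */
    boolean shouldUpdate(String rowKey) {
        return keys.contains(rowKey);
    }

    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        RowKeyFilter filter = load(new StringReader("row-1\nrow-7\n\nrow-9\n"));
        System.out.println(filter.shouldUpdate("row-7")); // prints true
        System.out.println(filter.shouldUpdate("row-2")); // prints false
    }
}
```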

Any thoughts?

Thanks,
Pablo
