Hey guys,
I am not sure if this is the correct list (it could also be the HBase one), but
I think my doubt is more related to MapReduce than to HBase itself.

I am trying to update some columns of a family in my HBase table using an MR
job. Each column holds a byte array with several pieces of information
concatenated together.

So far, so easy: an MR job with the table as input and scan.addFamily(). My
problem is that the rows I want to update are listed in a file. In other words,
I have a big file containing all the rows that should be updated.

So I have to read the row:column content so I can update it and then write it
back. But I also have to read the file to know which rows I should update.
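To make the read-modify-write step concrete, here is a minimal sketch in plain
Python. The cell layout is purely hypothetical (a 4-byte big-endian counter,
then an 8-byte big-endian timestamp, then a UTF-8 payload); the real format of
the concatenated byte array is not stated in this post. The point is only the
shape of the update: split the value into fields, change one, concatenate again.

```python
import struct

# HYPOTHETICAL cell layout, for illustration only:
# 4-byte big-endian int counter, 8-byte big-endian timestamp, UTF-8 payload.
HEADER = struct.Struct(">iq")


def decode_cell(value: bytes) -> tuple[int, int, str]:
    """Split the concatenated byte array into its fields."""
    counter, ts = HEADER.unpack_from(value, 0)
    payload = value[HEADER.size:].decode("utf-8")
    return counter, ts, payload


def encode_cell(counter: int, ts: int, payload: str) -> bytes:
    """Concatenate the fields back into a single byte array."""
    return HEADER.pack(counter, ts) + payload.encode("utf-8")


def update_cell(value: bytes, new_ts: int) -> bytes:
    """Read-modify-write: bump the counter, replace the timestamp, keep the payload."""
    counter, _old_ts, payload = decode_cell(value)
    return encode_cell(counter + 1, new_ts, payload)
```

For example, update_cell(encode_cell(1, 100, "abc"), 200) decodes back to
(2, 200, "abc").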

I could read the file and do a get followed by a put for each row, but this
would not be an MR job and would be very slow if there are a lot of entries in
the file.
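A rough sketch of that single-client approach, assuming the file holds one row
key per line (an assumption). A plain dict stands in for the table, so each
lookup and assignment plays the role of one get and one put round trip; the
function names are made up for illustration. Everything runs in one process,
one row at a time, which is where the slowness for a big file comes from.

```python
from typing import Callable, Dict, Iterable, Iterator

# A plain dict stands in for the HBase table: row key -> cell value.
Table = Dict[str, bytes]


def read_row_keys(lines: Iterable[str]) -> Iterator[str]:
    """Yield one row key per non-blank line (assumed file format)."""
    for line in lines:
        key = line.strip()
        if key:
            yield key


def sequential_update(table: Table, lines: Iterable[str],
                      update: Callable[[bytes], bytes]) -> int:
    """Single client: one 'get' and one 'put' per row listed in the file."""
    updated = 0
    for key in read_row_keys(lines):
        value = table.get(key)        # stands in for a Get
        if value is None:
            continue                  # listed in the file but not in the table
        table[key] = update(value)    # stands in for a Put
        updated += 1
    return updated
```

With table = {"r1": b"a", "r2": b"b", "r3": b"c"} and the lines
["r1", "r3", "missing"], sequential_update(table, lines, lambda v: v.upper())
returns 2 and leaves r2 untouched.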

Another possibility I thought of, but don't know how to implement, is to use
the table as input and keep an in-memory Map of the rows that should be updated.
The problem is that I don't know how to distribute this Map, or how to
distribute the file so that every map task can read it.
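The idea of using the table as input plus a lookup of the rows to update can be
simulated in plain Python as below. This is a sketch of the data flow only, not
HBase or Hadoop API code: each simulated mapper loads the whole key file once in
setup() and then emits an updated row only when the scanned key is in that set.
In a real Hadoop job the file could, for example, be shipped with the
distributed cache (Job.addCacheFile) and read in the mapper's setup(); the class
and function names here are hypothetical. The approach assumes the key set fits
in each mapper's memory, and the whole family is still scanned.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Row = Tuple[str, bytes]  # (row key, cell value) from the scanned family


class FilteringUpdateMapper:
    """Simulated mapper: the table is the input, the key file is side data."""

    def __init__(self, update: Callable[[bytes], bytes]):
        self.update = update
        self.keys: Set[str] = set()

    def setup(self, key_file_lines: Iterable[str]) -> None:
        # Runs once per mapper, before any rows are processed.
        self.keys = {line.strip() for line in key_file_lines if line.strip()}

    def map(self, row: Row) -> Iterator[Row]:
        key, value = row
        if key in self.keys:
            yield key, self.update(value)  # would become a Put in a real job


def run_job(splits: List[List[Row]], key_file_lines: List[str],
            update: Callable[[bytes], bytes]) -> Dict[str, bytes]:
    """Run one mapper per input split; every mapper reads the same key file."""
    output: Dict[str, bytes] = {}
    for split in splits:
        mapper = FilteringUpdateMapper(update)
        mapper.setup(key_file_lines)
        for row in split:
            for key, new_value in mapper.map(row):
                output[key] = new_value
    return output
```

With two splits [("r1", b"a"), ("r2", b"b")] and [("r3", b"c"), ("r4", b"d")]
and a key file listing r2 and r3, only r2 and r3 come out updated.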

Any thoughts?

Thanks,
Pablo
