Hey guys, I am not sure this is the right list (it could also belong on the HBase list), but I think my question is more about MapReduce (MR) than about HBase itself.
I am trying to update some columns of one family in my HBase table using an MR job. Each column holds a byte array with several pieces of information concatenated. So far, so easy: an MR job with the table as input and the scan restricted to that family (Scan.addFamily).

My problem is that the rows I want to update are listed in a file. In other words, I have a big file listing all the rows that should be updated. So I have to read the row:column content, update it, and write it back, but I also have to read the file in order to know which rows to update.

I could read the file and do a Get followed by a Put for each entry, but that would not be an MR job and would be very slow if the file has a lot of entries.

Another possibility I thought of, but don't know how to implement, is to use the table as input and keep a Map of the rows that should be updated. The problem is that I don't know how to distribute this Map, or how to distribute the file so that every mapper can read it.

Any thoughts?

Thanks,
Pablo
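
P.S. To make the second idea more concrete, here is a rough sketch of what I imagine each mapper doing. It is plain Java with no Hadoop or HBase classes, so it only shows the lookup logic, not the part I am actually stuck on (getting the file to every mapper). The class and method names are made up, the "one row key per line" file layout is an assumption, and appending bytes is just a placeholder for the real update.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from Hadoop or HBase.
class RowKeyFilterSketch {

    // Parse the update list, assuming one row key per line (an assumption
    // about the file layout); blank lines are skipped.
    public static Set<String> parseRowKeys(String fileContents) {
        Set<String> keys = new HashSet<>();
        for (String line : fileContents.split("\\r?\\n")) {
            String key = line.trim();
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Placeholder update: append extra bytes to the existing concatenated value.
    public static byte[] appendBytes(byte[] current, byte[] extra) {
        byte[] updated = Arrays.copyOf(current, current.length + extra.length);
        System.arraycopy(extra, 0, updated, current.length, extra.length);
        return updated;
    }

    // Stand-in for the map() call on one scanned row: returns the new cell
    // value, or null when the row is not in the update list.
    public static byte[] mapRow(String rowKey, byte[] current,
                                Set<String> keysToUpdate, byte[] extra) {
        if (!keysToUpdate.contains(rowKey)) {
            return null;
        }
        return appendBytes(current, extra);
    }

    public static void main(String[] args) {
        // In the real job the set would be built once per mapper from the file.
        Set<String> keys = parseRowKeys("row1\nrow3\n");
        byte[] extra = "|new".getBytes(StandardCharsets.UTF_8);
        for (String row : new String[] {"row1", "row2", "row3"}) {
            byte[] result = mapRow(row, "old".getBytes(StandardCharsets.UTF_8), keys, extra);
            String shown = (result == null)
                    ? "skipped"
                    : new String(result, StandardCharsets.UTF_8);
            System.out.println(row + " -> " + shown);
        }
    }
}
```

This only works if the whole list of rows fits in each mapper's memory, which is part of why I am unsure it is the right approach.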
