Hi,

I have a table that I want to parse with an MR job.

Today, I'm using a scan to parse all the rows. Each row is retrieved, removed, and then parsed (feeding 2 other tables). The goal is to parse all the content while some other process might still be adding more.

In the map method of the MR job, can I delete the row I'm working with? If so, how should I do it? Should I take the table from the pool and simply call the delete method?

The issue is that doing a delete for each row will take a while. I would prefer to batch them, but I don't know which row will be the last one, so it's difficult to know when to send the batch. Is there a way to tell the MR job to delete this row?

Also, what's the impact on the MR job if I delete the row it's working on? Or is an MR job not the best way to do this?

Thanks,

JM
