We need to load 10 million lines into HBase for processing. I have the file on Hadoop DFS and would like to MapReduce over it so that each line is put to the account it belongs to (i.e., routed directly to the node hosting that account).
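In plain Java terms (no Hadoop dependencies), the map phase I have in mind would emit (accountId, line) pairs so all lines for one account are grouped and can be put to that account's row together. This is just a sketch of the idea; `extractAccountId` is a stand-in for however the account is actually parsed from a line:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RouteByAccount {

    // Illustrative assumption: the account id is the first
    // comma-separated field of each input line.
    static String extractAccountId(String line) {
        return line.split(",", 2)[0];
    }

    // Group lines by account, mimicking what the shuffle phase of a
    // MapReduce job would do after the mapper emits (accountId, line).
    static Map<String, List<String>> groupByAccount(List<String> lines) {
        Map<String, List<String>> byAccount = new HashMap<>();
        for (String line : lines) {
            byAccount.computeIfAbsent(extractAccountId(line), k -> new ArrayList<>())
                     .add(line);
        }
        return byAccount;
    }
}
```

In a real job each group would become one Put against the account's row rather than an in-memory map.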
I am considering putting a List&lt;line&gt; on the Account itself (i.e., a column family with n columns), with a coprocessor then processing each line for that account. A lot of work has to happen there, including checking the Activity table, whose key is account-sequence, so its rows are co-located with the account. Is this the optimal strategy? I almost wish I had a grid function I could call with the line that would run on the node where the account lives.

Also, if my processing fails for any reason, will the original client that did the put receive an exception? That way I could record that lines 5, 10, and 15 failed for reasons A, B, and C.

Thanks for any info here,
Dean
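To make the co-location point concrete, here is a minimal sketch of the account-sequence key layout I described. The class and method names are hypothetical, and real HBase row keys would be byte[] rather than String, but the ordering argument is the same: with a fixed-width, zero-padded sequence, all Activity rows for an account sort contiguously and land in the same region as the account.

```java
public class RowKeySketch {

    // Build an Activity row key: account id plus a zero-padded sequence.
    // Zero-padding makes lexicographic (HBase byte) order match numeric
    // order, so "acct42-0000000002" sorts before "acct42-0000000010".
    static String activityKey(String accountId, long sequence) {
        return String.format("%s-%010d", accountId, sequence);
    }
}
```

Because HBase stores rows sorted by key, a coprocessor running on the region holding "acct42" can scan all of that account's Activity rows locally, without crossing a region boundary (assuming the region is not split inside one account's key range).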
