We need to load 10 million lines into HBase for processing.  I have the file on
Hadoop DFS and would like to map/reduce it so that each line is put to the
account it belongs to (i.e., routed directly to the node holding that account).
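To make the routing step concrete, here is a minimal plain-Java sketch (no HBase client involved; the tab-delimited line format and the `parseAccountId` helper are my assumptions, not our real schema) of grouping lines by account key, the same way a mapper would emit (account, line) pairs so that all of an account's lines land together:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LineRouter {

    // Assumed format: "<accountId>\t<payload>" -- the account id is the routing key.
    static String parseAccountId(String line) {
        return line.substring(0, line.indexOf('\t'));
    }

    // Group lines by account, as a map phase would emit (account, line) pairs;
    // each group would then be shipped to the region holding that account row.
    static Map<String, List<String>> route(List<String> lines) {
        Map<String, List<String>> byAccount = new LinkedHashMap<>();
        for (String line : lines) {
            byAccount.computeIfAbsent(parseAccountId(line), k -> new ArrayList<>())
                     .add(line);
        }
        return byAccount;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("acct1\tfoo", "acct2\tbar", "acct1\tbaz");
        System.out.println(route(lines).get("acct1").size()); // 2
    }
}
```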

I am considering putting a List<lines> in the Account row (i.e., a
column family with n columns), with a coprocessor then processing the lines for
that account.  A lot of work needs to be done there, including checking the
Activity table, which is keyed by account-sequence so that it is co-located
with the account.
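To illustrate the co-location I'm relying on, here is a plain-Java sketch (the "account-sequence" key format and the Activity contents are assumptions about my schema; the `TreeMap` just models HBase's sorted row keys, not an HBase API): because Activity rows share the account prefix, they sort next to the account row, so the per-line processing only needs a local range scan:

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class AccountProcessor {

    // Activity rows keyed "account-sequence"; sorted keys model HBase row
    // ordering, so one account's activity is contiguous with the account row.
    static final SortedMap<String, String> activity = new TreeMap<>();

    // Process all lines for one account. The subMap range
    // [account + "-", account + ".") mimics a prefix scan over the
    // co-located Activity rows ('-' sorts immediately before '.').
    static int process(String account, List<String> lines) {
        SortedMap<String, String> acctActivity =
                activity.subMap(account + "-", account + ".");
        int processed = 0;
        for (String line : lines) {
            // real per-line work would go here; we just count lines
            // for which the account has any activity at all
            if (!acctActivity.isEmpty()) {
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        activity.put("acct1-0001", "login");
        System.out.println(process("acct1", List.of("l1", "l2"))); // 2
    }
}
```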

Is this the optimal strategy?  I almost wish I had a grid function I could just
call, passing it the line, that would run on the node where the account lives.

Also, if my processing fails for any reason, will the original client that did
the put receive an exception?  That way I could record that lines 5, 10, and 15
failed for reasons A, B, and C.
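The failure reporting I'd want, independent of whether an exception actually propagates back to the client, could be sketched like this (plain Java; the empty-line check stands in for whatever real validation would reject a line):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailureReport {

    // Process each line, recording failures by 1-based line number and reason
    // rather than depending on an exception reaching the original client.
    static Map<Integer, String> processAll(List<String> lines) {
        Map<Integer, String> failures = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            try {
                if (lines.get(i).isEmpty()) {
                    throw new IllegalArgumentException("empty line");
                }
                // real per-line work would go here
            } catch (RuntimeException e) {
                failures.put(i + 1, e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("ok", "", "ok"))); // {2=empty line}
    }
}
```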

Thanks for any info here,
Dean


