I have what seems to be a unique situation; at least, I can't find a comparable example. Every night we slam our data storage with new information, and we need to use both the old and the new data to calculate values consumed by our web apps. The calculation is very intensive (some calculus involved as well) and currently takes 12 to 24 hours.
I am wondering whether map/reduce jobs can be trigger-oriented, so that as data is written to the 30 nodes in our cluster, map/reduce jobs fire and run against the data as it arrives on each node. Is this possible? Are there hooks in HBase, or maybe in map/reduce itself, that allow this? For context, I have HDFS and HBase checked out.

One caveat: we don't want to load ALL the data first and then run map/reduce. We really just want to send data into the cluster and have it run, send more data in and have it run again, and so on. And when it runs, it should run only on the nodes that received data, not on all nodes like a typical map/reduce job.

Thanks,
Dean
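P.S. To make the idea concrete, below is the kind of per-node hook I'm imagining, written as a rough sketch against HBase's coprocessor API (BaseRegionObserver/postPut). I haven't verified this against a specific release and I believe the postPut signature differs between versions, so please treat it as illustrative only; IncrementalCalcObserver and what it does are my invention, not something we have today.

    import java.io.IOException;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch only: runs inside the region server, so it fires on the node
    // that actually received the write, rather than as a cluster-wide job.
    public class IncrementalCalcObserver extends BaseRegionObserver {

        private static final Log LOG = LogFactory.getLog(IncrementalCalcObserver.class);

        @Override
        public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                            Put put, WALEdit edit, boolean writeToWAL)
                throws IOException {
            byte[] rowKey = put.getRow();
            // This is where we would kick off (or enqueue) the incremental
            // calculation for just the rows that arrived, instead of
            // re-reading everything in a nightly batch.
            LOG.info("New data for row " + Bytes.toStringBinary(rowKey)
                    + " in region "
                    + ctx.getEnvironment().getRegion().getRegionNameAsString());
        }
    }

The observer would presumably be registered on the table (or region-server wide via configuration) so it is invoked on every write. Is something along these lines the intended way to do this, or is there a better hook?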
