Mahadev Konar:
> Hi Thomas,
> There are a couple of projects inside Yahoo! that use ZooKeeper as an
> event manager for feed processing.
>
> I am little bit unclear on your example below. As I understand it-
>
> 1. There are 1 million feeds that will be stored in Hbase.
> 2. A map reduce job will be run on these feeds to find out which feeds need
> to be fetched.
> 3. This will create queues in ZooKeeper to fetch the feeds
> 4. Workers will pull items from this queue and process feeds
>
> Did I understand it correctly? Also, if above is the case, how many queue
> items would you anticipate be accumulated every hour?

Yes. That's exactly what I'm thinking about. Currently one node processes
about 20,000 feeds an hour, and we have 5 feed-fetch nodes. This would mean
~100,000 queue items/hour. Each queue item should carry some meta
information, most importantly the feed items that are already known to the
system, so that only new items get processed.
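
To make the numbers and the queue item payload concrete, here is a minimal sketch in Python. The class name FeedQueueItem, its fields, the JSON encoding and the helper select_new_items are all assumptions made for illustration and are not part of any existing system or of the ZooKeeper API. The sketch only shows how an item could carry the already-known feed item IDs as bytes (znode data is a byte array) and how a worker could use them to keep only new items.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FeedQueueItem:
    """Hypothetical queue item: one feed to fetch plus the item IDs already known."""
    feed_url: str
    known_item_ids: list = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # Serialize to bytes so the item could be stored as znode data.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "FeedQueueItem":
        return cls(**json.loads(data.decode("utf-8")))


def select_new_items(item: FeedQueueItem, fetched_ids: list) -> list:
    """Return the fetched IDs that are not yet known, preserving fetch order."""
    known = set(item.known_item_ids)
    return [i for i in fetched_ids if i not in known]


# Throughput estimate from the figures given in the mail.
FEEDS_PER_NODE_PER_HOUR = 20_000
FETCH_NODES = 5
items_per_hour = FEEDS_PER_NODE_PER_HOUR * FETCH_NODES  # 100,000
items_per_second = items_per_hour / 3600                # roughly 28
```

At this rate the queue would see roughly 28 new items per second on average. Since ZooKeeper limits the data size of a single znode (1 MB by default), the list of known item IDs would have to stay compact, for example by storing short hashes instead of full entries.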
Thomas Koch, http://www.koch.ro