See this environment: http://bit.ly/4ekN8G. Subsequently I used the 3-server setup, each configured with 8GB of heap in the JVM and 4 CPUs per JVM (I think I used 10-second session timeouts for this), for some additional testing that I've not written up yet. I was able to run ~500 clients (same test script) in parallel, which means about 5 million znodes and 25 million watches.

The things to watch out for are:
1) Most important: you need to tune the GC, in particular turn on CMS and incremental GC (see the example below this list). Otherwise GC pauses will cause high latencies and you will see session timeouts.
2) You need a stable network, especially for the serving ensemble.
3) Sufficient memory available in the JVM heap.
4) No I/O issues on the serving hosts (VMs, overloaded disks, swapping, etc.).
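For (1), that boils down to starting the server JVMs with CMS enabled, roughly something like the following (heap size to match your hosts, classpath elided):

java -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
     -cp <zookeeper jar and conf dir> \
     org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg

If you start the servers with the bundled bin/zkServer.sh you can put those -X flags into the JVMFLAGS environment variable instead of invoking java directly.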

In your case you've got less going on, with only 30 or so writes per second. The performance page shows that you're going to be well below the max ops/sec we see in our testing harness.

Btw, Gearman would also be a good choice imo. I've looked at integrating ZK with Gearman; there are two potential approaches: 1) as an additional persistent backend store for Gearman, and 2) as a way of addressing Gearman failover. 1 is pretty simple to do today; 2 is harder and would require some changes to Gearman itself, but I think it would be useful (automatic failover of persistent tasks if a Gearman server fails).

Patrick

On 04/12/2010 10:49 AM, Thomas Koch wrote:
Mahadev Konar:
Hi Thomas,
   There are a couple of projects inside Yahoo! that use ZooKeeper as an
event manager for feed processing.

I am a little bit unclear on your example below. As I understand it:

1. There are 1 million feeds that will be stored in HBase.
2. A MapReduce job will be run on these feeds to find out which feeds need
to be fetched.
3. This will create queues in ZooKeeper to fetch the feeds.
4. Workers will pull items from this queue and process the feeds.

Did I understand it correctly? Also, if the above is the case, how many queue
items would you anticipate accumulating every hour?

Yes. That's exactly what I'm thinking about. Currently one node processes about
20,000 feeds an hour and we have 5 feed-fetch nodes. This would mean ~100,000
queue items/hour. Each queue item should carry some meta information, most
importantly the feed items that are already known to the system, so that only
new items get processed.
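
To make that concrete, here is a rough sketch of what producing and consuming such a queue item could look like with the ZooKeeper Java client. The connect string, the /feed-queue path and the JSON payload are only placeholders, and a real worker would of course need watches, retries and proper error handling:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class FeedQueueSketch {
    public static void main(String[] args) throws Exception {
        // Connect with a 10s session timeout; no default watcher for brevity.
        // Assumes the parent znode /feed-queue already exists.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 10000, null);

        // Producer: enqueue one fetch task as a persistent sequential znode.
        // The payload carries the feed URL plus the item ids already known to the system.
        byte[] task = "{\"feed\":\"http://example.org/rss\",\"known\":[\"a\",\"b\"]}".getBytes("UTF-8");
        zk.create("/feed-queue/item-", task, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);

        // Worker: take the lowest-numbered item, process it, then delete it.
        List<String> items = zk.getChildren("/feed-queue", false);
        if (!items.isEmpty()) {
            Collections.sort(items);
            String first = "/feed-queue/" + items.get(0);
            byte[] data = zk.getData(first, false, null);
            // ... fetch the feed described by 'data', skipping the already-known items ...
            zk.delete(first, -1);
        }
        zk.close();
    }
}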

Thomas Koch, http://www.koch.ro
