Re: znode cversion decreasing?
Hi Kevin,

The server increments a znode's cversion by one each time a change to its child list is made. Every znode has its own cversion. It should never decrease. If you delete a znode and create it anew, the cversion is reset for that znode. The cversion also happens to be used for the sequence number.

Are you using the C or Java client? Is this always happening or just in some cases? Is it reproducible?

You might try creating your ephemerals with the sequence flag, then comparing the cversion of the parent with the sequence number assigned - that might help with debugging.

Patrick

On 04/11/2010 03:53 PM, Kevin Webb wrote:
> I'm using ZooKeeper (3.2.2) for a simple group membership service in the manner that is typically described [1,2]: I create a znode for the group, and each present group member adds an ephemeral node under the group node. I'm using the cversion of the group node as a group number. I expected this value to be monotonically increasing, but I'm seeing instances where this isn't the case.
>
> According to the programmer's guide, changes to a node will cause the appropriate version number to increase, but it says nothing about decreasing. Am I misunderstanding something about the way node version numbers work? Is there a better/recommended way to implement a monotonically increasing group number?
>
> Thanks!
> Kevin
>
> [1] http://hadoop.apache.org/zookeeper/docs/r3.2.2/recipes.html
> [2] http://eng.kaching.com/2010/01/actually-implementing-group-management.html
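As a side note on Patrick's suggestion: a sequential znode gets a zero-padded 10-digit counter appended to its name by the server, so comparing it against the parent's cversion means parsing that suffix. A minimal sketch of the client-side parsing, assuming names like "member-0000000042" (the "member-" prefix is hypothetical, not from Kevin's code):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extract the 10-digit sequence suffix from a sequential znode name,
 * e.g. "member-0000000042" -> 42. Returns -1 if the name is too short
 * to carry a sequence suffix. */
static int32_t sequence_of(const char *name)
{
    size_t len = strlen(name);
    if (len < 10)
        return -1;
    return (int32_t)strtol(name + len - 10, NULL, 10);
}
```

Logging this value next to the parent's cversion on each reconnect would show immediately whether the server-side counter ever moves backwards.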
feed queue fetcher with hadoop/zookeeper/gearman?
Hi,

I'd like to implement a feed loader with Hadoop and most likely HBase. I've got around 1 million feeds that should be loaded and checked for new entries. However, the feeds have different priorities based on their average update frequency in the past and their relevance. The feeds (url, last_fetched timestamp, priority) are stored in HBase. How could I implement the fetch queue for the loaders?

- An hourly map-reduce job to produce new queues for each node and save them on the nodes?
  - But how do I know which feeds have been fetched in the last hour?
  - What to do if a fetch node dies?
- Store a fetch queue in ZooKeeper and add to the queue with map-reduce each hour?
  - Isn't that too much load for ZooKeeper? (I could make one znode for a bunch of URLs...?)
- Use Gearman [1] to store the fetch queue?
  - But the Gearman job server still seems to be a SPOF.

[1] http://gearman.org

Thank you!
Thomas Koch, http://www.koch.ro
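One way to make the "one znode for a bunch of urls" idea concrete is to hash each URL into a fixed number of buckets and keep one queue znode per bucket, so the znode count stays bounded regardless of feed count. A rough sketch; the bucket count, path layout, and choice of FNV-1a are illustrative assumptions, not from any of the projects discussed here:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_BUCKETS 1024  /* ~1000 URLs per znode for 1 million feeds */

/* FNV-1a hash: maps a URL to a stable bucket id in [0, NUM_BUCKETS). */
static uint32_t bucket_of(const char *url)
{
    uint32_t h = 2166136261u;
    for (const char *p = url; *p; p++) {
        h ^= (uint8_t)*p;
        h *= 16777619u;
    }
    return h % NUM_BUCKETS;
}

/* Build the queue znode path for a URL, e.g. "/fetch-queue/bucket-0042". */
static void bucket_path(const char *url, char *out, size_t outlen)
{
    snprintf(out, outlen, "/fetch-queue/bucket-%04u", bucket_of(url));
}
```

Because the hash is deterministic, the hourly map-reduce job and the fetch nodes agree on which znode holds which URL without any coordination.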
Re: feed queue fetcher with hadoop/zookeeper/gearman?
Hi Thomas,

There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it:

1. There are 1 million feeds that will be stored in HBase.
2. A map-reduce job will be run on these feeds to find out which feeds need to be fetched.
3. This will create queues in ZooKeeper to fetch the feeds.
4. Workers will pull items from this queue and process feeds.

Did I understand it correctly? Also, if the above is the case, how many queue items would you anticipate being accumulated every hour?

Thanks
mahadev

On 4/12/10 1:21 AM, Thomas Koch tho...@koch.ro wrote:
> Hi,
> I'd like to implement a feed loader with Hadoop and most likely HBase. I've got around 1 million feeds that should be loaded and checked for new entries. However, the feeds have different priorities based on their average update frequency in the past and their relevance. The feeds (url, last_fetched timestamp, priority) are stored in HBase. How could I implement the fetch queue for the loaders?
> - An hourly map-reduce job to produce new queues for each node and save them on the nodes?
>   - But how do I know which feeds have been fetched in the last hour?
>   - What to do if a fetch node dies?
> - Store a fetch queue in ZooKeeper and add to the queue with map-reduce each hour?
>   - Isn't that too much load for ZooKeeper? (I could make one znode for a bunch of URLs...?)
> - Use Gearman [1] to store the fetch queue?
>   - But the Gearman job server still seems to be a SPOF.
> [1] http://gearman.org
> Thank you!
> Thomas Koch, http://www.koch.ro
Re: feed queue fetcher with hadoop/zookeeper/gearman?
Mahadev Konar:
> Hi Thomas,
> There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it:
> 1. There are 1 million feeds that will be stored in HBase.
> 2. A map-reduce job will be run on these feeds to find out which feeds need to be fetched.
> 3. This will create queues in ZooKeeper to fetch the feeds.
> 4. Workers will pull items from this queue and process feeds.
> Did I understand it correctly? Also, if the above is the case, how many queue items would you anticipate being accumulated every hour?

Yes, that's exactly what I'm thinking about. Currently one node processes something like 2 feeds an hour, and we have 5 feed-fetch nodes. This would mean ~10 queue items/hour. Each queue item should carry some meta information, most importantly the feed items that are already known to the system, so that only new items get processed.

Thomas Koch, http://www.koch.ro
Re: feed queue fetcher with hadoop/zookeeper/gearman?
See this environment http://bit.ly/4ekN8G. Subsequently I used the 3-server setup, each configured with 8 GB of heap in the JVM and 4 CPUs/JVM (I think I used 10-second session timeouts for this), for some additional testing that I've not written up yet. I was able to run ~500 clients (same test script) in parallel. So that means about 5 million znodes and 25 million watches.

The things to watch out for are:
1) Most important: you need to tune the GC; in particular you need to turn on CMS and incremental GC. Otherwise the GC pauses will cause high latencies and you will see session timeouts.
2) You need a stable network, especially for the serving ensemble.
3) Sufficient memory available in the JVM heap.
4) No I/O issues on the serving hosts (VMs, overloaded disk, swapping, etc.).

In your case you've got less going on, with only 30 or so writes per second. The performance page shows that you're going to be well below the max ops/sec we see in our testing harness.

Btw, Gearman would also be a good choice IMO. I've looked at integrating ZK with Gearman; there are two potentials: 1) as an additional persistent backend store for Gearman, 2) as a way of addressing Gearman failover. 1 is pretty simple to do today; 2 is harder and would require some changes to Gearman itself, but I think it would be useful (automatic failover of persistent tasks if a Gearman server fails).

Patrick

On 04/12/2010 10:49 AM, Thomas Koch wrote:
> Mahadev Konar:
>> Hi Thomas,
>> There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it:
>> 1. There are 1 million feeds that will be stored in HBase.
>> 2. A map-reduce job will be run on these feeds to find out which feeds need to be fetched.
>> 3. This will create queues in ZooKeeper to fetch the feeds.
>> 4. Workers will pull items from this queue and process feeds.
>> Did I understand it correctly? Also, if the above is the case, how many queue items would you anticipate being accumulated every hour?
> Yes, that's exactly what I'm thinking about. Currently one node processes something like 2 feeds an hour, and we have 5 feed-fetch nodes. This would mean ~10 queue items/hour. Each queue item should carry some meta information, most importantly the feed items that are already known to the system, so that only new items get processed.
> Thomas Koch, http://www.koch.ro
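For reference, Patrick's first point (CMS plus incremental GC) corresponds to JVM flags along these lines on the HotSpot VMs of that era; the heap size, classpath, and jar names below are illustrative, not taken from his setup:

```
java -Xmx8g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
     -cp zookeeper-3.2.2.jar:conf \
     org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
```

The point of both flags is to keep stop-the-world pauses shorter than the session timeout, so healthy clients aren't expired during a collection.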
Re: znode cversion decreasing?
On Mon, 12 Apr 2010 09:27:46 -0700 Mahadev Konar maha...@yahoo-inc.com wrote:
> Hi Kevin,
> The cversion should be monotonically increasing for the znode. It would be a bug if it's not. Can you please elaborate on the cases in which you are seeing the cversion decrease? If you can reproduce it with an example, that would be great.
> Thanks
> mahadev

Thanks Mahadev and Patrick! Here are some more details:

I'm using the C client and running three servers on PlanetLab, with each server on a different continent. Most of the time, the cversion is increasing as expected. I'm never deleting the group node, so that's not the issue. Of course, now that I've emailed this list, I haven't seen it happen again... I do have one old log file though:

ZK(10): 1270514949 (Re)Connected to zookeeper server.
ZK(10): 1270514952 Beginning new view #7. Unsetting panic...
GOSSIP(10): 1270514952 Changing view to 7
ZK(10): 1270515798 Disconnected from zookeeper. Setting panic...
ZK(10): 1270515803 (Re)Connected to zookeeper server.
ZK(10): 1270515806 Beginning new view #7. Unsetting panic...
GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current view is 7.
ZK(10): 1270516812 Disconnected from zookeeper. Setting panic...
ZK(10): 1270516823 (Re)Connected to zookeeper server.
ZK(10): 1270516826 Beginning new view #11. Unsetting panic...
GOSSIP(10): 1270516826 Changing view to 11
ZK(10): 1270519191 Disconnected from zookeeper. Setting panic...
ZK(10): 1270519195 (Re)Connected to zookeeper server.
ZK(10): 1270519198 Beginning new view #9. Unsetting panic...
GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current view is 11.

The large integral number is a Unix seconds-since-epoch timestamp (the result of calling time(NULL)). In this case, the client connected, got group #7, disconnected, reconnected, got #7 again, disconnected, reconnected, got #11, disconnected, reconnected, and then got #9.

The host string that I pass to zookeeper_init contains only one address:port, so it's not an issue of re-connecting to a different server and getting old/stale information. If/when it does happen again, I'll be sure to also save the zookeeper server logs.

-Kevin
Re: znode cversion decreasing?
On Mon, 12 Apr 2010 14:33:44 -0700 Mahadev Konar maha...@yahoo-inc.com wrote:
> Hi Kevin,
> Thanks for the info. Could you cut and paste the code you are using that prints the view info? That would help. We can then create a jira and follow up on that. Also, a zookeeper client can never go back in time (even if it gets disconnected and connected back to another server).
> Thanks
> mahadev

Ah, sorry, I meant to include that last time. This is the function I use to read the cversion:

static int32_t read_path_cversion(zhandle_t *zkhandle, const char *path)
{
    struct Stat stat;
    int zoo_result;

    memset(&stat, 0, sizeof(struct Stat));
    zoo_result = zoo_exists(zkhandle, path, 0, &stat);
    if (zoo_result != ZOK) {
        return -1;
    }
    return stat.cversion;
}

It gets called by this function, which is called whenever the client (re)connects to the server or when the watch on the group node gets triggered:

static int process_membership_change(zhandle_t *zkhandle, zk_comm_t *context, const char *path)
{
    struct String_vector children;
    int32_t view_before = 0;
    int32_t view_after = view_before + 1;
    int zoo_result = 0;
    int i;

    while (view_before != view_after) {
        view_before = read_path_cversion(zkhandle, path);
        zoo_result = zoo_get_children(zkhandle, path, 1, &children);
        if (zoo_result != ZOK) {
            return zoo_result;
        }
        view_after = read_path_cversion(zkhandle, path);
    }

    printlog(LOG_CRITICAL, "ZK(%" PRIu32 "): %u Beginning new view #%" PRId32 ". Unsetting panic...\n",
             context->comm->id, time(NULL), view_after);

    /* (call application function to restart with group #view_after) */
    /* ... more application logic ... */
}

Let me know if any other details would be helpful.

-Kevin
Re: znode cversion decreasing?
We did have a case where the user set up 3 servers, each of which was standalone. :-) It doesn't look like that's the problem here though, given you only specify 1 server in the connect string (although as Mahadev mentioned you don't need to worry about that aspect).

After it goes 7->11->9, does it ever go back to 11 or just 9?

It would be good to capture the server log files (all 3) when this happens next time. Please provide those as well; they would be critical for diagnosing this. In particular, not many users are running cross-colo clusters. If you can provide the config files too, that will be useful. What version of Java/OS is being used?

Might be a good time to create a JIRA; attach all this to the JIRA so that you don't have to repeat. :-)

Patrick

On 04/12/2010 02:26 PM, Kevin Webb wrote:
> On Mon, 12 Apr 2010 09:27:46 -0700 Mahadev Konar maha...@yahoo-inc.com wrote:
>> Hi Kevin,
>> The cversion should be monotonically increasing for the znode. It would be a bug if it's not. Can you please elaborate on the cases in which you are seeing the cversion decrease? If you can reproduce it with an example, that would be great.
>> Thanks
>> mahadev
> Thanks Mahadev and Patrick! Here are some more details:
> I'm using the C client and running three servers on PlanetLab, with each server on a different continent. Most of the time, the cversion is increasing as expected. I'm never deleting the group node, so that's not the issue. Of course, now that I've emailed this list, I haven't seen it happen again... I do have one old log file though:
> ZK(10): 1270514949 (Re)Connected to zookeeper server.
> ZK(10): 1270514952 Beginning new view #7. Unsetting panic...
> GOSSIP(10): 1270514952 Changing view to 7
> ZK(10): 1270515798 Disconnected from zookeeper. Setting panic...
> ZK(10): 1270515803 (Re)Connected to zookeeper server.
> ZK(10): 1270515806 Beginning new view #7. Unsetting panic...
> GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current view is 7.
> ZK(10): 1270516812 Disconnected from zookeeper. Setting panic...
> ZK(10): 1270516823 (Re)Connected to zookeeper server.
> ZK(10): 1270516826 Beginning new view #11. Unsetting panic...
> GOSSIP(10): 1270516826 Changing view to 11
> ZK(10): 1270519191 Disconnected from zookeeper. Setting panic...
> ZK(10): 1270519195 (Re)Connected to zookeeper server.
> ZK(10): 1270519198 Beginning new view #9. Unsetting panic...
> GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current view is 11.
> The large integral number is a Unix seconds-since-epoch timestamp (the result of calling time(NULL)). In this case, the client connected, got group #7, disconnected, reconnected, got #7 again, disconnected, reconnected, got #11, disconnected, reconnected, and then got #9.
> The host string that I pass to zookeeper_init contains only one address:port, so it's not an issue of re-connecting to a different server and getting old/stale information. If/when it does happen again, I'll be sure to also save the zookeeper server logs.
> -Kevin
Re: znode cversion decreasing?
On 04/12/2010 03:58 PM, Kevin Webb wrote:
> On Mon, 12 Apr 2010 15:09:20 -0700 Patrick Hunt ph...@apache.org wrote:
>> We did have a case where the user set up 3 servers, each of which was standalone. :-) It doesn't look like that's the problem here though, given you only specify 1 server in the connect string (although as Mahadev mentioned you don't need to worry about that aspect).
> They're definitely not standalone. Here's the server config:
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=5
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=2
> # the directory where the snapshot is stored.
> dataDir=/home/pl_drl/zookeeper-3.2.2/data
> # the port at which the clients will connect
> clientPort=2181
> server.1=hostname 1:2888:3888
> server.2=hostname 2:2888:3888
> server.3=hostname 3:2888:3888

What's the ping time between colos? A 2-second tickTime, and especially those initLimit and syncLimit values, are pretty low. You are allowing for only 4 seconds to d/l the data repository to a remote server. Even in-colo we typically use a higher value... but you may not want to change it until we can reproduce this. You probably want a 4-second tickTime and 60/40-second init/sync limits (so settings of 15/10), something like that, depending on the latencies/bandwidth you see.

>> After it goes 7->11->9, does it ever go back to 11 or just 9?
> It actually does this: 7->7->11->9->9->12->14 ... (proceeds normally from here)

Hrm, that's very weird.

>> It would be good to capture the server log files (all 3) when this happens next time. Please provide those as well; they would be critical for diagnosing this. In particular, not many users are running cross-colo clusters.
> I'll be sure to save these next time. I thought I had them for this run, sorry.

NP. As I mentioned, creating a JIRA would be a good idea. Very DRY.

>> If you can provide the config files too, that will be useful. What version of Java/OS is being used?
> I'm running on PlanetLab, which is based on Fedora 8 (very old). uname says:
> Linux 2.6.22.19-vs2.3.0.34.39.planetlab #1 SMP Tue Jun 30 09:32:05 UTC 2009 i686 i686 i386 GNU/Linux

Hrm...

> java -version says:
> java version 1.7.0
> IcedTea Runtime Environment (build 1.7.0-b21)
> IcedTea Client VM (build 1.7.0-b21, mixed mode)

Well, we don't support 1.7 VMs yet, but that's not to say that would cause the issue. Really, once we see the server logs we should get more insight. The only thing I could see with the OS/Java would be significant differences in thread/networking timing that we don't typically see with newer OSs and 1.6 VMs...

>> Might be a good time to create a JIRA; attach all this to the JIRA so that you don't have to repeat. :-)
> I'll do that (including server logs) next time I see it happen.

Good.

Patrick
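Patrick's suggested timing values, written out as a zoo.cfg fragment (with a 4-second tick, initLimit 15 gives 15 * 4 s = 60 s and syncLimit 10 gives 10 * 4 s = 40 s):

```
tickTime=4000
initLimit=15
syncLimit=10
```

Note this also lengthens the minimum and maximum client session timeouts, which are derived from tickTime (2x and 20x respectively), so clients that request short timeouts may see them renegotiated upward.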
Re: znode cversion decreasing?
Probably reaching for straws, but could you print path, just to confirm it's what you know it is?

Patrick

On 04/12/2010 02:53 PM, Kevin Webb wrote:
> On Mon, 12 Apr 2010 14:33:44 -0700 Mahadev Konar maha...@yahoo-inc.com wrote:
>> Hi Kevin,
>> Thanks for the info. Could you cut and paste the code you are using that prints the view info? That would help. We can then create a jira and follow up on that. Also, a zookeeper client can never go back in time (even if it gets disconnected and connected back to another server).
>> Thanks
>> mahadev
> Ah, sorry, I meant to include that last time. This is the function I use to read the cversion:
>
> static int32_t read_path_cversion(zhandle_t *zkhandle, const char *path)
> {
>     struct Stat stat;
>     int zoo_result;
>
>     memset(&stat, 0, sizeof(struct Stat));
>     zoo_result = zoo_exists(zkhandle, path, 0, &stat);
>     if (zoo_result != ZOK) {
>         return -1;
>     }
>     return stat.cversion;
> }
>
> It gets called by this function, which is called whenever the client (re)connects to the server or when the watch on the group node gets triggered:
>
> static int process_membership_change(zhandle_t *zkhandle, zk_comm_t *context, const char *path)
> {
>     struct String_vector children;
>     int32_t view_before = 0;
>     int32_t view_after = view_before + 1;
>     int zoo_result = 0;
>     int i;
>
>     while (view_before != view_after) {
>         view_before = read_path_cversion(zkhandle, path);
>         zoo_result = zoo_get_children(zkhandle, path, 1, &children);
>         if (zoo_result != ZOK) {
>             return zoo_result;
>         }
>         view_after = read_path_cversion(zkhandle, path);
>     }
>
>     printlog(LOG_CRITICAL, "ZK(%" PRIu32 "): %u Beginning new view #%" PRId32 ". Unsetting panic...\n",
>              context->comm->id, time(NULL), view_after);
>
>     /* (call application function to restart with group #view_after) */
>     /* ... more application logic ... */
> }
>
> Let me know if any other details would be helpful.
> -Kevin