Re: znode cversion decreasing?

2010-04-12 Thread Patrick Hunt

Hi Kevin,

The server increments a znode's cversion by one each time a change is made 
to its child list. Every znode has its own cversion, and it should never 
decrease. If you delete a znode and create it anew, the cversion is reset 
for that (new) znode. The cversion also happens to be used as the sequence 
number for sequential children. Are you using the C or Java client? Is this 
always happening, or just in some cases? Is it reproducible? You might try 
creating your ephemerals with the sequence flag and then comparing the 
cversion of the parent with the sequence number assigned - that might help 
with debugging.
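
For illustration only (an untested sketch against the C client; "/group" and 
the "member-" prefix are made-up names, not from your setup), comparing the 
assigned sequence number against the parent's cversion could look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zookeeper.h>

/* Create an ephemeral+sequential member node under /group and print the
 * sequence number the server assigned next to the parent's cversion. */
static int debug_sequence_vs_cversion(zhandle_t *zh)
{
    char created[512];
    struct Stat stat;
    long seq;

    if (zoo_create(zh, "/group/member-", "", 0, &ZOO_OPEN_ACL_UNSAFE,
                   ZOO_EPHEMERAL | ZOO_SEQUENCE,
                   created, sizeof(created)) != ZOK)
        return -1;

    memset(&stat, 0, sizeof(stat));
    if (zoo_exists(zh, "/group", 0, &stat) != ZOK)
        return -1;

    /* The sequence number is the numeric suffix appended to the prefix. */
    seq = strtol(created + strlen("/group/member-"), NULL, 10);
    printf("assigned sequence=%ld parent cversion=%d\n", seq, stat.cversion);
    return 0;
}

If the two values diverge in the same way the group number does, that would 
help narrow down where the problem is.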


Patrick

On 04/11/2010 03:53 PM, Kevin Webb wrote:

I'm using ZooKeeper (3.2.2) for a simple group membership service in
the manner that is typically described [1,2]:

I create a znode for the group, and each present group member adds an
ephemeral node under the group node. I'm using the cversion of the group
node as a group number. I expected this value to be monotonically
increasing, but I'm seeing instances where this isn't the case.
According to the programmer's guide, changes to a node will cause the
appropriate version number to increase, but it says nothing about
decreasing.

Am I misunderstanding something about the way node version numbers work?
Is there a better/recommended way to implement a monotonically
increasing group number?

Thanks!
Kevin


[1] http://hadoop.apache.org/zookeeper/docs/r3.2.2/recipes.html
[2]
http://eng.kaching.com/2010/01/actually-implementing-group-management.html


feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Thomas Koch
Hi,

I'd like to implement a feed loader with Hadoop and most likely HBase. I've 
got around 1 million feeds that should be loaded and checked for new entries. 
However, the feeds have different priorities based on their average update 
frequency in the past and their relevance.
The feeds (url, last_fetched timestamp, priority) are stored in HBase. How 
could I implement the fetch queue for the loaders?

- An hourly map-reduce job to produce new queues for each node and save them 
on the nodes?
  - But how would I know which feeds have been fetched in the last hour?
  - What to do if a fetch node dies?

- Store a fetch queue in ZooKeeper and add to the queue with map-reduce each 
hour? (A minimal sketch of this variant follows below.)
  - Isn't that too much load for ZooKeeper? (I could make one znode for a 
bunch of URLs...?)

- Use gearman [1] to store the fetch queue?
  - But the gearman job server still seems to be a SPOF

[1] http://gearman.org
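
A rough sketch of the ZooKeeper-queue variant (untested; the /feedqueue path 
and the idea of one znode per batch of URLs are my assumptions, using the C 
client API):

#include <stdio.h>
#include <string.h>
#include <zookeeper.h>

/* One queue item = one sequential znode under /feedqueue whose data is a
 * batch of URLs. A worker takes the lowest-numbered item and deletes it. */
static int take_one_batch(zhandle_t *zh)
{
    struct String_vector children;
    struct Stat st;
    char item_path[512], buf[65536];
    int buf_len = sizeof(buf);
    const char *lowest = NULL;
    int i;

    if (zoo_get_children(zh, "/feedqueue", 0, &children) != ZOK)
        return -1;

    /* Sequential names carry a fixed-width, zero-padded suffix, so a plain
     * string compare finds the oldest item. */
    for (i = 0; i < children.count; i++)
        if (lowest == NULL || strcmp(children.data[i], lowest) < 0)
            lowest = children.data[i];

    if (lowest == NULL) {               /* queue is empty */
        deallocate_String_vector(&children);
        return 0;
    }

    snprintf(item_path, sizeof(item_path), "/feedqueue/%s", lowest);
    if (zoo_get(zh, item_path, 0, buf, &buf_len, &st) == ZOK) {
        /* buf now holds the batch of URLs; fetch those feeds here ... */
    }

    /* Delete with version -1 (ignore version); a real worker would guard
     * against another worker having grabbed the same item. */
    zoo_delete(zh, item_path, -1);
    deallocate_String_vector(&children);
    return 0;
}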

Thank you!

Thomas Koch, http://www.koch.ro


Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Mahadev Konar
Hi Thomas,
  There are a couple of projects inside Yahoo! that use ZooKeeper as an
event manager for feed processing.
  
I am a little bit unclear on your example below. As I understand it:

1. There are 1 million feeds that will be stored in Hbase.
2. A map reduce job will be run on these feeds to find out which feeds need
to be fetched. 
3. This will create queues in ZooKeeper to fetch the feeds
4.  Workers will pull items from this queue and process feeds

Did I understand it correctly? Also, if the above is the case, how many queue
items would you anticipate being accumulated every hour?

Thanks
mahadev


On 4/12/10 1:21 AM, Thomas Koch tho...@koch.ro wrote:

 Hi,
 
 I'd like to implement a feed loader with Hadoop and most likely HBase. I've
 got around 1 million feeds, that should be loaded and checked for new entries.
 However the feeds have different priorities based on their average update
 frequency in the past and their relevance.
 The feeds (url, last_fetched timestamp, priority) are stored in HBase. How
 could I implement the fetch queue for the loaders?
 
 - An hourly map-reduce job to produce new queues for each node and save them
 on the nodes?
   - but how to know, which feeds have been fetched in the last hour?
   - what to do, if a fetch node dies?
 
 - Store a fetch queue in zookeeper and add to the queue with map-reduce each
 hour?
   - Isn't that too much load for zookeeper? (I could make one znode for a
 bunch of urls...?)
 
 - Use gearman to store the fetch queue?
   - But the gearman job server still seems to be a SPOF
 
 [1] http://gearman.org
 
 Thank you!
 
 Thomas Koch, http://www.koch.ro
 



Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Thomas Koch
Mahadev Konar:
 Hi Thomas,
   There are a couple of projects inside Yahoo! that use ZooKeeper as an
 event manager for feed processing.
 
 I am little bit unclear on your example below. As I understand it-
 
 1. There are 1 million feeds that will be stored in Hbase.
 2. A map reduce job will be run on these feeds to find out which feeds need
 to be fetched.
 3. This will create queues in ZooKeeper to fetch the feeds
 4.  Workers will pull items from this queue and process feeds
 
 Did I understand it correctly? Also, if above is the case, how many queue
 items would you anticipate be accumulated every hour?
Yes. That's exactly what I'm thinking about. Currently one node processes like 
2 Feeds an hour and we have 5 feed-fetch-nodes. This would mean ~10 
queue items/hour. Each queue item should carry some meta information, most 
importantly the feed items that are already known to the system, so that only 
new items get processed.

Thomas Koch, http://www.koch.ro


Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Patrick Hunt
See this environment: http://bit.ly/4ekN8G. Subsequently I used the 3-server 
setup, each configured with 8 GB of heap in the JVM and 4 CPUs per JVM (I 
think I used 10-second session timeouts for this) for some additional 
testing that I've not written up yet. I was able to run ~500 clients (same 
test script) in parallel. So that means about 5 million znodes and 
25 million watches.


The things to watch out for are:
1) most important: you need to tune the GC, in particular you need to 
turn on CMS and incremental GC (an example command line is sketched after 
this list), otherwise the GC pauses will cause high latencies and you will 
see session timeouts

2) you need a stable network, especially for the serving ensemble
3) sufficient memory available in the JVM heap
4) no IO issues on the serving hosts (VMs, overloaded disk, swapping, 
etc...)
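
For reference only (an illustrative command line, not the exact flags used in 
the testing above; heap size, jar names and paths are placeholders), enabling 
CMS plus incremental GC on a 1.6 JVM looks roughly like:

java -Xms8g -Xmx8g \
     -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
     -cp zookeeper-3.2.2.jar:lib/log4j-1.2.15.jar:conf \
     org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg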


In your case you've got less going on, with only 30 or so writes per 
second. The performance page shows that you're going to be well below the 
max ops/sec we see in our testing harness.


btw, gearman would also be a good choice imo. I've looked at integrating 
ZK with gearman; there are two potential approaches: 1) as an additional 
persistent backend store for gearman, 2) as a way of addressing gearman 
failover. 1 is pretty simple to do today; 2 is harder and would require 
some changes to gearman itself, but I think it would be useful (automatic 
failover of persistent tasks if a gearman server fails).


Patrick

On 04/12/2010 10:49 AM, Thomas Koch wrote:

Mahadev Konar:

Hi Thomas,
   There are a couple of projects inside Yahoo! that use ZooKeeper as an
event manager for feed processing.

I am little bit unclear on your example below. As I understand it-

1. There are 1 million feeds that will be stored in Hbase.
2. A map reduce job will be run on these feeds to find out which feeds need
to be fetched.
3. This will create queues in ZooKeeper to fetch the feeds
4.  Workers will pull items from this queue and process feeds

Did I understand it correctly? Also, if above is the case, how many queue
items would you anticipate be accumulated every hour?

Yes. That's exactly what I'm thinking about. Currently one node processes like
2 Feeds an hour and we have 5 feed-fetch-nodes. This would mean ~10
queue items/hour. Each queue item should carry some meta informations, most
important the feed items, that are already known to the system so that only
new items get processed.

Thomas Koch, http://www.koch.ro


Re: znode cversion decreasing?

2010-04-12 Thread Kevin Webb
On Mon, 12 Apr 2010 09:27:46 -0700
Mahadev Konar maha...@yahoo-inc.com wrote:

 HI Kevin,
 
 The cversion should be monotonically increasing for the znode.
 It would be a bug if it's not. Can you please elaborate on the cases in
 which you are seeing the cversion decrease? If you can reproduce it with
 an example, that would be great.
 
 Thanks
 mahadev

Thanks Mahadev and Patrick!

Here are some more details:

I'm using the C client and running three servers on PlanetLab, with
each server on a different continent.  Most of the time, the cversion
is increasing as expected.  I'm never deleting the group node, so
that's not the issue.

Of course, now that I've emailed this list, I haven't seen it happen
again...  

I do have one old log file though:

ZK(10): 1270514949 (Re)Connected to zookeeper server.
ZK(10): 1270514952 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270514952 Changing view to 7
ZK(10): 1270515798 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270515803 (Re)Connected to zookeeper server.
ZK(10): 1270515806 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current
view is 7.
ZK(10): 1270516812 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270516823 (Re)Connected to zookeeper server.
ZK(10): 1270516826 Beginning new view #11.  Unsetting panic...
GOSSIP(10): 1270516826 Changing view to 11
ZK(10): 1270519191 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270519195 (Re)Connected to zookeeper server.
ZK(10): 1270519198 Beginning new view #9.  Unsetting panic...
GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current
view is 11.

The large integral number is a Unix seconds-since-epoch timestamp (the
result of calling time(NULL)).

In this case, the client connected, got group #7, disconnected,
reconnected, got #7 again, disconnected, reconnected, got #11,
disconnected, reconnected, and then got #9.

The host string that I pass to zookeeper_init contains only one
address:port, so it's not an issue of re-connecting to a different
server and getting old/stale information.


If/when it does happen again, I'll be sure to also save the zookeeper
server logs.

-Kevin


Re: znode cversion decreasing?

2010-04-12 Thread Kevin Webb
On Mon, 12 Apr 2010 14:33:44 -0700
Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Kevin,
 
  Thanks for the info. Could you cut and paste the code you are using
 that prints the view info?
 That would help. We can then create a jira and follow up on that.
 
 Also, a ZooKeeper client can never go back in time (even if it gets
 disconnected and reconnected to another server).
 
 Thanks
 mahadev

Ah, sorry, I meant to include that last time.

This is the function I use to read the cversion:

static int32_t read_path_cversion(zhandle_t *zkhandle, const char *path) {
    struct Stat stat;
    int zoo_result;

    memset(&stat, 0, sizeof(struct Stat));

    /* zoo_exists fills in the stat of the node, including its cversion. */
    zoo_result = zoo_exists(zkhandle, path, 0, &stat);

    if (zoo_result != ZOK) {
        return -1;
    }

    return stat.cversion;
}

It gets called by this function, which is called whenever the client
(re)connects to the server or when the watch on the group node gets
triggered:

static int process_membership_change(zhandle_t *zkhandle, zk_comm_t *context,
                                     const char *path) {
    struct String_vector children;
    int32_t view_before = 0;
    int32_t view_after = view_before + 1;
    int zoo_result = 0;
    int i;

    /* Re-read the children until the parent's cversion is the same before
     * and after the read, i.e. until we have a consistent snapshot. */
    while (view_before != view_after) {
        view_before = read_path_cversion(zkhandle, path);

        zoo_result = zoo_get_children(zkhandle, path, 1, &children);
        if (zoo_result != ZOK) {
            return zoo_result;
        }

        view_after = read_path_cversion(zkhandle, path);
    }

    printlog(LOG_CRITICAL, "ZK(%" PRIu32 "): %u Beginning new view #%" PRId32
             ".  Unsetting panic...\n", context->comm->id, time(NULL),
             view_after);

    (call application function to restart with group #view_after)

    ...
    More application logic
    ...
}

Let me know if any other details would be helpful.

-Kevin


Re: znode cversion decreasing?

2010-04-12 Thread Patrick Hunt
We did have a case where a user set up 3 servers, each standalone. 
:-) Doesn't look like that's the problem here though, given you only 
specify 1 server in the connect string (although, as Mahadev mentioned, 
you don't need to worry about that aspect).


After it goes 7 -> 11 -> 9, does it ever go back to 11, or just 9?

It would be good to capture the server log files (all 3) when this 
happens next time. Please provide those as well; they would be critical for 
tracking this down. In particular, not many users are running cross-colo 
clusters.


If you can provide the config files too that will be useful.

What version of java/OS is being used?

Might be a good time to create a JIRA; attach all of this to the JIRA so 
that you don't have to repeat yourself. :-)


Patrick

On 04/12/2010 02:26 PM, Kevin Webb wrote:

On Mon, 12 Apr 2010 09:27:46 -0700
Mahadev Konar maha...@yahoo-inc.com wrote:


HI Kevin,

  The cversion should be monotonically increasing for the the znode.
It would be a bug if its not. Can you please elaborate in which cases
you are seeing the cversion decreasing? If you can reproduce with an
example that would be great.

Thanks
mahadev


Thanks Mahadev and Patrick!

Here are some more details:

I'm using the C client and running three servers on PlanetLab, with
each server on a different continent.  Most of the time, the cversion
is increasing as expected.  I'm never deleting the group node, so
that's not the issue.

Of course, now that I've emailed this list, I haven't seen it happen
again...

I do have one old log file though:

ZK(10): 1270514949 (Re)Connected to zookeeper server.
ZK(10): 1270514952 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270514952 Changing view to 7
ZK(10): 1270515798 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270515803 (Re)Connected to zookeeper server.
ZK(10): 1270515806 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current
view is 7.
ZK(10): 1270516812 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270516823 (Re)Connected to zookeeper server.
ZK(10): 1270516826 Beginning new view #11.  Unsetting panic...
GOSSIP(10): 1270516826 Changing view to 11
ZK(10): 1270519191 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270519195 (Re)Connected to zookeeper server.
ZK(10): 1270519198 Beginning new view #9.  Unsetting panic...
GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current
view is 11.

The large integral number is a Unix seconds-since-epoch timestamp (the
result of calling time(NULL)).

In this case, the client connected, got group #7, disconnected,
reconnected, got #7 again, disconnected, reconnected, got #11,
disconnected, reconnected, and then got #9.

The host string that I pass to zookeeper_init contains only one
address:port, so it's not an issue of re-connecting to a different
server and getting old/stale information.


If/when it does happen again, I'll be sure to also save the zookeeper
server logs.

-Kevin


Re: znode cversion decreasing?

2010-04-12 Thread Patrick Hunt


On 04/12/2010 03:58 PM, Kevin Webb wrote:

On Mon, 12 Apr 2010 15:09:20 -0700
Patrick Hunt ph...@apache.org wrote:


We did have a case where the user setup 3 servers, each was
standalone. :-) Doesn't look like that's the problem here though
given you only specify 1 server in the connect string (although as
mahadev mentioned you don't need to worry about that aspect).


They're definitely not standalone.  Here's the server config:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
dataDir=/home/pl_drl/zookeeper-3.2.2/data
# the port at which the clients will connect
clientPort=2181
server.1=hostname 1:2888:3888
server.2=hostname 2:2888:3888
server.3=hostname 3:2888:3888



What's the ping time between colos? A 2-second tickTime, and especially the 
initLimit and syncLimit, are pretty low. You are allowing only 4 seconds to 
download the data repository to a remote server. Even in-colo we typically 
use higher values... but you may not want to change anything until we can 
reproduce this. You probably want a 4-second tickTime and 60/40 seconds (so 
settings of 15/10) for the init/sync limits - something like that, depending 
on the latencies/bandwidth you see (a sketch of such a config is below).
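
Purely as an illustration of those numbers (not values verified for this 
deployment; everything else copied from the config above), the adjusted 
settings would look roughly like:

# cross-colo sketch: 4s ticks, 60s init limit, 40s sync limit
tickTime=4000
initLimit=15
syncLimit=10
dataDir=/home/pl_drl/zookeeper-3.2.2/data
clientPort=2181
server.1=hostname 1:2888:3888
server.2=hostname 2:2888:3888
server.3=hostname 3:2888:3888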





After it goes 7-11-9, does it ever go back to 11 or just 9?


It actually does this:
7 -> 7 -> 11 -> 9 -> 9 -> 12 -> 14 ... (proceeds normally from here)



Hrm, that's very weird.


It would be good to capture the server log files (all 3) when this
happens next time. Please provide those as well, would be critical
for discovering this. In particular not many users are running
cross-colo clusters.


I'll be sure to save these next time.  I thought I had them for this
run, sorry.



NP. As I mentioned, creating a JIRA would be a good idea. Very DRY.


If you can provide the config files too that will be useful.

What version of java/OS is being used?


I'm running on PlanetLab, which is based on Fedora 8 (very old).
uname says: Linux 2.6.22.19-vs2.3.0.34.39.planetlab #1 SMP Tue
Jun 30 09:32:05 UTC 2009 i686 i686 i386 GNU/Linux



Hrm...


java -version says:
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea Client VM (build 1.7.0-b21, mixed mode)



Well, we don't support 1.7 VMs yet, but that's not to say that would 
cause the issue. Really, once we see the server logs we should get more 
insight.


The only thing I could see with the OS/Java combination would be significant 
differences in thread/networking timing that we don't typically see with 
newer OSes and 1.6 VMs...



Might be a good time to create a JIRA, attach all this to the JIRA so
that you don't have to repeat. :-)


I'll do that (including server logs) next time I see it happen.


Good.

Patrick


Re: znode cversion decreasing?

2010-04-12 Thread Patrick Hunt
Probably grasping at straws, but could you print path, just to confirm 
it's what you think it is?
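
For example (a trivial sketch, added here only for illustration), at the top 
of read_path_cversion:

    fprintf(stderr, "reading cversion of '%s'\n", path);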


Patrick

On 04/12/2010 02:53 PM, Kevin Webb wrote:

On Mon, 12 Apr 2010 14:33:44 -0700
Mahadev Konar maha...@yahoo-inc.com wrote:


Hi Kevin,

  Thanks for the info. Could you cut and paste the code you are using
that prints the view info?
That would help. We can then create a jira and follow up on that.

Also, a zookeeper client can never go back in time (even if its gets
disconnected and connected back to another server).

Thanks
mahadev


Ah, sorry, I meant to include that last time.

This is the function I use to read the cversion:

static int32_t read_path_cversion(zhandle_t *zkhandle, const char *path) {
     struct Stat stat;
     int zoo_result;

     memset(&stat, 0, sizeof(struct Stat));

     zoo_result = zoo_exists(zkhandle, path, 0, &stat);

     if (zoo_result != ZOK) {
         return -1;
     }

     return stat.cversion;
}

It gets called by this function, which is called whenever the client
(re)connects to the server or when the watch on the group node gets
triggered:

static int process_membership_change(zhandle_t *zkhandle, zk_comm_t
*context, const char *path) {
 struct String_vector children;
 int32_t view_before = 0;
 int32_t view_after = view_before + 1;
 int zoo_result = 0;
 int i;

 while (view_before != view_after) {
 view_before = read_path_cversion(zkhandle, path);

  zoo_result = zoo_get_children(zkhandle, path, 1, &children);
 if (zoo_result != ZOK) {
 return zoo_result;
 }

 view_after = read_path_cversion(zkhandle, path);
 }

  printlog(LOG_CRITICAL, "ZK(%" PRIu32 "): %u Beginning new view #%" PRId32
           ".  Unsetting panic...\n", context->comm->id, time(NULL),
           view_after);

 (call application function to restart with group #view_after)

 ...
 More application logic
 ...
}

Let me know if any other details would be helpful.

-Kevin