[jira] Commented: (ZOOKEEPER-493) patch for command line setquota

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739428#action_12739428
 ] 

Hudson commented on ZOOKEEPER-493:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
. patch for command line setquota


 patch for command line setquota 
 

 Key: ZOOKEEPER-493
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.0
Reporter: steve bendiola
Assignee: steve bendiola
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: quotafix.patch, ZOOKEEPER-493.patch


 the command line setquota tries to use argument 3 as both a path and a value
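
For reference, a rough sketch of the intended invocation (the usage string in
the 3.2 command-line client is roughly as below; the path and value here are
made up for illustration):

    setquota -n|-b val path
    setquota -n 1000 /myapp    # e.g. cap /myapp at 1000 child znodes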

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-491) Prevent zero-weight servers from being elected

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739426#action_12739426
 ] 

Hudson commented on ZOOKEEPER-491:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
. Prevent zero-weight servers from being elected. (flavio via mahadev)


 Prevent zero-weight servers from being elected
 --

 Key: ZOOKEEPER-491
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
 Project: Zookeeper
  Issue Type: New Feature
  Components: leaderElection
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch


 This is a fix to prevent zero-weight servers from being elected leaders. In 
 wide-area scenarios, this allows restricting the set of servers that can 
 lead the ensemble.
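
As a rough illustration of the kind of configuration this enables (group and
weight syntax as in the hierarchical quorum options of the admin guide; the
server ids and grouping below are made up, and the server.N lines are omitted):

    # servers 1-3 in the central data center, 4-5 in remote pods
    group.1=1:2:3
    group.2=4:5
    weight.1=1
    weight.2=1
    weight.3=1
    weight.4=0
    weight.5=0
    # with this fix, servers 4 and 5 (weight 0) can follow but never lead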

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739427#action_12739427
 ] 

Hudson commented on ZOOKEEPER-480:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
. FLE should perform leader check when node is not leading and add vote of 
follower (flavio via mahadev)


 FLE should perform leader check when node is not leading and add vote of 
 follower
 -

 Key: ZOOKEEPER-480
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-480-3.2branch.patch, 
 ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, 
 ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch


 As a server may join leader election while others have already elected a 
 leader, it is necessary that a server handles some special cases of leader 
 election when notifications are from servers that are either LEADING or 
 FOLLOWING. In such special cases, we check if we have received a message from 
 the leader to declare a leader elected. This check does not consider the case 
 that the process performing the check might be a recently elected leader, and 
 consequently the check fails.
 This patch also adds a new case, which corresponds to adding a vote to 
 recvset when the notification is from a process LEADING or FOLLOWING. This 
 fixes the case raised in ZOOKEEPER-475.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line

2009-08-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739429#action_12739429
 ] 

Hudson commented on ZOOKEEPER-447:
--

Integrated in ZooKeeper-trunk #405 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/])
. zkServer.sh doesn't allow different config files to be specified on the 
command line


 zkServer.sh doesn't allow different config files to be specified on the 
 command line
 

 Key: ZOOKEEPER-447
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
 Project: Zookeeper
  Issue Type: Improvement
Affects Versions: 3.1.1, 3.2.0
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-447.patch


 Unless I'm missing something, you can change the directory that the zoo.cfg 
 file is in by setting ZOOCFGDIR but not the name of the file itself.
 I find it convenient myself to specify the config file on the command line, 
 but we should also let it be specified by environment variable.
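
A sketch of the two styles being compared (the exact argument handling depends
on the patch; the paths are hypothetical):

    # today: only the directory can be overridden
    ZOOCFGDIR=/etc/zookeeper/ensemble-a bin/zkServer.sh start

    # proposed: name the config file itself on the command line
    bin/zkServer.sh start /etc/zookeeper/ensemble-a/zoo-a.cfg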

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-05 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated ZOOKEEPER-484:
-

Status: Open  (was: Patch Available)

resubmitting the patch to the patch queue.

 Clients get SESSION MOVED exception when switching from follower to a leader.
 -

 Key: ZOOKEEPER-484
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: sessionTest.patch, ZOOKEEPER-484.patch


 When a client is connected to a follower, gets disconnected, and then connects 
 to the leader, it gets a SESSION MOVED exception. This is because of a bug in 
 the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 
 3.2 DO NOT have this problem. The fix is to make sure the ownership of a 
 connection gets changed when a session moves from a follower to the leader. 
 The workaround in 3.2.0 is to switch off connections from clients to the 
 leader; take a look at the *leaderServers* java property in 
 http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
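
For the workaround, the relevant knob in the admin guide is the leader-serves
setting (assuming the property referred to above is the one documented there
as leaderServes); disabling it keeps clients off the leader entirely:

    # on each server's JVM command line, per the r3.2.0 admin guide
    -Dzookeeper.leaderServes=no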

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-05 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated ZOOKEEPER-484:
-

Status: Patch Available  (was: Open)

 Clients get SESSION MOVED exception when switching from follower to a leader.
 -

 Key: ZOOKEEPER-484
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: sessionTest.patch, ZOOKEEPER-484.patch


 When a client is connected to a follower, gets disconnected, and then connects 
 to the leader, it gets a SESSION MOVED exception. This is because of a bug in 
 the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 
 3.2 DO NOT have this problem. The fix is to make sure the ownership of a 
 connection gets changed when a session moves from a follower to the leader. 
 The workaround in 3.2.0 is to switch off connections from clients to the 
 leader; take a look at the *leaderServers* java property in 
 http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



hudson patch build back to normal

2009-08-05 Thread Giridharan Kesavan
The Sendmail issues on hudson.zones are fixed now and the patch build for 
zookeeper has been restarted.

Regards,
Giri


RE: hudson patch build back to normal

2009-08-05 Thread Giridharan Kesavan
If you have changed the JIRA status to Patch Available in the last couple of 
days, please resubmit your patch so that Hudson picks it up for testing.
-Giri

 -Original Message-
 From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
 Sent: Wednesday, August 05, 2009 7:18 PM
 To: zookeeper-dev@hadoop.apache.org
 Cc: Nigel Daley
 Subject: hudson patch build back to normal
 
 Sendmail issues on hudson.zones is fixed now and patch build for
 zookeeper is restarted.
 
 Regards,
 Giri


Re: hudson patch build back to normal

2009-08-05 Thread Patrick Hunt

Thanks Giri!

Patrick

Giridharan Kesavan wrote:

If you have changed the jira status to patch available in the last couple of 
days please resubmit your patch for hudson to pick your patch for testing.
-Giri


-Original Message-
From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
Sent: Wednesday, August 05, 2009 7:18 PM
To: zookeeper-dev@hadoop.apache.org
Cc: Nigel Daley
Subject: hudson patch build back to normal

Sendmail issues on hudson.zones is fixed now and patch build for
zookeeper is restarted.

Regards,
Giri


[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739609#action_12739609
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Looks to me like 0 weight is still busted, fle0weighttest is actually failing 
on my machine, however it's reported as success:
- Standard Error -
Exception in thread Thread-108 junit.framework.AssertionFailedError: Elected 
zero-weight server
at junit.framework.Assert.fail(Assert.java:47)
at 
org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
-  ---

This is probably because the test is calling assert in a thread other than 
the main test thread, which JUnit will not track/know about.
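
A minimal, self-contained sketch of the usual fix for this class of problem
(not the actual FLEZeroWeightTest code): record a failure raised in a worker
thread and re-throw it from the main/test thread so JUnit sees it.

    import java.util.concurrent.atomic.AtomicReference;

    public class WorkerAssertSketch {
        public static void main(String[] args) throws Exception {
            // Failures thrown in other threads are invisible to JUnit unless the
            // main (test) thread collects them and re-throws.
            final AtomicReference<Throwable> failure = new AtomicReference<Throwable>();
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        // ... run the election, then check the outcome ...
                        boolean electedZeroWeight = true; // stand-in for the real check
                        if (electedZeroWeight) {
                            throw new AssertionError("Elected zero-weight server");
                        }
                    } catch (Throwable t) {
                        failure.set(t); // record instead of dying silently
                    }
                }
            });
            worker.start();
            worker.join();
            if (failure.get() != null) {
                // re-thrown here, the failure is raised in the thread JUnit tracks
                throw new AssertionError(failure.get());
            }
        }
    }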

One problem I see with these tests (the 0weight test I looked at) -- it doesn't 
have a client attempt to connect to the various servers as part of declaring 
success. Really we should only consider the test successful (i.e. assert that) if a 
client can connect to each server in the cluster and change/see changes. As part 
of fixing this we really need to do a sanity check by testing the various 
command lines and checking that a client can connect.

I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch 
seems to just thrash...

Also I tried 3 and 5 server quorums by hand from the command line with 0 weight 
and they see issues similar to what Todd is seeing.

This is happening for me on both the trunk and 3.2 branch source.

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-498:
---

Attachment: zk498-test.tar.gz

I attached zk498-test.tar.gz - this is a 5 server config (2 zero-weight) that fails 
to achieve quorum.

Run start.sh/stop.sh and check out the individual logs for details.



 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-499:
--

Assignee: Patrick Hunt

 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0


 there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
 (incorrectly defaults to 0)
 also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-462) Last hint for open ledger

2009-08-05 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-462:


Fix Version/s: 3.3.0

 Last hint for open ledger
 -

 Key: ZOOKEEPER-462
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-462
 Project: Zookeeper
  Issue Type: New Feature
  Components: contrib-bookkeeper
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.3.0

 Attachments: ZOOKEEPER-462.patch


 In some use cases of BookKeeper, it is useful to be able to read from a 
 ledger before closing the ledger. To enable such a feature, the writer has to 
 be able to communicate to a reader how many entries it has been able to write 
 successfully. The main idea of this jira is to continuously update a znode 
 with the number of successful writes, and a reader can, for example, watch 
 the node for changes.
  I was thinking of having a configuration parameter to state how often a 
 writer should update the hint on ZooKeeper (e.g., every 1000 requests, every 
 10,000 requests). Clearly updating more often increases the overhead of 
 writing to ZooKeeper, although the impact on the performance of writes to 
 BookKeeper should be minimal given that we make an asynchronous call to 
 update the hint.
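
A rough sketch of the idea using the plain ZooKeeper client API (the path,
update interval, and encoding are made up; in the real feature this
bookkeeping would live inside the BookKeeper writer):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    public class LedgerHintSketch {
        static final int HINT_INTERVAL = 1000; // hypothetical: publish every 1000 adds

        private final ZooKeeper zk;
        private final String hintPath; // e.g. /ledgers/42/lastAddConfirmed (made up)
        private long confirmed = 0;

        LedgerHintSketch(ZooKeeper zk, String hintPath) throws Exception {
            this.zk = zk;
            this.hintPath = hintPath;
            zk.create(hintPath, "0".getBytes(), Ids.OPEN_ACL_UNSAFE,
                      CreateMode.PERSISTENT);
        }

        // called by the writer after each successfully acknowledged add
        void entryConfirmed() throws Exception {
            confirmed++;
            if (confirmed % HINT_INTERVAL == 0) {
                // asynchronous in the proposal; synchronous here to keep the sketch short
                zk.setData(hintPath, Long.toString(confirmed).getBytes(), -1);
            }
        }
        // a reader watches hintPath and re-reads it on each NodeDataChanged event
    }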

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)
electionAlg should default to FLE (3) - regression
--

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0


there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
(incorrectly defaults to 0)

also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Release Note: 
workaround in 3.2.0 (this only affects 3.2.0)

set electionAlg=3 in server config files.

 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0


 there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
 (incorrectly defaults to 0)
 also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

2009-08-05 Thread Mahadev Konar
Todd,
 Comments in line:


On 8/5/09 12:10 PM, Todd Greenwood to...@audiencescience.com wrote:

 Flavio/Patrick/Mahadev -
 
 Thanks for your support to date. As I understand it, the sticky points
 w/ respect to WAN deployments are:
 
 1. Leader Election:
 
 Leader elections in the WAN config (pod zk server weight = 0) is a bit
 troublesome (ZOOKEEPER-498)
Yes, until ZOOKEEPER-498 is fixed, you won't be able to use it with groups
and zero weight.

 
 2. Network Connectivity Required:
 
 ZooKeeper clients cannot read/write to ZK Servers if the Server does not
 have network connectivity to the quorum. In short, there is a hard
 requirement to have network connectivity in order for the clients to
 access the shared memory graph in ZK.
Yes

 
 Alternative
 ---
 
 I have seen some discussion about in the past re: multi-ensemble
 solutions. Essentially, put one ensemble in each physical location
 (POD), and another in your DC, and have a fairly simple process
 coordinate synchronizing the various ensembles. If the POD writes can be
 confined to a sub-tree in the master graph, then this should be fairly
 simple. I'm imagining the following:
 
 DC (master) graph:
 /root/pods/1/data/item1
 /root/pods/1/data/item2
 /root/pods/1/data/item3
 /root/pods/2
 /root/pods/3
 ...etc
 /root/shared/allpods/readonly/data/item1
 /root/shared/allpods/readonly/data/item2
 ...etc
 
 This has the advantage of minimizing cross pod traffic, which could be a
 real perf killer in an WAN. It also provides transacted writes in the
 PODs, even in the disconnected state. Clearly, another portion of the
 business logic has to reconcile the DC (master) graph such that each of
 the pods data items are processed, etc.
 
 Does anyone have any experience with this (pitfalls, suggestions, etc.?)
As far as I understand, you mean having a master cluster with another one in
a different data center syncing with the master (just a subtree)?
Is that correct? 

If yes, this is what one of our users in Yahoo! Search does. They have a
master cluster and a smaller cluster in a different datacenter and a bridge
that copies data from the master cluster (only a subtree) to the smaller one
and keeps them in sync.


Thanks
mahadev
 
 -Todd



RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

2009-08-05 Thread Todd Greenwood
Mahadev, comments inline:

 -Original Message-
 From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
 Sent: Wednesday, August 05, 2009 1:47 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble
configuration
 
 Todd,
  Comments in line:
 
 
 On 8/5/09 12:10 PM, Todd Greenwood to...@audiencescience.com
wrote:
 
  Flavio/Patrick/Mahadev -
 
  Thanks for your support to date. As I understand it, the sticky
points
  w/ respect to WAN deployments are:
 
  1. Leader Election:
 
  Leader elections in the WAN config (pod zk server weight = 0) is a
bit
  troublesome (ZOOKEEPER-498)
 Yes, until ZOOKEEPER-498 is fixed, you wont be able to use it with
groups
 and zero weight.
 
 
  2. Network Connectivity Required:
 
  ZooKeeper clients cannot read/write to ZK Servers if the Server does
not
  have network connectivity to the quorum. In short, there is a hard
  requirement to have network connectivity in order for the clients to
  access the shared memory graph in ZK.
 Yes
 
 
  Alternative
  ---
 
  I have seen some discussion about in the past re: multi-ensemble
  solutions. Essentially, put one ensemble in each physical location
  (POD), and another in your DC, and have a fairly simple process
  coordinate synchronizing the various ensembles. If the POD writes
can be
  confined to a sub-tree in the master graph, then this should be
fairly
  simple. I'm imagining the following:
 
  DC (master) graph:
  /root/pods/1/data/item1
  /root/pods/1/data/item2
  /root/pods/1/data/item3
  /root/pods/2
  /root/pods/3
  ...etc
  /root/shared/allpods/readonly/data/item1
  /root/shared/allpods/readonly/data/item2
  ...etc
 
  This has the advantage of minimizing cross pod traffic, which could
be a
  real perf killer in an WAN. It also provides transacted writes in
the
  PODs, even in the disconnected state. Clearly, another portion of
the
  business logic has to reconcile the DC (master) graph such that each
of
  the pods data items are processed, etc.
 
  Does anyone have any experience with this (pitfalls, suggestions,
etc.?)
 As far as I understand is that you mean that have a master Cluster
with
 other in a different data center syncing with the master (just a
subtree)?
 Is that correct?
 
 If yes, this is what one of our users in Yahoo! Search do. They have a
 master cluster and a smaller cluster in a different datacenter and a
 brdige
 that copies data from the master cluster (only a subtree) to the
smaller
 one
 and keeps them in syncs.
 

Yes, this is exactly what I'm proposing. With the addition that I'll
sync subtrees in both directions, and have a separate process reconcile
data from the various pods, like so:

#pod1 ensemble
/root/a/b

#pod2 ensemble
/root/a/b

#dc ensemble
/root/shared/foo/bar

# Mapping (modeled after perforce client config)
# [ensemble]:[path] [ensemble]:[path]
# sync pods to dc
[POD1]:/root/... [DC]:/root/pods/POD1/...
[POD2]:/root/... [DC]:/root/pods/POD2/...
# sync dc to pods
[DC]:/root/shared/... [POD1]:/shared/...
[DC]:/root/shared/... [POD2]:/shared/...
[DC]:/root/shared/... [POD3]:/shared/...

Now, for our needs, we'd like the DC data aggregated, so I'll have
another process handle aggregating the pod specific data like so:

POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to
[DC]:/root/aggregated/data.

This is just off the top of my head.

-Todd

 
 Thanks
 mahadev
 
  -Todd



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Attachment: ZOOKEEPER-499_br3.2.patch
ZOOKEEPER-499.patch

patches to fix on trunk and branch (br3.2 is the branch patch)

this fixes the problem - electionAlg again defaults to 3
it also adds a test to verify FLE is used by default
it also fixes a test that fails if FLE is used (vs algo 0), which is due to a 
difference in the way the JDK exposes unresolved host names when using UDP vs TCP.


 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch


 there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
 (incorrectly defaults to 0)
 also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-499:
---

Status: Patch Available  (was: Open)

 electionAlg should default to FLE (3) - regression
 --

 Key: ZOOKEEPER-499
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
 Project: Zookeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch


 there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
 (incorrectly defaults to 0)
 also - need to have tests to validate this

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-498:
--

Assignee: Flavio Paiva Junqueira  (was: Patrick Hunt)

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739787#action_12739787
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Please fix the following as well - incorrect logging levels are being used in 
quorum code, example:

2009-08-05 15:17:02,733 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - 
There is a connection for server 1
2009-08-05 15:17:02,753 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - 
There is a connection for server 2

this is INFO, not ERROR


 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739789#action_12739789
 ] 

Patrick Hunt commented on ZOOKEEPER-498:


Todd, I did see an issue with your config; it's not:

group.1:1:2:3

rather it's:

group.1=1:2:3

(should be = not : )


Regardless though - even after I fix this it's still not forming a cluster 
properly, we're still looking.


 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-500) Async methods shouldnt throw exceptions

2009-08-05 Thread Utkarsh Srivastava (JIRA)
Async methods shouldnt throw exceptions
---

 Key: ZOOKEEPER-500
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-500
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava


Async methods like asyncLedgerCreate and Open shouldn't be throwing 
InterruptedException and BKExceptions. 

The present method signatures lead to messy application code since one is 
forced to have error handling code in 2 places: inside the callback to handle 
a non-OK return code, and outside to handle the exceptions thrown by the 
call. 

There should be only one way to indicate error conditions, and that should be 
through a non-ok return code to the callback.
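
A rough sketch of the signature change being proposed, using made-up names
rather than the real BookKeeper API: the async call itself declares no checked
exceptions, and every outcome, success or failure, arrives as a return code in
the callback.

    public class AsyncApiSketch {
        // hypothetical callback: the single place errors are reported
        interface OpenCallback {
            void openComplete(int returnCode, Object ledgerHandle, Object ctx);
        }

        static final int OK = 0;
        static final int NO_SUCH_LEDGER = -1; // made-up return codes

        // before: the call throws InterruptedException/BKException AND reports via
        // the callback; after (proposed): no checked exceptions, callback only
        static void asyncOpenLedger(long ledgerId, OpenCallback cb, Object ctx) {
            // ... start the open in the background; on any failure invoke
            // cb.openComplete(NO_SUCH_LEDGER, null, ctx) instead of throwing ...
            cb.openComplete(OK, new Object(), ctx); // stub completion for the sketch
        }

        public static void main(String[] args) {
            asyncOpenLedger(42L, new OpenCallback() {
                public void openComplete(int rc, Object handle, Object ctx) {
                    if (rc != OK) {
                        System.err.println("open failed, rc=" + rc); // the one error path
                        return;
                    }
                    System.out.println("ledger opened");
                }
            }, null);
        }
    }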

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-490:
---

Attachment: ZOOKEEPER-490.patch

this patch updates the javadoc for zk construction
talks about async nature
talks about thread safety


 the java docs for session creation are misleading/incomplete
 

 Key: ZOOKEEPER-490
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-490.patch


 the javadoc for ZooKeeper constructor says:
  * The client object will pick an arbitrary server and try to connect to 
 it.
  * If failed, it will try the next one in the list, until a connection is
  * established, or all the servers have been tried.
 the "or all the servers have been tried" phrase is misleading; it should indicate 
 that we retry until success, connection closed, or session expired. 
 We also need to mention that the connection is async, that the constructor returns 
 immediately, and that you need to look for the connection event in the watcher.
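
A small, self-contained example of the pattern the javadoc should spell out:
the constructor returns immediately, and the client waits for the SyncConnected
event delivered to its watcher before using the handle (the connect string and
timeout below are placeholders).

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;

    public class ConnectExample {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // the constructor returns immediately; the connection happens asynchronously
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    if (event.getState() == KeeperState.SyncConnected) {
                        connected.countDown(); // the session is now usable
                    }
                }
            });
            connected.await(); // block until the connection event arrives
            System.out.println("connected, session id: 0x"
                    + Long.toHexString(zk.getSessionId()));
            zk.close();
        }
    }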

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-05 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-490:
---

Status: Patch Available  (was: Open)

 the java docs for session creation are misleading/incomplete
 

 Key: ZOOKEEPER-490
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0, 3.1.1
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-490.patch


 the javadoc for ZooKeeper constructor says:
  * The client object will pick an arbitrary server and try to connect to 
 it.
  * If failed, it will try the next one in the list, until a connection is
  * established, or all the servers have been tried.
 the "or all the servers have been tried" phrase is misleading; it should indicate 
 that we retry until success, connection closed, or session expired. 
 We also need to mention that the connection is async, that the constructor returns 
 immediately, and that you need to look for the connection event in the watcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739891#action_12739891
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-498:
--

Pat, we have a description of how to configure this in the Cluster Options 
section of the Administrator guide. We are missing an example, which is in the 
source code as you point out.

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-498:
-

Attachment: ZOOKEEPER-498.patch

I have generated a patch for this issue. I verified that I didn't do the 
correct checks in ZOOKEEPER-491, so I have tried to fix that in this patch. I have 
also modified the test to fix the problem with the failing assertion, and I have 
inspected the logs to see if it is behaving as expected. I can see no problem 
at this time with this patch.

If someone else is interested in checking it out, please do.

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-05 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-498:
-

Status: Patch Available  (was: Open)

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
 zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



build failures on hudson zones

2009-08-05 Thread Giridharan Kesavan
Builds on hudson.zones are failing as the zone storage for hudson is full.
I've sent an email to the ASF infra team about the space issues on hudson 
zones.

Once the issue is resolved I will restart hudson builds.

Thanks,
Giri




BUILDS ARE BACK NORMAL

2009-08-05 Thread Giridharan Kesavan
Restarted all the build jobs on hudson; builds are running fine.
The build failures were due to /tmp: File system full, swap space limit exceeded.

Thanks,
-Giri

 -Original Message-
 From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
 Sent: Thursday, August 06, 2009 9:16 AM
 To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
 common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper-
 d...@hadoop.apache.org
 Subject: build failures on hudson zones
 
 Build on hudson.zones are failing as the zonestorage for hudson is
 full.
 I 've sent an email to the ASF infra team about the space issues on
 hudson zones.
 
 Once the issues is resolved I would restart hudson for builds.
 
 Thanks,
 Giri
 



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d516011c NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 

[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739898#action_12739898
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

I've addressed 1) in the attached patch.

For 2), we are not eating the IOException; we are actually shutting things down. 
The bug is that we are passing it up to the upper layer, which does 
not know anything about the follower thread. We need to handle it here.

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

 ZK fataled on me, and ugly
 --

 Key: ZOOKEEPER-483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: ryan rawson
Assignee: Benjamin Reed
 Fix For: 3.2.1, 3.3.0

 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch


 here is the part of the log where my zookeeper instance crashed, taking 3 
 out of 5 servers down and thus ruining the quorum for all clients:
 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5161350 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
 Exception when following the leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.168:39489]
 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0578 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46797]
 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa013e NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:33998]
 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
 Exception causing close of session 0x52276d1d5160593 due to 
 java.io.IOException: Read error
 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e02bb NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.158:53758]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.154:58681]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691382 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59967]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb1354 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.163:49957]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x42276d1d3fa13cd NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.150:34212]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x22276d15e691383 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.159:46813]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x12276d15dfb0350 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.162:59956]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e139b NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.156:55138]
 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x32276d15d2e1398 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.167:41257]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d5161355 NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 remote=/10.20.20.153:34032]
 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
 closing session:0x52276d1d516011c NIOServerCnxn: 
 java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
 

Re: BUILDS ARE BACK NORMAL

2009-08-05 Thread Mahadev Konar
Hi all, 
 As Giri mentioned, the builds are back to normal and so is the patch
process:
http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/Zookeeper-Patch-vesta.apache.org/

The patches are being run against hudson, so you DO NOT need to cancel and
resubmit patches.

Thanks
mahadev


On 8/5/09 9:50 PM, Giridharan  Kesavan gkesa...@yahoo-inc.com wrote:

 Restarted all the build jobs on hudson; Builds are running fine.
 Build failures are due to   /tmp: File system full, swap space limit exceeded
 
 
 Thanks,
 -Giri
 
 -Original Message-
 From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
 Sent: Thursday, August 06, 2009 9:16 AM
 To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
 common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper-
 d...@hadoop.apache.org
 Subject: build failures on hudson zones
 
 Build on hudson.zones are failing as the zonestorage for hudson is
 full.
 I 've sent an email to the ASF infra team about the space issues on
 hudson zones.
 
 Once the issues is resolved I would restart hudson for builds.
 
 Thanks,
 Giri