[jira] Commented: (ZOOKEEPER-493) patch for command line setquota
[ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739428#action_12739428 ]

Hudson commented on ZOOKEEPER-493:
----------------------------------

Integrated in ZooKeeper-trunk #405 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/]): patch for command line setquota

> patch for command line setquota
> -------------------------------
>
> Key: ZOOKEEPER-493
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.2.0
> Reporter: steve bendiola
> Assignee: steve bendiola
> Priority: Minor
> Fix For: 3.2.1, 3.3.0
> Attachments: quotafix.patch, ZOOKEEPER-493.patch
>
> The command line setquota tries to use argument 3 as both a path and a value.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
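The bug above is the kind of thing a stricter argument parse avoids: the quota value and the path must come from distinct argument slots. A minimal plain-Java sketch (hypothetical class, not the actual ZooKeeperMain code), assuming a `setquota -n|-b val path` style command line:

```java
// Hypothetical sketch of strict setquota argument handling: the value and
// the path are read from separate positions, never the same argument.
public class SetQuotaArgs {
    public final String option; // "-n" (count limit) or "-b" (byte limit)
    public final long value;
    public final String path;

    public SetQuotaArgs(String[] args) {
        // Expected shape: { "setquota", option, value, path } -- four slots.
        if (args.length != 4) {
            throw new IllegalArgumentException("usage: setquota -n|-b val path");
        }
        this.option = args[1];
        this.value = Long.parseLong(args[2]); // value slot only
        this.path = args[3];                  // path slot only
    }
}
```

The point is simply that each field is sourced from exactly one index, so no argument can be consumed twice.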
[jira] Commented: (ZOOKEEPER-491) Prevent zero-weight servers from being elected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739426#action_12739426 ]

Hudson commented on ZOOKEEPER-491:
----------------------------------

Integrated in ZooKeeper-trunk #405 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/]): Prevent zero-weight servers from being elected. (flavio via mahadev)

> Prevent zero-weight servers from being elected
> ----------------------------------------------
>
> Key: ZOOKEEPER-491
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
> Project: Zookeeper
> Issue Type: New Feature
> Components: leaderElection
> Affects Versions: 3.2.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
> Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch
>
> This is a fix to prevent zero-weight servers from being elected leaders. In wide-area scenarios, this allows restricting the set of servers that can lead the ensemble.
[jira] Commented: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower
[ https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739427#action_12739427 ]

Hudson commented on ZOOKEEPER-480:
----------------------------------

Integrated in ZooKeeper-trunk #405 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/]): FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev)

> FLE should perform leader check when node is not leading and add vote of follower
> ---------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
> Attachments: ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch
>
> As a server may join leader election while others have already elected a leader, it is necessary that a server handles some special cases of leader election when notifications are from servers that are either LEADING or FOLLOWING. In such special cases, we check if we have received a message from the leader to declare a leader elected. This check does not consider the case that the process performing the check might be a recently elected leader, and consequently the check fails. This patch also adds a new case, which corresponds to adding a vote to recvset when the notification is from a process LEADING or FOLLOWING. This fixes the case raised in ZOOKEEPER-475.
[jira] Commented: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739429#action_12739429 ]

Hudson commented on ZOOKEEPER-447:
----------------------------------

Integrated in ZooKeeper-trunk #405 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/405/]): zkServer.sh doesn't allow different config files to be specified on the command line

> zkServer.sh doesn't allow different config files to be specified on the command line
> ------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-447
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
> Project: Zookeeper
> Issue Type: Improvement
> Affects Versions: 3.1.1, 3.2.0
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Priority: Minor
> Fix For: 3.2.1, 3.3.0
> Attachments: ZOOKEEPER-447.patch
>
> Unless I'm missing something, you can change the directory that the zoo.cfg file is in by setting ZOOCFGDIR, but not the name of the file itself. I find it convenient myself to specify the config file on the command line, but we should also let it be specified by environment variable.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan updated ZOOKEEPER-484:
-----------------------------------------

Status: Open (was: Patch Available)

Resubmitting the patch to the patch queue.

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -----------------------------------------------------------------------------
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
> When a client is connected to a follower, gets disconnected, and connects to a leader, it gets a SESSION MOVED exception. This is because of a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO NOT have this problem. The fix is to make sure the ownership of a connection gets changed when a session moves from a follower to the leader. The workaround in 3.2.0 would be to switch off connections from clients to the leader; take a look at the *leaderServers* java property in http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan updated ZOOKEEPER-484:
-----------------------------------------

Status: Patch Available (was: Open)

> Clients get SESSION MOVED exception when switching from follower to a leader.
> Key: ZOOKEEPER-484
hudson patch build back to normal
Sendmail issues on hudson.zones are fixed now and the patch build for ZooKeeper has been restarted.

Regards,
Giri
RE: hudson patch build back to normal
If you have changed the JIRA status to Patch Available in the last couple of days, please resubmit your patch so that Hudson picks it up for testing.

-Giri

-----Original Message-----
From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com]
Sent: Wednesday, August 05, 2009 7:18 PM
To: zookeeper-dev@hadoop.apache.org
Cc: Nigel Daley
Subject: hudson patch build back to normal

Sendmail issues on hudson.zones are fixed now and the patch build for ZooKeeper has been restarted.

Regards,
Giri
Re: hudson patch build back to normal
Thanks Giri!

Patrick

Giridharan Kesavan wrote:
> If you have changed the JIRA status to Patch Available in the last couple of days, please resubmit your patch so that Hudson picks it up for testing.
>
> -Giri
>
> Sendmail issues on hudson.zones are fixed now and the patch build for ZooKeeper has been restarted.
>
> Regards,
> Giri
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739609#action_12739609 ]

Patrick Hunt commented on ZOOKEEPER-498:
----------------------------------------

Looks to me like 0 weight is still busted. fle0weighttest is actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread Thread-108 junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)

This is probably because the test is calling assert in a thread other than the main test thread - which JUnit will not track/know about.

One problem I see with these tests (the 0 weight test I looked at): it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider the test successful (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect. I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch seems to just thrash...

Also, I tried 3 5-server quorums by hand from the command line with 0 weight, and they see similar issues to what Todd is seeing. This is happening for me on both the trunk and 3.2 branch source.
> Unending Leader Elections : WAN configuration
> ---------------------------------------------
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.2.0
> Environment: Each machine: CentOS 5.2 64-bit, 2GB RAM
> java version 1.6.0_13
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
> Network Topology: DC: central data center; POD(N): remote data center
> Zookeeper Topology: Leaders may be elected only in DC (weight = 1); only followers are elected in PODS (weight = 0)
> Reporter: Todd Greenwood-Geer
> Assignee: Patrick Hunt
> Priority: Critical
> Fix For: 3.2.1, 3.3.0
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups: a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1.
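The failure mode Patrick describes - JUnit not seeing an assertion raised on a worker thread - is commonly worked around by capturing the failure and rethrowing it on the main test thread. A minimal plain-Java sketch of that pattern (a hypothetical helper, no JUnit dependency; the real test would call `check()` before declaring success):

```java
import java.util.concurrent.atomic.AtomicReference;

// Failures thrown on a worker thread are recorded here and rethrown on the
// main thread, where the test runner can actually see them.
public class ThreadFailureCapture {
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    public Thread run(Runnable body) {
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable e) {
                failure.compareAndSet(null, e); // remember the first failure
            }
        });
        t.start();
        return t;
    }

    // Call from the main test thread after join(); rethrows any captured failure.
    public void check() throws Throwable {
        Throwable t = failure.get();
        if (t != null) throw t;
    }
}
```

Without this hand-off, an `AssertionFailedError` on the worker thread only kills that thread and the test still reports success, which matches the symptom above.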
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-498:
-----------------------------------

Attachment: zk498-test.tar.gz

I attached zk498-test.tar.gz - this is a 5 server config (2 zero-weight) that fails to achieve quorum. Run start.sh/stop.sh and check out the individual logs for details.

> Unending Leader Elections : WAN configuration
> Key: ZOOKEEPER-498
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg
[jira] Assigned: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-499:
--------------------------------------

Assignee: Patrick Hunt

> electionAlg should default to FLE (3) - regression
> --------------------------------------------------
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
> Issue Type: Bug
> Components: server, tests
> Affects Versions: 3.2.0
> Reporter: Patrick Hunt
> Assignee: Patrick Hunt
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> There's a regression in 3.2 - electionAlg is no longer defaulting to 3 (it incorrectly defaults to 0). We also need tests to validate this.
[jira] Updated: (ZOOKEEPER-462) Last hint for open ledger
[ https://issues.apache.org/jira/browse/ZOOKEEPER-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-462:
------------------------------------

Fix Version/s: 3.3.0

> Last hint for open ledger
> -------------------------
>
> Key: ZOOKEEPER-462
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-462
> Project: Zookeeper
> Issue Type: New Feature
> Components: contrib-bookkeeper
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.3.0
> Attachments: ZOOKEEPER-462.patch
>
> In some use cases of BookKeeper, it is useful to be able to read from a ledger before closing the ledger. To enable such a feature, the writer has to be able to communicate to a reader how many entries it has been able to write successfully. The main idea of this jira is to continuously update a znode with the number of successful writes; a reader can, for example, watch the node for changes. I was thinking of having a configuration parameter to state how often a writer should update the hint on ZooKeeper (e.g., every 1,000 requests, every 10,000 requests). Clearly, updating more often increases the overhead of writing to ZooKeeper, although the impact on the performance of writes to BookKeeper should be minimal given that we make an asynchronous call to update the hint.
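The every-N-requests hint update proposed above amounts to a small throttle on the write-completion path. A plain-Java sketch (hypothetical helper, not BookKeeper's API; `asyncPublish` stands in for an asynchronous setData on the hint znode):

```java
import java.util.function.LongConsumer;

// Publish the last successfully written entry id only every `interval`
// writes, so the ZooKeeper hint znode is updated infrequently and the
// publish itself is fire-and-forget (asynchronous).
public class HintPublisher {
    private final long interval;
    private final LongConsumer asyncPublish; // stand-in for an async znode update
    private long written = 0;

    public HintPublisher(long interval, LongConsumer asyncPublish) {
        this.interval = interval;
        this.asyncPublish = asyncPublish;
    }

    // Called from the writer's completion callback for each successful write.
    public void onWriteComplete(long entryId) {
        written++;
        if (written % interval == 0) {
            asyncPublish.accept(entryId); // minimal impact on the write path
        }
    }
}
```

With `interval = 1000`, the hint znode sees one update per thousand BookKeeper writes, matching the trade-off described in the issue.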
[jira] Created: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
electionAlg should default to FLE (3) - regression
--------------------------------------------------

Key: ZOOKEEPER-499
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
Project: Zookeeper
Issue Type: Bug
Components: server, tests
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Blocker
Fix For: 3.2.1, 3.3.0

There's a regression in 3.2 - electionAlg is no longer defaulting to 3 (it incorrectly defaults to 0). We also need tests to validate this.
[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-499:
-----------------------------------

Release Note: Workaround in 3.2.0 (this only affects 3.2.0): set electionAlg=3 in the server config files.

> electionAlg should default to FLE (3) - regression
> Key: ZOOKEEPER-499
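The release-note workaround amounts to one line in each server's configuration file; a sketch of the relevant zoo.cfg fragment (surrounding settings omitted):

```
# zoo.cfg -- 3.2.0 workaround: select FastLeaderElection explicitly
# instead of relying on the (broken) default
electionAlg=3
```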
Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
Todd, comments inline:

On 8/5/09 12:10 PM, Todd Greenwood to...@audiencescience.com wrote:

> Flavio/Patrick/Mahadev - Thanks for your support to date. As I understand it, the sticky points with respect to WAN deployments are:
>
> 1. Leader Election: Leader elections in the WAN config (pod zk server weight = 0) are a bit troublesome (ZOOKEEPER-498).

Yes, until ZOOKEEPER-498 is fixed, you won't be able to use it with groups and zero weight.

> 2. Network Connectivity Required: ZooKeeper clients cannot read/write to ZK servers if the server does not have network connectivity to the quorum. In short, there is a hard requirement to have network connectivity in order for the clients to access the shared memory graph in ZK.

Yes.

> Alternative
> -----------
> I have seen some discussion in the past re: multi-ensemble solutions. Essentially, put one ensemble in each physical location (POD), and another in your DC, and have a fairly simple process coordinate synchronizing the various ensembles. If the POD writes can be confined to a sub-tree in the master graph, then this should be fairly simple. I'm imagining the following:
>
> DC (master) graph:
> /root/pods/1/data/item1
> /root/pods/1/data/item2
> /root/pods/1/data/item3
> /root/pods/2
> /root/pods/3
> ...etc
> /root/shared/allpods/readonly/data/item1
> /root/shared/allpods/readonly/data/item2
> ...etc
>
> This has the advantage of minimizing cross-pod traffic, which could be a real perf killer in a WAN. It also provides transacted writes in the PODs, even in the disconnected state. Clearly, another portion of the business logic has to reconcile the DC (master) graph such that each of the pods' data items are processed, etc. Does anyone have any experience with this (pitfalls, suggestions, etc.)?

As far as I understand, you mean having a master cluster with another in a different data center syncing with the master (just a subtree)? Is that correct? If yes, this is what one of our users in Yahoo! Search does. They have a master cluster and a smaller cluster in a different datacenter, and a bridge that copies data from the master cluster (only a subtree) to the smaller one and keeps them in sync.

Thanks
mahadev

> -Todd
RE: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration
Mahadev, comments inline:

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Wednesday, August 05, 2009 1:47 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Optimized WAN ZooKeeper Config : Multi-Ensemble configuration

> As far as I understand, you mean having a master cluster with another in a different data center syncing with the master (just a subtree)? Is that correct? If yes, this is what one of our users in Yahoo! Search does. They have a master cluster and a smaller cluster in a different datacenter, and a bridge that copies data from the master cluster (only a subtree) to the smaller one and keeps them in sync.

Yes, this is exactly what I'm proposing, with the addition that I'll sync subtrees in both directions and have a separate process reconcile data from the various pods, like so:

#pod1 ensemble
/root/a/b

#pod2 ensemble
/root/a/b

#dc ensemble
/root/shared/foo/bar

# Mapping (modeled after perforce client config)
# [ensemble]:[path] [ensemble]:[path]

# sync pods to dc
[POD1]:/root/... [DC]:/root/pods/POD1/...
[POD2]:/root/... [DC]:/root/pods/POD2/...

# sync dc to pods
[DC]:/root/shared/... [POD1]:/shared/...
[DC]:/root/shared/... [POD2]:/shared/...
[DC]:/root/shared/... [POD3]:/shared/...

Now, for our needs, we'd like the DC data aggregated, so I'll have another process handle aggregating the pod-specific data, like so:

POD Data Aggregator: aggregate data in [DC]:/root/pods/POD(N) to [DC]:/root/aggregated/data.

This is just off the top of my head.

-Todd
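The mapping table Todd sketches is, at its core, prefix rewriting between ensembles: a source-ensemble path is translated to a destination-ensemble path by swapping prefixes. A minimal plain-Java sketch of that rewrite (a hypothetical helper, not an existing tool; the bridge process would apply one such rule per mapping line):

```java
// Rewrite a source-ensemble znode path into its destination-ensemble path,
// e.g. syncing POD1:/root/a/b into DC:/root/pods/POD1/a/b via the rule
// (srcPrefix="/root", dstPrefix="/root/pods/POD1").
public class PathMapper {
    public static String map(String path, String srcPrefix, String dstPrefix) {
        if (!path.startsWith(srcPrefix)) {
            throw new IllegalArgumentException(path + " is not under " + srcPrefix);
        }
        return dstPrefix + path.substring(srcPrefix.length());
    }
}
```

A real bridge would additionally need watches on the source subtree and conflict rules for two-way sync; this only shows the path translation step.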
[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-499:
-----------------------------------

Attachment: ZOOKEEPER-499_br3.2.patch
            ZOOKEEPER-499.patch

Patches to fix on trunk and branch (br3.2 is the branch patch). This fixes the problem - electionAlg again defaults to 3. It also adds a test to verify FLE is used by default, and it fixes a test that fails if FLE is used (vs algo 0), which is due to a difference in the way the JDK exposes unresolved host names when using UDP vs TCP.

> electionAlg should default to FLE (3) - regression
> Key: ZOOKEEPER-499
> Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch
[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression
[ https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-499:
-----------------------------------

Status: Patch Available (was: Open)

> electionAlg should default to FLE (3) - regression
> Key: ZOOKEEPER-499
[jira] Assigned: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-498:
--------------------------------------

Assignee: Flavio Paiva Junqueira (was: Patrick Hunt)

> Unending Leader Elections : WAN configuration
> Key: ZOOKEEPER-498
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739787#action_12739787 ]

Patrick Hunt commented on ZOOKEEPER-498:
----------------------------------------

Please fix the following as well - incorrect logging levels are being used in the quorum code, for example:

2009-08-05 15:17:02,733 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 1
2009-08-05 15:17:02,753 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 2

This is INFO, not ERROR.

> Unending Leader Elections : WAN configuration
> Key: ZOOKEEPER-498
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739789#action_12739789 ]

Patrick Hunt commented on ZOOKEEPER-498:
----------------------------------------

Todd, I did see an issue with your config. It's not:

group.1:1:2:3

rather it's:

group.1=1:2:3

(it should be = not :). Regardless, though - even after I fix this it's still not forming a cluster properly; we're still looking.

> Unending Leader Elections : WAN configuration
> Key: ZOOKEEPER-498
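For reference, a zoo.cfg fragment using the correct `=` syntax for groups and weights; the server ids here are illustrative only, not Todd's actual configuration:

```
# zoo.cfg -- group definitions use '=' between key and value,
# with ':'-separated server ids on the right-hand side
group.1=1:2:3
group.2=4:5

# DC servers may lead (weight 1); pod servers may not (weight 0)
weight.1=1
weight.2=1
weight.3=1
weight.4=0
weight.5=0
```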
[jira] Created: (ZOOKEEPER-500) Async methods shouldn't throw exceptions
Async methods shouldn't throw exceptions
----------------------------------------

Key: ZOOKEEPER-500
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-500
Project: Zookeeper
Issue Type: Improvement
Components: contrib-bookkeeper
Reporter: Utkarsh Srivastava

Async methods like asyncLedgerCreate and Open shouldn't be throwing InterruptedException and BKExceptions. The present method signatures lead to messy application code, since one is forced to have error-handling code in two places: inside the callback to handle a non-OK return code, and outside for handling the exceptions thrown by the call. There should be only one way to indicate error conditions, and that should be through a non-OK return code to the callback.
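The convention proposed here - every outcome funnelled through the callback's return code, nothing thrown from the call itself - can be sketched in plain Java. The interface, return codes, and `doOpen` below are hypothetical stand-ins, not BookKeeper's actual signatures:

```java
// Sketch of a never-throwing async call: all errors, including
// interruption, are reported via the callback's return code, so the
// caller has exactly one place to handle failures.
public class AsyncOps {
    public interface OpenCallback {
        void openComplete(int rc, Object ledgerHandle, Object ctx);
    }

    public static final int OK = 0;
    public static final int INTERRUPTED = -1; // illustrative code

    public static void asyncOpen(long ledgerId, OpenCallback cb, Object ctx) {
        try {
            Object handle = doOpen(ledgerId); // stand-in for the real open
            cb.openComplete(OK, handle, ctx);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
            cb.openComplete(INTERRUPTED, null, ctx); // reported, never thrown
        }
    }

    private static Object doOpen(long id) throws InterruptedException {
        return "ledger-" + id;
    }
}
```

The caller then branches on `rc` inside the callback only, which is exactly the single error path the issue asks for.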
[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-490: --- Attachment: ZOOKEEPER-490.patch this patch updates the javadoc for zk construction: talks about the async nature, talks about thread safety the java docs for session creation are misleading/incomplete Key: ZOOKEEPER-490 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1, 3.2.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-490.patch the javadoc for the ZooKeeper constructor says: * The client object will pick an arbitrary server and try to connect to it. * If failed, it will try the next one in the list, until a connection is * established, or all the servers have been tried. the "or all the servers have been tried" phrase is misleading; it should indicate that we retry until success, connection closed, or session expired. we also need to mention that the connection is async, that the constructor returns immediately, and that you need to look for the connection event in the watcher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
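The pattern the updated javadoc should steer users toward can be sketched like this. The class and method names are illustrative, not the real client: construction kicks off the connection on a background thread and returns immediately, and the caller blocks on a watcher-style event before using the handle.

```java
import java.util.concurrent.CountDownLatch;

class AsyncClient {
    private final CountDownLatch connectedLatch = new CountDownLatch(1);
    private volatile boolean connected = false;

    AsyncClient() {
        // As with the ZooKeeper constructor, this returns immediately;
        // the connection (retrying across the server list) happens on a
        // background thread.
        new Thread(() -> {
            // ... pick a server, connect, try the next one on failure ...
            connected = true;
            connectedLatch.countDown(); // fires the "connected" event
        }).start();
    }

    // Callers wait for the connection event instead of assuming the
    // constructor connected synchronously.
    void awaitConnected() {
        try {
            connectedLatch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isConnected() { return connected; }
}
```

In real client code the latch would be counted down from the Watcher's connection event; the design point is the same, the constructor never blocks.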
[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-490: --- Status: Patch Available (was: Open) the java docs for session creation are misleading/incomplete Key: ZOOKEEPER-490 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490 Project: Zookeeper Issue Type: Bug Affects Versions: 3.2.0, 3.1.1 Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-490.patch the javadoc for the ZooKeeper constructor says: * The client object will pick an arbitrary server and try to connect to it. * If failed, it will try the next one in the list, until a connection is * established, or all the servers have been tried. the "or all the servers have been tried" phrase is misleading; it should indicate that we retry until success, connection closed, or session expired. we also need to mention that the connection is async, that the constructor returns immediately, and that you need to look for the connection event in the watcher -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739891#action_12739891 ] Flavio Paiva Junqueira commented on ZOOKEEPER-498: -- Pat, we have a description of how to configure in the Cluster options of the Administrator guide. We are missing an example, which is in the source code as you point out. Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Assignee: Flavio Paiva Junqueira Priority: Critical Fix For: 3.2.1, 3.3.0 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Paiva Junqueira updated ZOOKEEPER-498: - Attachment: ZOOKEEPER-498.patch I have generated a patch for this issue. I realized that I didn't do the correct checks in ZOOKEEPER-491, so I try to fix that in this patch. I have also modified the test to fix the problem with the fail assertion, and I have inspected the logs to see if it is behaving as expected. I can see no problem at this time with this patch. If someone else is interested in checking it out, please do. Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Assignee: Flavio Paiva Junqueira Priority: Critical Fix For: 3.2.1, 3.3.0 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. 
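The weight checks at issue can be sketched as follows. This is a hedged illustration in the spirit of ZooKeeper's group/weight options, not the project's actual QuorumHierarchical class: a group is satisfied when the voters in it hold a strict majority of the group's total weight, a group whose total weight is zero is skipped entirely (so zero-weight pods can never form or help form a quorum), and a quorum needs a majority of the remaining groups.

```java
import java.util.Map;
import java.util.Set;

class WeightedQuorum {
    // groupId -> (serverId -> weight)
    final Map<Integer, Map<Integer, Integer>> groups;

    WeightedQuorum(Map<Integer, Map<Integer, Integer>> groups) {
        this.groups = groups;
    }

    boolean containsQuorum(Set<Integer> voters) {
        int satisfied = 0, counted = 0;
        for (Map<Integer, Integer> group : groups.values()) {
            int total = group.values().stream().mapToInt(Integer::intValue).sum();
            if (total == 0) continue;         // zero-weight group: ignored
            counted++;
            int got = group.entrySet().stream()
                    .filter(e -> voters.contains(e.getKey()))
                    .mapToInt(Map.Entry::getValue)
                    .sum();
            if (2 * got > total) satisfied++; // strict weighted majority
        }
        return 2 * satisfied > counted;       // majority of nonzero groups
    }
}
```

With a DC group of three weight-1 servers and a pod group of two weight-0 servers, two DC votes form a quorum while the pod servers alone never can, which is the behavior Todd's topology expects.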
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Paiva Junqueira updated ZOOKEEPER-498: - Status: Patch Available (was: Open) Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Assignee: Flavio Paiva Junqueira Priority: Critical Fix For: 3.2.1, 3.3.0 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
build failures on hudson zones
Builds on hudson.zones are failing as the zone storage for hudson is full. I've sent an email to the ASF infra team about the space issues on the hudson zones. Once the issue is resolved I will restart hudson for builds. Thanks, Giri
BUILDS ARE BACK NORMAL
Restarted all the build jobs on hudson; Builds are running fine. Build failures are due to /tmp: File system full, swap space limit exceeded Thanks, -Giri -Original Message- From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com] Sent: Thursday, August 06, 2009 9:16 AM To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper- d...@hadoop.apache.org Subject: build failures on hudson zones Build on hudson.zones are failing as the zonestorage for hudson is full. I 've sent an email to the ASF infra team about the space issues on hudson zones. Once the issues is resolved I would restart hudson for builds. Thanks, Giri
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Attachment: ZOOKEEPER-483.patch ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e 
NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181
[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739898#action_12739898 ] Benjamin Reed commented on ZOOKEEPER-483: - I've addressed 1) in the attached patch. for 2) we are not eating the IOException. we are actually shutting things down. the bug is actually that we are passing it up to the upper layer, which does not know anything about the follower thread. we need to handle it here. ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO 
org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: 
java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn:
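The fix Ben describes, handling the read failure inside the follower rather than letting it escape to a caller that knows nothing about the follower thread, can be sketched like this. The class and the packet model are hypothetical stand-ins for the real Follower/readPacket code, used only to show where the catch belongs.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

class FollowerLoop {
    boolean shutdown = false;
    private final Iterator<String> packets;

    FollowerLoop(List<String> leaderStream) {
        this.packets = leaderStream.iterator();
    }

    // Stand-in for readPacket(): throws EOFException when the leader's
    // stream ends, as in the logs above.
    private String readPacket() throws IOException {
        if (!packets.hasNext()) throw new EOFException("leader closed connection");
        return packets.next();
    }

    // Returns the number of packets processed before the stream ended.
    int followLeader() {
        int processed = 0;
        try {
            while (true) {
                readPacket();
                processed++;
            }
        } catch (IOException e) {
            // Handle it here: mark ourselves shut down so the peer can
            // re-enter leader election, instead of rethrowing to an
            // upper layer that cannot do anything sensible with it.
            shutdown = true;
        }
        return processed;
    }
}
```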
[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated ZOOKEEPER-483: Status: Patch Available (was: Open) ZK fataled on me, and ugly -- Key: ZOOKEEPER-483 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1 Reporter: ryan rawson Assignee: Benjamin Reed Fix For: 3.2.1, 3.3.0 Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch here are the part of the log whereby my zookeeper instance crashed, taking 3 out of 5 down, and thus ruining the quorum for all clients: 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5161350 due to java.io.IOException: Read error 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494) 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.168:39489] 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0578 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46797] 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa013e 
NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:33998] 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x52276d1d5160593 due to java.io.IOException: Read error 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e02bb NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.158:53758] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13e4 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.154:58681] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691382 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59967] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb1354 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.163:49957] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x42276d1d3fa13cd NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.150:34212] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x22276d15e691383 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.159:46813] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12276d15dfb0350 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.162:59956] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e139b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
remote=/10.20.20.156:55138] 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x32276d15d2e1398 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.167:41257] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d5161355 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 remote=/10.20.20.153:34032] 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x52276d1d516011c NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181
Re: BUILDS ARE BACK NORMAL
Hi all, As Giri mentioned, the builds are back to normal and so is the patch process. http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/Zookeeper-Patch-vesta.apache.org/ The patches are being run against hudson, so you DO NOT need to cancel and resubmit patches. Thanks mahadev On 8/5/09 9:50 PM, Giridharan Kesavan gkesa...@yahoo-inc.com wrote: Restarted all the build jobs on hudson; builds are running fine. Build failures are due to /tmp: File system full, swap space limit exceeded Thanks, -Giri -Original Message- From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com] Sent: Thursday, August 06, 2009 9:16 AM To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; common-...@hadoop.apache.org; pig-...@hadoop.apache.org; zookeeper-d...@hadoop.apache.org Subject: build failures on hudson zones Build on hudson.zones are failing as the zonestorage for hudson is full. I've sent an email to the ASF infra team about the space issues on hudson zones. Once the issue is resolved I will restart hudson for builds. Thanks, Giri