[jira] [Commented] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server

2013-11-01 Thread Leader Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811080#comment-13811080
 ] 

Leader Ni commented on ZOOKEEPER-1804:
--

patch2 sample

$ echo rwps | nc localhost 2181
RealTime R/W Statistics:
getChildren2:   0.5994005994005994
createSession:  1.6983016983016983
closeSession:   0.999000999000999
setData:        110.18981018981019
setWatches:     129.17082917082917
getChildren:    68.83116883116884
delete:         19.980019980019982
create:         22.27772227772228
exists:         1806.2937062937062
getDate:        729.5704295704296
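
For anyone who wants to post-process this output, here is a minimal sketch of a parser. It assumes the text has already been captured from the rwps command, which is the command added by the patch on this issue and is not available on a stock server. The function name parse_rwps is hypothetical, and the sample lines are copied from the output above.

```python
def parse_rwps(text):
    """Parse 'name: rate' lines into a dict mapping operation name to float."""
    rates = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            # Skip blank lines and the "RealTime R/W Statistics:" header.
            continue
        rates[name.strip()] = float(value)
    return rates


# A few lines copied from the sample output (spacing varies in the original).
SAMPLE = """RealTime R/W Statistics:
getChildren2:0.5994005994005994
setData:110.18981018981019
exists:1806.2937062937062
getDate: 729.5704295704296
"""

rates = parse_rwps(SAMPLE)
busiest = max(rates, key=rates.get)  # operation with the highest rate
print(busiest, rates[busiest])
```

Sorting the resulting dict by value gives a quick view of which operation types dominate the load on a server.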

 Stat the realtime tps of zookeepr server
 

 Key: ZOOKEEPER-1804
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Leader Ni
Assignee: Leader Ni
 Attachments: ZOOKEEPER-1804.patch


 At present, when we assess whether ZooKeeper can support a business scenario, 
 we always go by the number of subscribers or the number of clients.
 However, many clients sometimes stay connected to ZooKeeper but do nothing, 
 while others run complex business logic.
 So we need to measure the real-time TPS of the ZooKeeper server.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


ZooKeeper_branch33_solaris - Build # 694 - Failure

2013-11-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch33_solaris/694/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
Building remotely on solaris1 in workspace 
/export/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:714)
at hudson.EnvVars.getRemote(EnvVars.java:212)
at hudson.model.Computer.getEnvironment(Computer.java:909)
at 
jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:40)
at hudson.model.Job.getEnvironment(Job.java:374)
at hudson.model.AbstractProject.getEnvironment(AbstractProject.java:354)
at hudson.model.Run.getEnvironment(Run.java:2111)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:911)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:776)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:657)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:562)
at hudson.model.Run.execute(Run.java:1603)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: 
Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:774)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at hudson.remoting.Command.readFrom(Command.java:92)
at 
hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Updated] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server

2013-11-01 Thread Leader Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leader Ni updated ZOOKEEPER-1804:
-

Description: 
At present, when we assess whether ZooKeeper can support a business scenario, 
we always go by the number of subscribers or the number of clients.

However, many clients sometimes stay connected to ZooKeeper but do nothing, 
while others run complex business logic.

So we need to measure the real-time TPS of the ZooKeeper server.


[-Solution---]

Solution1: 

If you only want to know the number of transactions processed in real time, you 
can use the patch ZOOKEEPER-1804.patch.

Solution2:

If you also want to know how clients use ZooKeeper, and the real-time reads and 
writes per second of each ZooKeeper client, you can use the patch 
ZOOKEEPER-1804-2.patch.

Sample:
$ echo rwps | nc localhost 2181
RealTime R/W Statistics:
getChildren2:   0.5994005994005994
createSession:  1.6983016983016983
closeSession:   0.999000999000999
setData:        110.18981018981019
setWatches:     129.17082917082917
getChildren:    68.83116883116884
delete:         19.980019980019982
create:         22.27772227772228
exists:         1806.2937062937062
getDate:        729.5704295704296
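
As a rough illustration of how per-operation rates like these could be derived, here is a minimal sketch that counts operations by type and divides by the elapsed time of the window. It is not taken from either patch. The class OpRateMeter and its methods are hypothetical, the sketch is not thread-safe, and the demo uses a fake clock so the numbers are deterministic.

```python
import time
from collections import Counter


class OpRateMeter:
    """Counts operations per type and reports ops/second per window."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so tests can fake time
        self._counts = Counter()
        self._window_start = clock()

    def record(self, op):
        """Register one processed operation of the given type."""
        self._counts[op] += 1

    def snapshot(self):
        """Return {op: ops_per_second} for the window, then start a new one."""
        now = self._clock()
        elapsed = now - self._window_start
        rates = {}
        if elapsed > 0:
            rates = {op: n / elapsed for op, n in self._counts.items()}
        self._counts.clear()
        self._window_start = now
        return rates


# Demo with a fake clock: the window lasts exactly 10 seconds.
ticks = iter([0.0, 10.0])
meter = OpRateMeter(clock=lambda: next(ticks))
for _ in range(20):
    meter.record("exists")
for _ in range(5):
    meter.record("setData")
demo_rates = meter.snapshot()
print(demo_rates)
```

A real server would need the counters to be safe under concurrent request threads; the sketch leaves that out to keep the rate arithmetic visible.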

  was:
At present, when we assess whether ZooKeeper can support a business scenario, 
we always go by the number of subscribers or the number of clients.

However, many clients sometimes stay connected to ZooKeeper but do nothing, 
while others run complex business logic.

So we need to measure the real-time TPS of the ZooKeeper server.


 Stat the realtime tps of zookeepr server
 

 Key: ZOOKEEPER-1804
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Leader Ni
Assignee: Leader Ni
 Attachments: ZOOKEEPER-1804.patch







[jira] [Commented] (ZOOKEEPER-1798) Fix race condition in testNormalObserverRun

2013-11-01 Thread Thawan Kooburat (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811090#comment-13811090
 ] 

Thawan Kooburat commented on ZOOKEEPER-1798:


Just for the record, this test is not known to be flaky in our internal Jenkins 
(which tests our internal branch).

I am able to repro this on my Mac (Java 1.7.0_15, OS X 10.7.5). When this 
happens, it looks like the txnlog doesn't have any valid content in it, so the 
zkdb that we load after shutting down the observer never has the txn that sets 
its znodes to data2. I also modified the test to leave the data files around 
and tried to load them manually after the test failed. The txnlog loaded 
successfully with the right content.

I am thinking that the data flushed to disk by one thread is not visible to the 
other thread even after thread.join() is called in between. However, this 
really seems unlikely. When I ran the same test on our production host, I could 
not repro the issue (yet).
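
The expectation described above, that data flushed by one thread can be read by another once join() has returned, can be sketched as follows. This is a Python illustration of the pattern only, not the Zab1_0Test code and not a statement about the Java memory model. The file name and helper functions are hypothetical.

```python
import os
import tempfile
import threading


def write_and_sync(path, payload):
    """Write payload, flush user-space buffers, and ask the OS to sync to disk."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def roundtrip(payload):
    """Write in a worker thread, join it, then read the file from the caller."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "txnlog.bin")  # hypothetical file name
        writer = threading.Thread(target=write_and_sync, args=(path, payload))
        writer.start()
        writer.join()  # the writer has finished before we read
        with open(path, "rb") as f:
            return f.read()


result = roundtrip(b"data2")
print(result == b"data2")
```

If a check like this passes consistently on the affected machine, the problem is more likely in how the test reads the log than in visibility across threads.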

In Patrick's log, this is slightly different. The test failed at line 1105, 
which means that the first txn in the txnlog is read correctly, but not the 
second one.

 Fix race condition in testNormalObserverRun
 ---

 Key: ZOOKEEPER-1798
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1798
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Thawan Kooburat
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: TEST-org.apache.zookeeper.server.quorum.Zab1_0Test.txt, 
 ZOOKEEPER-1798-b3.4.patch, ZOOKEEPER-1798-b3.4.patch, 
 ZOOKEEPER-1798-b3.4.patch, ZOOKEEPER-1798.patch, ZOOKEEPER-1798.patch


 This is the output message:
 {noformat}
 Testcase: testNormalObserverRun took 4.221 sec
  FAILED
 expected:<data[2]> but was:<data[1]>
 junit.framework.AssertionFailedError: expected:<data[2]> but was:<data[1]>
   at org.apache.zookeeper.server.quorum.Zab1_0Test$8.converseWithObserver(Zab1_0Test.java:1118)
   at org.apache.zookeeper.server.quorum.Zab1_0Test.testObserverConversation(Zab1_0Test.java:546)
   at org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalObserverRun(Zab1_0Test.java:994)
 {noformat}





[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811092#comment-13811092
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

+1. 

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch


 This issue was originally reported in ZOOKEEPER-1732.





ZooKeeper_branch34_solaris - Build # 696 - Still Failing

2013-11-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_solaris/696/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 156397 lines...]
[junit] 2013-11-01 07:56:17,827 [myid:] - INFO  
[Thread-4:NIOServerCnxn@997] - Closed socket connection for client 
/127.0.0.1:51448 (no session established for client)
[junit] 2013-11-01 07:56:17,827 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 07:56:17,828 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO  [main:ClientBase@421] - 
STOPPING server
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  [main:ZooKeeperServer@441] 
- shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  
[main:SessionTrackerImpl@225] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  
[main:PrepRequestProcessor@761] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  
[main:SyncRequestProcessor@209] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO  
[main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2013-11-01 07:56:17,832 [myid:] - INFO  [main:ClientBase@414] - 
STARTING server
[junit] 2013-11-01 07:56:17,833 [myid:] - INFO  [main:ZooKeeperServer@162] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test2587819802444114277.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test2587819802444114277.junit.dir/version-2
[junit] 2013-11-01 07:56:17,833 [myid:] - INFO  
[main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 07:56:17,835 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - 
Accepted socket connection from /127.0.0.1:51450
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing 
stat command from /127.0.0.1:51450
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output
[junit] 2013-11-01 07:56:17,837 [myid:] - INFO  
[Thread-5:NIOServerCnxn@997] - Closed socket connection for client 
/127.0.0.1:51450 (no session established for client)
[junit] 2013-11-01 07:56:17,837 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 07:56:17,839 [myid:] - INFO  [main:ClientBase@451] - 
tearDown starting
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO  [main:ZooKeeper@684] - 
Session: 0x14212a9a76d closed
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@509] - EventThread shut down
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO  [main:ClientBase@421] - 
STOPPING server
[junit] 2013-11-01 07:56:17,923 [myid:] - INFO  [main:ZooKeeperServer@441] 
- shutting down
[junit] 2013-11-01 07:56:17,923 

ZooKeeper-trunk-solaris - Build # 718 - Still Failing

2013-11-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/718/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 222312 lines...]
[junit] 2013-11-01 09:04:58,692 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO  [main:ZooKeeperServer@428] 
- shutting down
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO  
[main:SessionTrackerImpl@183] - Shutting down
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO  
[main:PrepRequestProcessor@972] - Shutting down
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO  
[main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO  
[main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-11-01 09:04:58,695 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 09:04:58,695 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2013-11-01 09:04:58,696 [myid:] - INFO  [main:ClientBase@414] - 
STARTING server
[junit] 2013-11-01 09:04:58,696 [myid:] - INFO  [main:ZooKeeperServer@149] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2
[junit] 2013-11-01 09:04:58,697 [myid:] - INFO  
[main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 
kB direct buffers.
[junit] 2013-11-01 09:04:58,697 [myid:] - INFO  
[main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 09:04:58,698 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2/snapshot.b
[junit] 2013-11-01 09:04:58,700 [myid:] - INFO  [main:FileTxnSnapLog@297] - 
Snapshotting: 0xb to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2/snapshot.b
[junit] 2013-11-01 09:04:58,702 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 09:04:58,702 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:62958
[junit] 2013-11-01 09:04:58,703 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from 
/127.0.0.1:62958
[junit] 2013-11-01 09:04:58,703 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-11-01 09:04:58,704 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client 
/127.0.0.1:62958 (no session established for client)
[junit] 2013-11-01 09:04:58,704 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO  [main:ClientBase@451] - 
tearDown starting
[junit] 2013-11-01 09:04:58,777 [myid:] - INFO  [main:ZooKeeper@777] - 
Session: 0x14212e88860 closed
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO  [main:ClientBase@421] - 
STOPPING server
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO  

ZooKeeper-3.4-WinVS2008_java - Build # 340 - Still Failing

2013-11-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/340/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 213188 lines...]
[junit] 2013-11-01 10:12:28,108 [myid:] - INFO  [main:ZooKeeperServer@441] 
- shutting down
[junit] 2013-11-01 10:12:28,108 [myid:] - INFO  
[main:SessionTrackerImpl@225] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO  
[main:PrepRequestProcessor@761] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO  
[main:SyncRequestProcessor@209] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2013-11-01 10:12:28,210 [myid:] - INFO  
[main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2013-11-01 10:12:28,309 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:12:29,300 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2013-11-01 10:12:29,301 [myid:] - INFO  [main:ClientBase@414] - 
STARTING server
[junit] 2013-11-01 10:12:29,301 [myid:] - INFO  [main:ZooKeeperServer@162] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1011140216037316854.junit.dir\version-2
 snapdir 
f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1011140216037316854.junit.dir\version-2
[junit] 2013-11-01 10:12:29,316 [myid:] - INFO  
[main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 10:12:29,320 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:12:29,321 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - 
Accepted socket connection from /127.0.0.1:65321
[junit] 2013-11-01 10:12:29,321 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing 
stat command from /127.0.0.1:65321
[junit] 2013-11-01 10:12:29,416 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output
[junit] 2013-11-01 10:12:29,418 [myid:] - INFO  
[Thread-5:NIOServerCnxn@997] - Closed socket connection for client 
/127.0.0.1:65321 (no session established for client)
[junit] 2013-11-01 10:12:29,418 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 10:12:29,420 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2013-11-01 10:12:29,420 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO  [main:ClientBase@451] - 
tearDown starting
[junit] 2013-11-01 10:12:29,815 [myid:] - INFO  
[main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to 
authenticate using SASL (java.lang.SecurityException: Unable to locate a login 
configuration)
[junit] 2013-11-01 10:12:29,816 [myid:] - INFO  
[main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket 
connection established to 127.0.0.1/127.0.0.1:11221, initiating session
[junit] 2013-11-01 10:12:29,816 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - 
Accepted socket connection from /127.0.0.1:65327
[junit] 2013-11-01 10:12:29,820 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client 
attempting to renew session 0x14213264b5c at /127.0.0.1:65327
[junit] 2013-11-01 10:12:29,821 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established 
session 0x14213264b5c with negotiated timeout 3 for client 
/127.0.0.1:65327
[junit] 2013-11-01 10:12:29,821 [myid:] - INFO  
[main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session 
establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 
0x14213264b5c, negotiated timeout = 3
[junit] 2013-11-01 10:12:29,920 [myid:] - INFO  [ProcessThread(sid:0 

ZooKeeper-trunk-WinVS2008_java - Build # 588 - Failure

2013-11-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/588/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 283707 lines...]
[junit] 2013-11-01 10:26:48,001 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-11-01 10:26:48,000 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-11-01 10:26:48,093 [myid:] - INFO  
[main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-11-01 10:26:48,094 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:26:48,608 [myid:] - INFO  
[main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1008] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to 
authenticate using SASL (java.lang.SecurityException: Unable to locate a login 
configuration)
[junit] 2013-11-01 10:26:49,093 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2013-11-01 10:26:49,094 [myid:] - INFO  [main:ClientBase@414] - 
STARTING server
[junit] 2013-11-01 10:26:49,094 [myid:] - INFO  [main:ZooKeeperServer@149] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2
 snapdir 
f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2
[junit] 2013-11-01 10:26:49,099 [myid:] - INFO  
[main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 1 selector thread(s), 4 worker threads, and 64 
kB direct buffers.
[junit] 2013-11-01 10:26:49,100 [myid:] - INFO  
[main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 10:26:49,102 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2\snapshot.b
[junit] 2013-11-01 10:26:49,105 [myid:] - INFO  [main:FileTxnSnapLog@297] - 
Snapshotting: 0xb to 
f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2\snapshot.b
[junit] 2013-11-01 10:26:49,106 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:26:49,110 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:51549
[junit] 2013-11-01 10:26:49,111 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from 
/127.0.0.1:51549
[junit] 2013-11-01 10:26:49,112 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-11-01 10:26:49,112 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client 
/127.0.0.1:51549 (no session established for client)
[junit] 2013-11-01 10:26:49,208 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 10:26:49,211 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 10:26:49,211 [myid:] - INFO  [main:ClientBase@451] - 
tearDown starting
[junit] 2013-11-01 10:26:49,598 [myid:] - INFO  
[main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@882] - Socket 
connection established to 127.0.0.1/127.0.0.1:11221, initiating session
[junit] 2013-11-01 10:26:49,598 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,599 [myid:] - INFO  
[NIOWorkerThread-2:ZooKeeperServer@858] - Client attempting to renew session 
0x142133367d1 at /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,615 [myid:] - INFO  
[NIOWorkerThread-2:ZooKeeperServer@604] - Established session 0x142133367d1 
with negotiated timeout 3 for client /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,616 [myid:] - INFO  

[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-11-01 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811263#comment-13811263
 ] 

Flavio Junqueira commented on ZOOKEEPER-1805:
-

Thanks, guys.

bq. Only a very mild suggestion, if you wish you could remove the changes 
introduced by ZOOKEEPER-1732 in Leader.java, Learner.java and QuorumPeer.java.

I don't understand this comment. The change here simply detects that there is a 
mix of messages with and without don't care values, which must correspond to a 
rolling upgrade, so it ignores the corresponding fields and simply verifies 
that the epoch is greater. If this is right, then the changes of ZOOKEEPER-1732 
are still required when everyone is sending don't care values. Am I missing 
anything? 

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch


 This issue was originally reported in ZOOKEEPER-1732.





[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-01 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811314#comment-13811314
 ] 

Benjamin Reed commented on ZOOKEEPER-1742:
--

I propose that I open another issue to not build these tests on the Mac, do a 
patch there, and then lower the priority of this issue. Does that sound OK?

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch


 There are two problems I have spotted when running make check with the C 
 client. First, it complains that the sleep call is not defined in two test 
 files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. 
 Including unistd.h works. The second problem is with linker options. It 
 complains that --wrap is not a valid option. I'm not sure how to deal with 
 this one yet, since I'm not sure why we are using it.  





[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-01 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811391#comment-13811391
 ] 

Flavio Junqueira commented on ZOOKEEPER-1742:
-

If we disable the tests on Mac OS, aren't we exposing ourselves to trouble? 
The tests are there for a reason, but granted, I can't remember what these 
tests are doing, so it is not clear how critical it is that we run them on Mac 
OS. 

Just so that I understand, trying to fix the tests so that they run on Mac OS 
would be hard? They are running on Ubuntu fine now, is it right? 

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch




[jira] [Created] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)
Raul Gutierrez Segales created ZOOKEEPER-1807:
-

 Summary: Observers spam each other creating connections to the 
election addr
 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales


Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open 
connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 14
{noformat}

and so on and so on, ad nauseam. 

Now, looking around I found this inside FastLeaderElection.java from when you 
committed ZOOKEEPER-107:

{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to 
connect to each other (as opposed to just connecting to participants). I'll 
give it a try now and let you know. (Also, we use observer ids that are  0, 
and I saw some parts of the code that might not deal with that assumption - so 
it could be that too..). 





[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811771#comment-13811771
 ] 

Alexander Shraer commented on ZOOKEEPER-1807:
-

Hi Raul,

ZK-107 allows changing server roles. In one config a server is an observer, in 
the next one it may be a follower. I haven't looked closely, but I think the 
intention was to talk with everyone you know to try to get the most up-to-date 
config information. Instead of reverting this to the previous code, consider 
adding a check (regardless of whether this is an observer/participant server) 
that won't attempt to create a connection if one is already there to the same 
server with the same election address (election addresses may change from one 
view to the next). 
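
A minimal sketch of the check suggested above, assuming a hypothetical 
ElectionConnectionTracker helper. This is not the QuorumCnxManager API; the 
class and method names are made up for illustration. It skips a connection 
attempt only when a connection to the same sid at the same election address is 
already recorded:

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class ElectionConnectionTracker {
    // sid -> election address of the connection currently recorded as open
    private final Map<Long, InetSocketAddress> open =
            new ConcurrentHashMap<Long, InetSocketAddress>();

    // True if the caller should open a new connection to sid at addr.
    // A connection is skipped only when one already exists to the same
    // server AND the same election address.
    boolean shouldConnect(long sid, InetSocketAddress addr) {
        InetSocketAddress current = open.get(sid);
        return current == null || !current.equals(addr);
    }

    // Record that a connection to sid at addr is now open.
    void recordConnected(long sid, InetSocketAddress addr) {
        open.put(sid, addr);
    }

    // Forget the connection to sid (for example, after it is closed).
    void recordClosed(long sid) {
        open.remove(sid);
    }
}
```

With this shape, a changed election address for the same sid still triggers a 
new connection, which matters because addresses may change from one view to 
the next. The role of the server (observer or participant) plays no part in 
the decision.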

The code should handle observer id  0, please file a JIRA if you find that 
there is a problem somewhere.

Thanks,
Alex



 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales



[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos

2013-11-01 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811769#comment-13811769
 ] 

Patrick Hunt commented on ZOOKEEPER-1742:
-

bq. They are running on Ubuntu fine now, is it right?

correct in my case (ubuntu 13.10)

 make check doesn't work on macos
 --

 Key: ZOOKEEPER-1742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811782#comment-13811782
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Oh - fair enough. So I suspect QuorumCnxManager isn't doing the right thing 
then. Will take a look. Thanks for the quick reply!

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811798#comment-13811798
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Actually - my initial assessment was wrong (the spammy "There is a connection 
already" message confused me). I am seeing excess traffic between Observers 
through the election port, but it's not due to connection attempts. I'll come 
back with the actual messages. Sorry if this isn't actually related to 
ZOOKEEPER-107, [~shralex].

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1807:


Fix Version/s: 3.5.0

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811800#comment-13811800
 ] 

Flavio Junqueira commented on ZOOKEEPER-1807:
-

It would be good to understand if this is a bug that affects the 3.4 branch as 
well and if it is a blocker, [~rgs].

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811802#comment-13811802
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Yes - absolutely [~fpj]. The amount of traffic that I am seeing between 
Observers through the election port is... scary. I am still trying to figure 
out what is going on. Will be back in a bit when I have a proper analysis. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection

2013-11-01 Thread Thawan Kooburat (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811837#comment-13811837
 ] 

Thawan Kooburat commented on ZOOKEEPER-1790:


In our internal deployment, we use negative sids for observers (actually, all 
of them are -1). This is probably not the intended usage, but it has worked so 
far. It would be nice to add a note to the 3.5 release notes if there is a 
change in the valid sid range. 

 Deal with special ObserverId in QuorumCnxManager.receiveConnection
 --

 Key: ZOOKEEPER-1790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1790
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.4.6, 3.5.0


 QuorumCnxManager.receiveConnection assumes that a negative sid means that 
 this is a 3.5.0 server, which has a different communication protocol. This 
 doesn't account for the fact that ObserverId = -1 is a special id that may be 
 used by observers and is also negative. 
 This requires a fix to trunk and a separate fix to 3.4 branch, where this 
 function is different (see ZOOKEEPER-1633)
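
 To make the ambiguity concrete, here is a hedged sketch. SidClassifier, its 
 method names and the PROTOCOL_MARKER value are assumptions for illustration 
 and are not taken from QuorumCnxManager; only the special ObserverId = -1 
 comes from the description above. A naive "negative means new protocol" test 
 misreads the observer id, while checking the special id first does not:

```java
// Hypothetical sketch for illustration only; not ZooKeeper code.
class SidClassifier {
    // Special observer id named in the issue description.
    static final long OBSERVER_ID = -1L;
    // Assumed marker for the newer protocol; the real value is NOT
    // given in this thread.
    static final long PROTOCOL_MARKER = -65536L;

    enum Kind { OBSERVER, NEW_PROTOCOL, REGULAR_SERVER }

    // The problematic assumption: any negative first value means the
    // sender speaks the newer protocol.
    static boolean naiveIsNewProtocol(long firstLong) {
        return firstLong < 0;
    }

    // Check the special observer id before treating a value as a
    // protocol marker.
    static Kind classify(long firstLong) {
        if (firstLong == OBSERVER_ID) {
            return Kind.OBSERVER;
        }
        if (firstLong == PROTOCOL_MARKER) {
            return Kind.NEW_PROTOCOL;
        }
        return Kind.REGULAR_SERVER;
    }
}
```

 In this sketch naiveIsNewProtocol(-1L) is true, which is the misreading the 
 issue describes, while classify(-1L) yields OBSERVER. How other negative 
 sids should be treated is left open here.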





[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Thawan Kooburat (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811838#comment-13811838
 ] 

Thawan Kooburat commented on ZOOKEEPER-1807:


In our internal deployment, the host list in zoo.cfg for each observer only 
has the participants and itself. This helps address this issue a bit, but 
obviously, in the 3.5 world, this won't work if you want to promote an 
observer to a participant. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection

2013-11-01 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811839#comment-13811839
 ] 

Flavio Junqueira commented on ZOOKEEPER-1790:
-

I was wondering if this has anything to do with ZOOKEEPER-1807.

 Deal with special ObserverId in QuorumCnxManager.receiveConnection
 --

 Key: ZOOKEEPER-1790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1790
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.4.6, 3.5.0




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811849#comment-13811849
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

OK - this seems to actually be related to ZOOKEEPER-107, [~shralex]. I added 
some debug logging and I see that the spam, to all Observers, is the 
notifications:

{noformat}
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, 
peerEpoch = 130, configData = [B@5a0c0ce6
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, 
peerEpoch = 130, configData = [B@4d22fe39
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, 
peerEpoch = 130, configData = [B@346077bf
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, 
peerEpoch = 130, configData = [B@2955b776
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, 
peerEpoch = 130, configData = [B@3a7fb92d
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, 
peerEpoch = 130, configData = [B@1756575c
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, 
peerEpoch = 130, configData = [B@258164fc
{noformat}

As you can see, it's sending tons of notifications per second. Not good :)

With this diff in FastLeaderElection.java (i.e.: a revert of part of your 
change):

{noformat}
 private void sendNotifications() {
-for (long sid : self.getAllKnownServerIds()) {
+for (QuorumServer server : self.getVotingView().values()) {
+long sid = server.id;
{noformat}

observers, of course, don't get spammed. I am guessing some condition is 
failing for Observers that assumes the notifications are fresh and sends them 
repeatedly?
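
To illustrate why the two loops differ in who gets notified, here is a small 
sketch under stated assumptions: NotificationTargets, Role and the ensemble 
layout are hypothetical and only stand in for getVotingView() and 
getAllKnownServerIds(); this is not the FastLeaderElection code.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; not ZooKeeper code.
class NotificationTargets {
    enum Role { PARTICIPANT, OBSERVER }

    // Stand-in for iterating the voting view: participants only.
    static Set<Long> votingView(Map<Long, Role> ensemble) {
        Set<Long> targets = new LinkedHashSet<Long>();
        for (Map.Entry<Long, Role> e : ensemble.entrySet()) {
            if (e.getValue() == Role.PARTICIPANT) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }

    // Stand-in for iterating all known server ids: participants and
    // observers alike, including the sender itself.
    static Set<Long> allKnown(Map<Long, Role> ensemble) {
        return new LinkedHashSet<Long>(ensemble.keySet());
    }
}
```

For an assumed ensemble with participants 1, 2, 3 and observers 9 and 13, 
votingView returns only 1, 2, 3, while allKnown also returns 9 and 13, so an 
observer ends up sending notifications to other observers and to itself. 
This only shows the change in recipients; it does not explain why the 
notifications are repeated.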

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811850#comment-13811850
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

[~fpj]: I think this is 3.5.0 specific, since it goes away when reverting 
those bits from ZOOKEEPER-107 (there is a chance I am overlooking something, of 
course, and it's some other thing). This is most likely a blocker for the 
3.5.0 release, though. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811851#comment-13811851
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

[~thawan]: should omitting the Observers from zoo.cfg actually make any 
difference? If so, we should document it somewhere (unless it already is). In 
my case, where I do explicitly enumerate them, I don't get 
observer-to-observer connections on the election port once I remove the bits 
I mentioned above in FLE (so it seems to me it isn't). 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0


 Hey [~shralex],
 I noticed today that my Observers are spamming each other trying to open 
 connections to the election port. I've got tons of these:
 {noformat}
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 9
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 10
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 6
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 12
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 14
 {noformat}
 and so on, ad nauseam. 
 Now, looking around I found this inside FastLeaderElection.java from when you 
 committed ZOOKEEPER-107:
 {noformat}
  private void sendNotifications() {
 -for (QuorumServer server : self.getVotingView().values()) {
 -long sid = server.id;
 -
 +for (long sid : self.getAllKnownServerIds()) {
 +QuorumVerifier qv = self.getQuorumVerifier();
 {noformat}
 Is that really desired? I suspect that is what's causing Observers to try to 
 connect to each other (as opposed to just connecting to participants). I'll 
 give it a try now and let you know. (Also, we use observer ids that are < 0, 
 and I saw some parts of the code that might not deal with that assumption, 
 so it could be that too.) 





[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811852#comment-13811852
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1790:
---

[~fpj]: I don't think it's related - my initial assessment was wrong. It isn't 
connection attempts that generate the extra traffic I am seeing but the 
Notifications (as commented in ZOOKEEPER-1807). 

 Deal with special ObserverId in QuorumCnxManager.receiveConnection
 --

 Key: ZOOKEEPER-1790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1790
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.4.6, 3.5.0


 QuorumCnxManager.receiveConnection assumes that a negative sid means that 
 this is a 3.5.0 server, which has a different communication protocol. This 
 doesn't account for the fact that ObserverId = -1 is a special id that may be 
 used by observers and is also negative. 
 This requires a fix to trunk and a separate fix to 3.4 branch, where this 
 function is different (see ZOOKEEPER-1633)
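
 The ambiguity described above can be illustrated with a small, self-contained 
 Java sketch. It is not the QuorumCnxManager code: the class name, the Kind 
 enum, the classify helpers and the PROTOCOL_VERSION value are assumptions made 
 for illustration. Only ObserverId = -1 comes from the description.

```java
final class SidCheckSketch {
    // From the issue description: observers may use this special id.
    static final long OBSERVER_ID = -1L;
    // Assumed marker value sent first by a newer server; the real value may differ.
    static final long PROTOCOL_VERSION = -65536L;

    enum Kind { SERVER_ID, OBSERVER, NEW_PROTOCOL, UNKNOWN }

    // The problematic assumption: every negative value means "new protocol".
    static Kind classifyNaive(long first) {
        return first < 0 ? Kind.NEW_PROTOCOL : Kind.SERVER_ID;
    }

    // One possible fix: test for the special observer id and the protocol
    // marker explicitly instead of relying on the sign alone.
    static Kind classify(long first) {
        if (first >= 0) return Kind.SERVER_ID;
        if (first == OBSERVER_ID) return Kind.OBSERVER;
        if (first == PROTOCOL_VERSION) return Kind.NEW_PROTOCOL;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (long v : new long[] {3L, OBSERVER_ID, PROTOCOL_VERSION}) {
            System.out.println(v + ": naive=" + classifyNaive(v)
                    + ", explicit=" + classify(v));
        }
    }
}
```

 With the naive check an observer using id -1 is classified as a new-protocol 
 server, which is the misreading the issue describes.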





[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811858#comment-13811858
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

I think what's happening is that when we send the initial notifications to all 
members, as opposed to just voting members as it was before, we trigger a 
self-replicating cascade of notifications. Each Observer gets the notification 
and then by virtue of:

{noformat}
/*
 * If it is from a non-voting server (such as an observer or
 * a non-voting follower), respond right away.
 */
if(!self.getVotingView().containsKey(response.sid)){
   .
}
{noformat}

it replies back to each Observer and so on. So it sounds to me that this needs 
to match what we have in sendNotifications and actually check response.sid 
against self.getAllKnownServerIds() to avoid the endless echoing of 
notifications that I am seeing.
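
The echo can be reproduced with a toy model. The following is a minimal Java 
sketch, not FastLeaderElection itself: the class, the simulate method and the 
delivery cap are hypothetical, and the only rule modelled is "reply right away 
when the sender is not in the set we check against". Passing the voting view as 
that set mimics the current check; passing all known server ids mimics the 
change suggested above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

final class NotificationEchoSketch {
    /**
     * Delivers notifications until none are left or maxDeliveries is reached.
     * A receiver replies right away when the sender is NOT in checkedIds.
     * Returns the number of notifications delivered.
     */
    static int simulate(Set<Long> allIds, Set<Long> checkedIds,
                        long starter, int maxDeliveries) {
        Deque<long[]> queue = new ArrayDeque<>(); // each entry is {from, to}
        // Initial notifications go to every known server, observers included.
        for (long sid : allIds) {
            if (sid != starter) queue.add(new long[] {starter, sid});
        }
        int delivered = 0;
        while (!queue.isEmpty() && delivered < maxDeliveries) {
            long[] msg = queue.poll();
            delivered++;
            long from = msg[0];
            long to = msg[1];
            if (!checkedIds.contains(from)) {
                queue.add(new long[] {to, from}); // reply to the sender
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        Set<Long> voters = Set.of(1L, 2L, 3L);
        Set<Long> all = Set.of(1L, 2L, 3L, 10L, 11L); // 10 and 11 are observers
        System.out.println("check voting view: " + simulate(all, voters, 10L, 1000));
        System.out.println("check all known ids: " + simulate(all, all, 10L, 1000));
    }
}
```

In this model, checking against the voting view lets observers 10 and 11 answer 
each other until the cap is hit, while checking against all known ids stops 
after the four initial notifications.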

Thoughts [~shralex], [~fpj] ?



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: ZOOKEEPER-1807.patch

The attached patch prevents sending replies back when we are an Observer. Since 
ZOOKEEPER-107 we send notifications to Observers because they can be promoted 
to Participants. But to avoid replicating replies forever (i.e., an observer 
sends a notification and the receiving observer then sends another one, and so 
on), we don't need to send notifications when we are a LearnerType.OBSERVER. 
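
As a rough illustration of that guard (a sketch only, not the contents of the 
attached patch), the decision can be written as a small pure function. The 
class name, the local LearnerType enum and the shouldReply helper are 
hypothetical stand-ins.

```java
final class ObserverReplyGuardSketch {
    // Local stand-in for the server's learner type; defined here so the
    // sketch compiles on its own.
    enum LearnerType { PARTICIPANT, OBSERVER }

    /**
     * Decides whether to answer an incoming notification right away.
     * Observers never reply; participants reply only to non-voting senders.
     */
    static boolean shouldReply(LearnerType self, boolean senderIsVoting) {
        if (self == LearnerType.OBSERVER) {
            return false; // breaks the observer-to-observer echo
        }
        return !senderIsVoting;
    }

    public static void main(String[] args) {
        System.out.println(shouldReply(LearnerType.OBSERVER, false));
        System.out.println(shouldReply(LearnerType.PARTICIPANT, false));
    }
}
```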

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811865#comment-13811865
 ] 

Hadoop QA commented on ZOOKEEPER-1807:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12611737/ZOOKEEPER-1807.patch
  against trunk revision 1535491.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//console

This message is automatically generated.



Failed: ZOOKEEPER-1807 PreCommit Build #1733

2013-11-01 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 314178 lines...]
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12611737/ZOOKEEPER-1807.patch
 [exec]   against trunk revision 1535491.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 84362112158ddbbb07230db45919a0034a96 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623:
 exec returned: 1

Total time: 33 minutes 16 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1807
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Created] (BOOKKEEPER-701) Improve exception handling of Bookkeeper threads

2013-11-01 Thread Rakesh R (JIRA)
Rakesh R created BOOKKEEPER-701:
---

 Summary: Improve exception handling of Bookkeeper threads
 Key: BOOKKEEPER-701
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-701
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-auto-recovery, bookkeeper-client, 
bookkeeper-server
Reporter: Rakesh R
Assignee: Rakesh R


This JIRA discusses how to improve the exception handling of BookKeeper 
threads. As part of this, all the BookKeeper threads need to be reviewed. If a 
thread throws an unhandled exception, it should:
- log a loud error when the thread dies. 
- exit if any of the critical threads dies.
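
One possible shape for this, sketched with the standard 
Thread.setUncaughtExceptionHandler hook. The class name, the newThread helper 
and the onCriticalDeath callback are hypothetical and not existing BookKeeper 
code. A real server might call System.exit in the callback; the sketch only 
prints.

```java
final class LoudThreadSketch {
    /**
     * Builds a thread that logs any unhandled exception and, when marked
     * critical, also runs onCriticalDeath (for example a shutdown routine).
     */
    static Thread newThread(String name, boolean critical,
                            Runnable body, Runnable onCriticalDeath) {
        Thread t = new Thread(body, name);
        t.setUncaughtExceptionHandler((thread, error) -> {
            // Loud error: never let a thread die silently.
            System.err.println("ERROR: thread " + thread.getName()
                    + " died: " + error);
            if (critical) {
                onCriticalDeath.run();
            }
        });
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newThread("critical-sketch", true,
                () -> { throw new IllegalStateException("simulated failure"); },
                () -> System.err.println("critical thread died, shutting down"));
        t.start();
        t.join();
    }
}
```

The handler runs on the dying thread before it terminates, so a caller that 
joins the thread observes the log line and the callback before join returns.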

Please have a look at BOOKKEEPER-700 to know the initial discussions.


