[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499893#comment-13499893
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1477:
---

FWIW I have the same issue with zkCli on Fedora 18 / JDK 1.7.0_09-icedtea.

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
 Fix For: 3.4.6

 Attachments: with-ZK-1550.txt


 I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
 including ZooKeeperTest. A common symptom was spurious 
 {{ConnectionLossException}}:
 {code}
 2012-06-01 12:01:23,420 [myid:] - INFO  
 [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED 
 testDeleteRecursiveAsync
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
 at 
 org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 ... (snipped)
 {code}
 As background, I was actually investigating some non-deterministic failures 
 when using Netflix's Curator with Java 7 (see 
 https://github.com/Netflix/curator/issues/79). After a while, I figured I 
 should establish a clean ZK baseline first and realized it is actually a ZK 
 issue, not a Curator issue.
 We are trying to migrate to Java 7 but this is a blocking issue for us right 
 now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499895#comment-13499895
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1477:
---

(On 3.4.4)


[jira] [Created] (ZOOKEEPER-1585) make dist for src/c broken in trunk

2012-11-18 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1585:
-

 Summary: make dist for src/c broken in trunk
 Key: ZOOKEEPER-1585
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1585
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales





[jira] [Updated] (ZOOKEEPER-1585) make dist for src/c broken in trunk

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1585:
--

Description: make dist from trunk is failing because of a wrong reference 
to src/zookeeper_log.h (which exists in include/). 

 make dist for src/c broken in trunk
 ---

 Key: ZOOKEEPER-1585
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1585
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales

 make dist from trunk is failing because of a wrong reference to 
 src/zookeeper_log.h (which exists in include/). 


[jira] [Updated] (ZOOKEEPER-1585) make dist for src/c broken in trunk

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1585:
--

Attachment: 0001-ZOOKEEPER-1585.-make-dist-for-src-c-broken-in-trunk.patch

 make dist for src/c broken in trunk
 ---

 Key: ZOOKEEPER-1585
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1585
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
 Attachments: 
 0001-ZOOKEEPER-1585.-make-dist-for-src-c-broken-in-trunk.patch


 make dist from trunk is failing because of a wrong reference to 
 src/zookeeper_log.h (which exists in include/). 


[jira] [Updated] (ZOOKEEPER-1585) make dist for src/c broken in trunk

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1585:
--

Attachment: ZOOKEEPER-1585.patch

sorry, previous patch was created via git diff which obviously doesn't work. 
This one is created via git-svn-diff which *should* work. 

 make dist for src/c broken in trunk
 ---

 Key: ZOOKEEPER-1585
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1585
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
 Attachments: ZOOKEEPER-1585.patch


 make dist from trunk is failing because of a wrong reference to 
 src/zookeeper_log.h (which exists in include/). 


[jira] [Updated] (ZOOKEEPER-1585) make dist for src/c broken in trunk

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1585:
--

Attachment: (was: 
0001-ZOOKEEPER-1585.-make-dist-for-src-c-broken-in-trunk.patch)

 make dist for src/c broken in trunk
 ---

 Key: ZOOKEEPER-1585
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1585
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
 Attachments: ZOOKEEPER-1585.patch


 make dist from trunk is failing because of a wrong reference to 
 src/zookeeper_log.h (which exists in include/). 


[jira] [Created] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree

2012-11-18 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1586:
-

 Summary: tarballs for zkfuse don't compile out of tree
 Key: ZOOKEEPER-1586
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-zkfuse
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales





[jira] [Updated] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree

2012-11-18 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1586:
--

Attachment: ZOOKEEPER-1586.patch

 tarballs for zkfuse don't compile out of tree
 -

 Key: ZOOKEEPER-1586
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-zkfuse
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
 Attachments: ZOOKEEPER-1586.patch





[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587340#comment-13587340
]

Raul Gutierrez Segales commented on ZOOKEEPER-1519:
---

Does sizeof *(void *) work?

Zookeeper Async calls can reference free()'d memory
---

Key: ZOOKEEPER-1519
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1519
Project: ZooKeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.3.3, 3.3.6
Environment: Ubuntu 11.10, Ubuntu packaged Zookeeper 3.3.3 with some
backported fixes.
Reporter: Mark Gius
Attachments: zookeeper-1519.patch

zoo_acreate() and zoo_aset() take a char * argument for data and prepare a
call to zookeeper. This char * doesn't seem to be duplicated at any point,
making it possible that the caller of the asynchronous function might
potentially free() the char * argument before the zookeeper library completes
its request. This is unlikely to present a real problem unless the freed
memory is re-used before zookeeper consumes it. I've been unable to
reproduce this issue using pure C as a result.

However, ZKPython is a whole different story. Consider this snippet:

    ok = zookeeper.acreate(handle, path, json.dumps(value),
                           acl, flags, callback)
    assert ok == zookeeper.OK

In this snippet, json.dumps() allocates a string which is passed into the
acreate(). When acreate() returns, the zookeeper request has been
constructed with a pointer to the string allocated by json.dumps(). Also
when acreate() returns, that string is now referenced by 0 things (ZKPython
doesn't bump the refcount) and the string is eligible for garbage collection
and re-use. The Zookeeper request now has a pointer to dangerous freed
memory.

I've been seeing odd behavior in our development environments for some time
now, where it appeared as though two separate JSON payloads had been joined
together. Python has been allocating a new JSON string in the middle of the
old string that an incomplete zookeeper async call had not yet processed.

I am not sure if this is a behavior that should be documented, or if the C
binding implementation needs to be updated to create copies of the data
payload provided for aset and acreate.
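
One caller-side way to sidestep the lifetime problem described above is to keep a reference to each payload until its completion callback has run. The sketch below only illustrates that idea: `PendingPayloads` and `FakeTransport` are hypothetical names, the transport stands in for an asynchronous client call, and none of this is the actual zkpython or C client API. It assumes the payload is bytes-like.

```python
import itertools


class PendingPayloads:
    """Keeps each payload referenced until its completion callback has run."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # request id -> payload copy kept alive

    def submit(self, send_async, payload, callback):
        # Take a private, immutable copy so later changes by the caller
        # cannot affect the data that is still in flight.
        request_id = next(self._ids)
        data = bytes(payload)
        self._pending[request_id] = data

        def on_complete(rc):
            # The request is finished: drop our reference, then notify.
            self._pending.pop(request_id, None)
            callback(rc)

        send_async(data, on_complete)
        return request_id

    def in_flight(self):
        return len(self._pending)


class FakeTransport:
    """Hypothetical stand-in for an async client: queues requests until drained."""

    def __init__(self):
        self.queue = []

    def send_async(self, data, done):
        self.queue.append((data, done))

    def drain(self):
        # Complete every queued request with return code 0.
        while self.queue:
            _data, done = self.queue.pop(0)
            done(0)
```

With this shape the caller can drop its own reference right after `submit()` returns, because the pending table owns a copy until the callback fires.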


[jira] [Commented] (ZOOKEEPER-1519) Zookeeper Async calls can reference free()'d memory

2013-02-26 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587351#comment-13587351
]

Raul Gutierrez Segales commented on ZOOKEEPER-1519:
---

Don't think so: http://fpaste.org/iwjf/

Zookeeper Async calls can reference free()'d memory
---

[jira] [Commented] (ZOOKEEPER-1552) Enable sync request processor in Observer

2013-03-02 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591454#comment-13591454
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1552:
---

Small nit: typo in patch (recieved as INFORM packet). 

 Enable sync request processor in Observer
 -

 Key: ZOOKEEPER-1552
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1552
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum, server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1552.patch, ZOOKEEPER-1552.patch


 Observer doesn't forward its txns to SyncRequestProcessor, so it never 
 persists the txns onto disk or periodically creates snapshots. This increases 
 the start-up time since it will get the entire snapshot if the observer has 
 been running for a long time. 


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-06-14 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684025#comment-13684025
]

Raul Gutierrez Segales commented on ZOOKEEPER-1147:
---

[~thawan]: how is this meant to work with a rolling update when enabling local
sessions? If the leader doesn't have local sessions enabled then all writes
from local sessions will fail with SessionExpired (because they'll be unknown
to the leader) - right?

The only way I could get a rolling update to work is with (something like) this:

http://www.itevenworks.net/~rgs/patches/0001-Add-support-to-enable-disable-sessions-validations.patch

I.e.: adding a way to temporarily disable sessions validations whilst you are
enabling local sessions on the cluster.
We should add some documentation about the right way to do this. Thoughts? It
would be nice to get this merged.

Add support for local sessions
--

Key: ZOOKEEPER-1147
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
Labels: api-change, scaling
Fix For: 3.5.0

Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch

Original Estimate: 840h
Remaining Estimate: 840h

This improvement is in the bucket of making ZooKeeper work at a large scale.
We are planning on having about 1 million clients connect to a ZooKeeper
ensemble through a set of 50-100 observers. The majority of these clients are
read-only, i.e. they do not do any updates or create ephemeral nodes.

In ZooKeeper today, the client creates a session and the session creation is
handled like any other update. In the above use case, the session create/drop
workload can easily overwhelm an ensemble. The following is a proposal for a
local session, to support a larger number of connections.
1. The idea is to introduce a new type of session: the local session. A
local session doesn't have the full functionality of a normal session.
2. Local sessions cannot create ephemeral nodes.
3. Once a local session is lost, you cannot re-establish it using the
session-id/password. The session and its watches are gone for good.
4. When a local session connects, the session info is only maintained
on the zookeeper server (in this case, an observer) that it is connected to.
The leader is not aware of the creation of such a session and there is no
state written to disk.
5. The pings and expiration are handled by the server that the session
is connected to.

With the above changes, we can make ZooKeeper scale to a much larger number
of clients without making the core ensemble a bottleneck.

In terms of API, there are two options being considered:
1. Let the client specify at connect time which kind of session it
wants.
2. All sessions connect as local sessions and automatically get promoted to
global sessions when they do an operation that requires a global session
(e.g. creating an ephemeral node).

Chubby took the approach of lazily promoting all sessions to global, but I
don't think that would work in our case, where we want to keep sessions which
never create ephemeral nodes as always local. Option 2 would make it more
broadly usable but option 1 would be easier to implement.
We are thinking of implementing option 1 as the first cut. There would be a
client flag, IsLocalSession (much like the current readOnly flag) that would
be used to determine whether to create a local session or a global session.
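
To make the difference between the two API options concrete, here is a small toy model. It is only a sketch of the proposal's semantics, not ZooKeeper code: `Server`, `Session`, `SessionError` and the `auto_promote` switch are hypothetical names, and `is_local` mirrors the proposed IsLocalSession flag.

```python
class SessionError(Exception):
    """Raised when a local session attempts an operation it may not perform."""


class Session:
    def __init__(self, session_id, is_local):
        self.session_id = session_id
        self.is_local = is_local  # mirrors the proposed IsLocalSession flag


class Server:
    """Toy server: auto_promote=False models option 1, True models option 2."""

    def __init__(self, auto_promote=False):
        self.auto_promote = auto_promote
        self.global_sessions = set()  # session ids the leader knows about
        self.ephemerals = {}          # path -> owning session id

    def connect(self, session_id, is_local):
        session = Session(session_id, is_local)
        if not is_local:
            # Only global sessions are registered with the leader.
            self.global_sessions.add(session_id)
        return session

    def create_ephemeral(self, session, path):
        if session.is_local:
            if not self.auto_promote:
                # Option 1: the kind of session is fixed at connect time.
                raise SessionError("local sessions cannot create ephemeral nodes")
            # Option 2: promote lazily on the first operation that needs it.
            session.is_local = False
            self.global_sessions.add(session.session_id)
        self.ephemerals[path] = session.session_id
```

Under option 1 a local session that tries to create an ephemeral node simply gets an error; under option 2 the same call silently turns it into a global session, which is the session-creation cost the proposal wants to avoid for read-only clients.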

[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-07-02 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697996#comment-13697996
]

Raul Gutierrez Segales commented on ZOOKEEPER-1147:
---

fwiw, we are using this patch in prod at Twitter so it would be awesome to have
this merged. Besides what I mentioned in my previous comment (having a way to
do rolling upgrades to enable local sessions) is there anything else that's
left to get this merged?

Add support for local sessions
--

Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch

Original Estimate: 840h
Remaining Estimate: 840h

[jira] [Commented] (ZOOKEEPER-1721) Ability to run without writing to disk

2013-07-09 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703403#comment-13703403
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1721:
---

In Linux this is as easy as setting dataDir and dataLogDir to somewhere inside 
/dev/shm (possibly other platforms support something similar). Not sure it's 
worth supporting this with code as it might unnecessarily complicate other 
sections. 
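
As a rough illustration of the suggestion above, the helper below renders a minimal standalone configuration whose dataDir and dataLogDir both live under /dev/shm. The function name, the directory layout and the default values are assumptions made for this example; anything stored this way is lost on reboot.

```python
import posixpath


def render_zoo_cfg(base_dir="/dev/shm/zookeeper", client_port=2181, tick_time=2000):
    """Return zoo.cfg text that keeps snapshots and txn logs on tmpfs."""
    # Hypothetical layout: one sub-directory for snapshots, one for txn logs.
    data_dir = posixpath.join(base_dir, "data")
    data_log_dir = posixpath.join(base_dir, "datalog")
    lines = [
        f"tickTime={tick_time}",
        f"clientPort={client_port}",
        f"dataDir={data_dir}",
        f"dataLogDir={data_log_dir}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a zoo.cfg file and starting the server with it should keep all persistence in memory-backed storage without any code changes in ZooKeeper itself.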

 Ability to run without writing to disk
 --

 Key: ZOOKEEPER-1721
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1721
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.4.5
Reporter: Radim Kolar

 I use zookeeper for cluster synchronization. We have no need for keeping 
 persistent state across zookeeper restarts. For better performance, it would 
 be good to be able to run without writing snapshots and logs.


[jira] [Updated] (ZOOKEEPER-1274) Support Child watcher to be displayed with 4 letter zookeeper commands i.e, wchs,wchp,wchc

2013-08-02 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1274:
--

Attachment: 0001-ZOOKEEPER-1274.-Display-child-watches-info-in-watch-.patch

 Support Child watcher to be displayed with 4 letter zookeeper commands i.e, 
 wchs,wchp,wchc
 --

 Key: ZOOKEEPER-1274
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1274
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
 Environment: Zookeeper Server
Reporter: amith
Assignee: Raul Gutierrez Segales
 Attachments: 
 0001-ZOOKEEPER-1274.-Display-child-watches-info-in-watch-.patch


 Currently only data watchers (created by exists() and getData()) are 
 displayed by the wchs, wchp and wchc 4 letter commands. 
 It would be useful to get the information related to child watchers 
 (getChildren()) with the 4 letter words as well.
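
 For readers who have not used them: a 4 letter word is sent as plain text over 
 the client port and the server answers on the same connection, then closes it. 
 A minimal sketch of a client follows; the function name is made up for this 
 example, and the host and port are whatever your server listens on.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a 4 letter word such as 'wchs' and return the server's reply."""
    if len(command) != 4:
        raise ValueError("expected a 4 letter command, got %r" % (command,))
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once it has replied.
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

 For example, four_letter_word("localhost", 2181, "wchs") would return the 
 watch summary as text, assuming a server is listening there.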


[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-08-09 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: 0001-RFC-Don-t-tear-down-an-Observer-when-we-lose-connect.patch

Read-only Observer
--

Key: ZOOKEEPER-1607
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1607
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3
Reporter: Thawan Kooburat
Attachments:
0001-RFC-Don-t-tear-down-an-Observer-when-we-lose-connect.patch

This feature reuses some of the mechanisms already provided by
ReadOnlyZooKeeper (ZOOKEEPER-704) but is implemented in a different way.

Goal: read-only clients should be able to connect to the observer, or continue
to read data from the observer, even when there is an outage of the underlying
quorum. This means that it is possible for the observer to provide 100% read
uptime for read-only local sessions (ZOOKEEPER-1147).

Implementation:
The observer doesn't tear itself down when it loses its connection to the
leader. It only closes the connections associated with non-read-only sessions
and global sessions, so the client can try another observer if this is a
temporary failure.
During the outage, the observer switches to read-only mode. All pending and
future write requests will get a NOT_READONLY error. The read-only state
transition is sent to all sessions on that observer. The observer only accepts
new connections from read-only clients.
When the observer is able to reconnect to the leader, it sends a state
transition (CONNECTED_STATE) to all current sessions. If it is able to
synchronize with the leader using DIFF, the stream of txns is sent through the
commit processor instead of being applied to the DataTree directly, to prevent
a race condition with in-flight read requests (see ZOOKEEPER-1505). The
client will receive watch events correctly and can start issuing write
requests.
However, if the observer is getting the snapshot, it needs to drop all the
connections since it cannot fire watches correctly.
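
The transitions described above can be sketched as a small state machine. This is a toy model for illustration only: the class and method names and the "READ_ONLY" event label are hypothetical, while NOT_READONLY and CONNECTED_STATE are the names used in the description.

```python
class Observer:
    """Toy model of the observer behaviour described above (not ZooKeeper code)."""

    def __init__(self):
        self.read_only = False
        self.sessions = {}  # session id -> (read_only_client, is_local)
        self.events = []    # (session id, state) notifications, in order

    def connect(self, session_id, read_only_client, is_local):
        # During an outage only read-only clients may open new connections.
        if self.read_only and not read_only_client:
            raise ConnectionError("observer is in read-only mode")
        self.sessions[session_id] = (read_only_client, is_local)

    def on_leader_lost(self):
        """Switch to read-only mode and return the ids of the closed sessions."""
        self.read_only = True
        # Only read-only local sessions survive the outage.
        closed = sorted(sid for sid, (ro, local) in self.sessions.items()
                        if not (ro and local))
        for sid in closed:
            del self.sessions[sid]
        for sid in sorted(self.sessions):
            self.events.append((sid, "READ_ONLY"))
        return closed

    def write(self, session_id):
        return "NOT_READONLY" if self.read_only else "OK"

    def on_leader_reconnected(self, synced_with_diff):
        """Leave read-only mode and return the ids of any dropped sessions."""
        self.read_only = False
        if not synced_with_diff:
            # A snapshot sync cannot fire watches correctly: drop everyone.
            dropped = sorted(self.sessions)
            self.sessions.clear()
            return dropped
        for sid in sorted(self.sessions):
            self.events.append((sid, "CONNECTED_STATE"))
        return []
```

The model leaves out the commit-processor detail; it only captures which sessions are kept, which notifications they see, and what a write returns in each mode.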

[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2013-09-09 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762695#comment-13762695
]

Raul Gutierrez Segales commented on ZOOKEEPER-1147:
---

ping? any progress to get this merged?

Add support for local sessions
--

Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch,
ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch

Original Estimate: 840h
Remaining Estimate: 840h

[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-09-13 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: persistent-read-only-for-observers.patch

New version of the patch with tests. Also - this is generated with git diff -p
so it should be Hadoop QA friendly.

Read-only Observer
--

[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-09-13 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: (was:
0001-RFC-Don-t-tear-down-an-Observer-when-we-lose-connect.patch)

Read-only Observer
--

[jira] [Commented] (ZOOKEEPER-1607) Read-only Observer

2013-09-13 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766740#comment-13766740
]

Raul Gutierrez Segales commented on ZOOKEEPER-1607:
---

Arrrg I guess dependent patches aren't applied :(

Read-only Observer
--

[jira] [Commented] (ZOOKEEPER-1681) ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional

2013-09-23 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775636#comment-13775636
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1681:
---

I guess https://issues.apache.org/jira/browse/ZOOKEEPER-1763 would help. 

 ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare 
 the dep as optional
 -

 Key: ZOOKEEPER-1681
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1681
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5
Reporter: John Sirois
 Fix For: 3.5.0


 For example in 
 [3.4.5|http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom]
  we see:
 {code}
 $ curl -sS 
 http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom
  | grep -B1 -A4 org.jboss.netty
 <dependency>
   <groupId>org.jboss.netty</groupId>
   <artifactId>netty</artifactId>
   <version>3.2.2.Final</version>
   <scope>compile</scope>
 </dependency>
 {code}
 As a consumer I can depend on zookeeper with an exclude for 
 org.jboss.netty#netty or I can let my transitive dep resolver pick a winner.  
 This might be fine, except for those who might be using a more modern netty 
 published under the newish io.netty groupId.  With this twist you get both 
 org.jboss.netty#netty;foo and io.netty#netty;bar on your classpath, and 
 runtime errors ensue from incompatibilities unless you add an exclude 
 against zookeeper (and clearly don't enable the zk netty nio handling).
 I propose that this is a pom bug although this is debatable.  Clearly as 
 currently packaged zookeeper needs netty to compile, but I'd argue since it 
 does not need netty to run, either the scope should be provided or optional 
 or a zookeeper-netty lib should be broken out as an optional dependency and 
 this new dep published by zookeeper can have a proper compile dependency on 
 netty.


[jira] [Commented] (ZOOKEEPER-1638) Redundant zk.getZKDatabase().clear();

2013-10-07 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788293#comment-13788293
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1638:
---

[~neilb]: did you upload the right patch? I think you wanted:

{noformat}
- // clear our own database and read
+ // db is clear as part of deserializeSnapshot()
- zk.getZKDatabase().clear();
{noformat}


 Redundant zk.getZKDatabase().clear();
 -

 Key: ZOOKEEPER-1638
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1638
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Alexander Shraer
Assignee: neil bhakta
Priority: Trivial
  Labels: newbie
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1638.patch


 Learner.syncWithLeader calls zk.getZKDatabase().clear() right before 
 zk.getZKDatabase().deserializeSnapshot(leaderIs); Then the first thing 
 deserializeSnapshot does is another clear(). 
 Suggest to remove the clear() in syncWithLeader.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Updated] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1784:
--

Attachment: ZOOKEEPER-1784.patch

 Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus
 

 Key: ZOOKEEPER-1784
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1784
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Attachments: ZOOKEEPER-1784.patch


 If you look at Learner#syncWithLeader:
 {noformat}
 while (self.isRunning()) {
 readPacket(qp);
 switch(qp.getType()) {
 ...
 case Leader.INFORM:
 case Leader.INFORMANDACTIVATE:
 PacketInFlight packet = new PacketInFlight();
 packet.hdr = new TxnHeader();
 if (qp.getType() == Leader.COMMITANDACTIVATE) {
 {noformat}
 I guess qp.getType() == Leader.COMMITANDACTIVATE is a typo that should read 
 qp.getType() == Leader.INFORMANDACTIVATE.
 Assigning to Alexander for now since this is part of ZOOKEEPER-107.
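
 Why the check as written can never be true is easy to see in a reduced model. 
 The sketch below is not the Learner code: the enum and the helper are 
 hypothetical, and the enum values are placeholders rather than the real 
 Leader constants.

```python
from enum import Enum, auto


class PacketType(Enum):
    # Placeholder values; the real Leader constants are not reproduced here.
    INFORM = auto()
    COMMITANDACTIVATE = auto()
    INFORMANDACTIVATE = auto()


def takes_activate_branch(packet_type, compare_with):
    """Mimic the shared case body: True when the inner 'if' would run."""
    if packet_type in (PacketType.INFORM, PacketType.INFORMANDACTIVATE):
        # Inside this case the type is INFORM or INFORMANDACTIVATE, so a
        # comparison against COMMITANDACTIVATE can never succeed.
        return packet_type == compare_with
    return False
```

 Calling takes_activate_branch(PacketType.INFORMANDACTIVATE, 
 PacketType.COMMITANDACTIVATE) returns False, while comparing against 
 PacketType.INFORMANDACTIVATE returns True, which is the behaviour the 
 proposed fix is after.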




[jira] [Assigned] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales reassigned ZOOKEEPER-1784:
-

Assignee: Raul Gutierrez Segales  (was: Alexander Shraer)


[jira] [Commented] (ZOOKEEPER-1784) Logic to process INFORMANDACTIVATE packets in syncWithLeader seems bogus

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789799#comment-13789799
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1784:
---

[~shralex]: so that code path, processing INFORMANDACTIVATE, doesn't have (it 
seems) a corresponding test case. Should we add one or extend an existing one 
to cover it?


[jira] [Commented] (ZOOKEEPER-1586) tarballs for zkfuse don't compile out of tree

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789827#comment-13789827
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1586:
---

Yes, I believe it is. Actually there are two issues: a) the tarball is created 
with missing source files and b) (more importantly) the BUILD path for the C 
libs is wrong.

Should be:

{noformat}
-AC_CHECK_LIB(zookeeper_mt, main, [ZOOKEEPER_LD=-L${ZOOKEEPER_PATH}/.libs -lzookeeper_mt],,[-L${ZOOKEEPER_PATH}/.libs])
+ZOOKEEPER_BUILD_PATH=${BUILD_PATH}/../../../build/c
+AC_CHECK_LIB(zookeeper_mt, main, [ZOOKEEPER_LD=-L${ZOOKEEPER_BUILD_PATH}/.libs -lzookeeper_mt],,[-L${ZOOKEEPER_BUILD_PATH}/.libs])
{noformat}

And the third thing would be the configure.ac doesn't state that boost is 
needed.

I'll update the patch. 

 tarballs for zkfuse don't compile out of tree
 -

 Key: ZOOKEEPER-1586
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1586
 Project: ZooKeeper
  Issue Type: Bug
  Components: contrib-zkfuse
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Attachments: ZOOKEEPER-1586.patch







[jira] [Assigned] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales reassigned ZOOKEEPER-1019:
-

Assignee: Raul Gutierrez Segales

 zkfuse doesn't list dependency on boost in README
 -

 Key: ZOOKEEPER-1019
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1019
 Project: ZooKeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.4.0
Reporter: Karel Vervaeke
Assignee: Raul Gutierrez Segales
   Original Estimate: 5m
  Remaining Estimate: 5m

 The README.txt under contrib/fuse doesn't list boost under Development build 
 libraries




[jira] [Commented] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789923#comment-13789923
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1019:
---

[~phunt]: i haven't seen the crashes. I'll upload a patch that adds:

{noformat}
AC_CHECK_LIB([boost], [main], [], [AC_MSG_ERROR(We need boost to build zkfuse)])
{noformat}

or:

{noformat}
AC_CHECK_HEADERS([boost/shared_ptr.hpp boost/shared_array.hpp boost/date_time/gregorian/gregorian.hpp],,AC_MSG_ERROR([boost library headers not found. Please install boost library.]))
{noformat}

or some such to configure.ac (as well as updating the README). 


[jira] [Updated] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README

2013-10-08 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1019:
--

Attachment: ZOOKEEPER-1019

This patch fixes a couple of things:

- adds checks for log4cxx
- adds checks for boost's headers (the ones zkfuse uses)
- sets the right path to Zk C libs build which isn't src/c (it's build/c) 
(fixes ZOOKEEPER-1586)
- updates the README to mention boost


[jira] [Commented] (ZOOKEEPER-1019) zkfuse doesn't list dependency on boost in README

2013-10-09 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790086#comment-13790086
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1019:
---

[~phunt]: ack - patch is go. I will rebase/update ZOOKEEPER-1586 to include the 
remaining bits on top of this one, thanks. 

 zkfuse doesn't list dependency on boost in README
 -

 Key: ZOOKEEPER-1019
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1019
 Project: ZooKeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.4.0
Reporter: Karel Vervaeke
Assignee: Raul Gutierrez Segales
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1019

   Original Estimate: 5m
  Remaining Estimate: 5m

 The README.txt under contrib/fuse doesn't list boost under Development build 
 libraries




[jira] [Commented] (ZOOKEEPER-1787) Add support enabling local session in rolling upgrade

2013-10-09 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790692#comment-13790692
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1787:
---

[~thawan]: would you like me to rebase the patch I proposed in ZOOKEEPER-1147 
(with the comments you suggested) or do you have a better/different approach?

 Add support enabling local session in rolling upgrade
 -

 Key: ZOOKEEPER-1787
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1787
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Thawan Kooburat
Priority: Minor

 Currently, local sessions need to be enabled by stopping the entire ensemble. 
 If a rolling upgrade is used, all write requests from a local session will 
 fail with session move until local sessions are enabled on the leader.




[jira] [Created] (ZOOKEEPER-1788) Support clientID field on connection requests

2013-10-09 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1788:
-

 Summary: Support clientID field on connection requests 
 Key: ZOOKEEPER-1788
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1788
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Raul Gutierrez Segales
Priority: Minor


I suspect it's very common for deployments to have a wide variety of client 
libraries (different versions/languages) connecting to a given cluster.

It would be handy to have a way to identify clients via a clientID (akin to 
HTTP's User-Agent header). This could be implemented in 
ZooKeeperServer#processConnectRequest [1] and be fully backwards compatible.

The clientID could then be kept with the corresponding ServerCnxn instance and 
be used for better logging (or for stats exposed through 4-letter commands). 

The corresponding client-side change would be to expose an API to set the 
clientID on each connection handler (and by default it could be something like 
zk java $version, zk c $version, etc).
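
As a rough sketch of how an optional trailing field can stay backwards compatible, the toy example below appends a clientID after a mandatory field and falls back to a default when an older client omits it. The wire layout, class name and field names are assumptions made up for this sketch; this is not ZooKeeper's actual ConnectRequest format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a toy connect request, NOT the real ZooKeeper format.
final class ClientIdSketch {
    static final String UNKNOWN = "unknown";

    // Encodes a toy request. Passing clientId == null mimics an old client
    // that does not know about the new field.
    static ByteBuffer encode(int timeoutMs, String clientId) {
        byte[] id = clientId == null
                ? new byte[0]
                : clientId.getBytes(StandardCharsets.UTF_8);
        int size = 4 + (clientId == null ? 0 : 4 + id.length);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(timeoutMs);
        if (clientId != null) {
            buf.putInt(id.length); // length prefix of the optional field
            buf.put(id);
        }
        buf.flip();
        return buf;
    }

    // Reads the mandatory field, then the clientID only if bytes remain.
    static String readClientId(ByteBuffer request) {
        request.getInt(); // session timeout, unused in this sketch
        if (!request.hasRemaining()) {
            return UNKNOWN; // old client: no trailing field present
        }
        byte[] id = new byte[request.getInt()];
        request.get(id);
        return new String(id, StandardCharsets.UTF_8);
    }
}
```

The point of the design is that the server never requires the new field, so existing clients keep working unchanged.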

Thoughts?

[1] 
https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L797




[jira] [Created] (ZOOKEEPER-1789) 3.4.x observer causes NPE on 3.5.0 (trunk) participants

2013-10-10 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1789:
-

 Summary: 3.4.x observer causes NPE on 3.5.0 (trunk) participants
 Key: ZOOKEEPER-1789
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1789
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer


(assigning to Alex because this was introduced by ZOOKEEPER-107, but will 
upload a patch as well.)

I have a 5-participant cluster running what will be 3.5.0 (i.e.: trunk as of 
today) and an observer running 3.4 (trunk from 3.4 branch). When the observer 
tries to establish a connection to the participants I get:

{noformat}
Thread Thread[10.40.78.121:3888,5,main] died
java.lang.NullPointerException
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:240)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:552)
{noformat}

Looking at QuorumCnxManager.java:240:

{noformat}
if (protocolVersion >= 0) { // this is a server id and not a protocol version
    sid = protocolVersion;
    electionAddr = self.getVotingView().get(sid).electionAddr;
} else {
} else {
{noformat}

and self.getVotingView().get(sid) will be null for Observers.  So this block 
should cover that case.  
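
A minimal sketch of the kind of guard I have in mind, using stand-in types rather than the real QuorumPeer classes (the `Server` class, the method names and the sample addresses are all hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; these are stand-ins, not ZooKeeper's quorum classes.
final class VotingViewSketch {
    static final class Server {
        final String electionAddr;
        Server(String electionAddr) { this.electionAddr = electionAddr; }
    }

    // Same shape as the quoted line: throws NullPointerException when the
    // sid is missing from the voting view, as it would be for an observer.
    static String unguarded(Map<Long, Server> votingView, long sid) {
        return votingView.get(sid).electionAddr;
    }

    // Guarded variant: a missing sid yields an empty result instead.
    static Optional<String> guarded(Map<Long, Server> votingView, long sid) {
        Server s = votingView.get(sid);
        return s == null ? Optional.empty() : Optional.of(s.electionAddr);
    }

    // Two voting members; any other sid plays the role of an observer.
    static Map<Long, Server> sampleView() {
        Map<Long, Server> view = new HashMap<>();
        view.put(1L, new Server("10.0.0.1:3888"));
        view.put(2L, new Server("10.0.0.2:3888"));
        return view;
    }
}
```

What the caller should do with the empty case (look the sid up elsewhere, or reject the connection) is a separate decision that this sketch does not try to answer.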





[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-10-10 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792124#comment-13792124
]

Raul Gutierrez Segales commented on ZOOKEEPER-1633:
---

Are observers mandated to have -1 as their sid? Lots of test cases in the code
base (and deployments!) use a non-negative sid. Plus the docs don't indicate that
-1 should be used:
http://zookeeper.apache.org/doc/r3.3.1/zookeeperObservers.html.

Introduce a protocol version to connection initiation message
-

Key: ZOOKEEPER-1633
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Fix For: 3.4.6

Attachments: ZOOKEEPER-1633.patch, ZOOKEEPER-1633-v4.patch,
ZOOKEEPER-1633-v4.patch, ZOOKEEPER-1633-ver2.patch, ZOOKEEPER-1633-ver3.patch

Currently the first message a server sends to another server includes just
one field - the server's id (long). This is in QuorumCnxManager.java. This
makes changes to the information passed during this initial connection very
difficult. This patch will change the first field of the message to be a
protocol version (a negative number that can't be a server id). The second
field will be the server id. The third field is number of bytes in the
remainder of the message. A 3.4 server will read the first field as before,
but if this is a negative number it will read the second field to find the
server id, and then remove the remainder of the message from the stream. This
will not affect 3.4 since 3.4 and earlier servers send just the server id (so
the code in the patch will not run unless a server newer than 3.4 is trying to
connect). This will, however, provide the necessary flexibility for future
releases as well as an upgrade path from 3.4.
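
The field layout described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the class name, the particular negative version constant and the field widths (8-byte longs, 4-byte length) are chosen for the sketch only.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the described handshake; not the actual patch.
final class InitialMessageSketch {
    // Arbitrary negative value for the sketch; the description only requires
    // "a negative number that can't be a server id".
    static final long PROTOCOL_VERSION = -65536L;

    // Old style: the message is just the server id.
    static ByteBuffer oldStyle(long sid) {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.putLong(sid);
        b.flip();
        return b;
    }

    // New style: version, server id, length of the remainder, remainder.
    static ByteBuffer newStyle(long sid, byte[] rest) {
        ByteBuffer b = ByteBuffer.allocate(8 + 8 + 4 + rest.length);
        b.putLong(PROTOCOL_VERSION).putLong(sid).putInt(rest.length).put(rest);
        b.flip();
        return b;
    }

    // Receiver: a non-negative first field is the server id itself; a
    // negative one is a version, followed by the id and a skippable tail.
    static long readSid(ByteBuffer msg) {
        long first = msg.getLong();
        if (first >= 0) {
            return first;
        }
        long sid = msg.getLong();
        int remaining = msg.getInt();
        msg.position(msg.position() + remaining); // drop the rest
        return sid;
    }
}
```

A receiver written this way accepts both message shapes, which is what makes the mixed-version upgrade path possible.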


[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-10-10 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792126#comment-13792126
]

Raul Gutierrez Segales commented on ZOOKEEPER-1633:
---

Ah - never mind. Alex clarified this in ZOOKEEPER-1789.


[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-10-10 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792141#comment-13792141
]

Raul Gutierrez Segales commented on ZOOKEEPER-1633:
---

Now that I come to think of it, I might have seen it in prod.


[jira] [Created] (ZOOKEEPER-1792) Observers don't need to keep the an in-memory copy of last commited proposals

2013-10-10 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1792:
-

 Summary: Observers don't need to keep the an in-memory copy of 
last commited proposals 
 Key: ZOOKEEPER-1792
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1792
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Raul Gutierrez Segales
Priority: Minor


In FinalRequestProcessor.java#processRequest we have:

{noformat}
 if (request.isQuorum()) {
zks.getZKDatabase().addCommittedProposal(request);
 }
{noformat}

but this is only useful to the leader since committed proposals are only used 
from LearnerHandler to sync up followers. I presume followers do need it as 
they might become a leader at any point. But observers have no need for them, 
so we could probably special case this for them and optimize the path for them.
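
Roughly, the special case could look like the sketch below. The `Role` enum and the class are hypothetical stand-ins for illustration; the real change would live in FinalRequestProcessor / ZKDatabase and might be shaped quite differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not ZooKeeper's FinalRequestProcessor or ZKDatabase.
final class CommittedLogSketch {
    enum Role { LEADER, FOLLOWER, OBSERVER }

    private final Role role;
    private final Deque<String> committed = new ArrayDeque<>();

    CommittedLogSketch(Role role) {
        this.role = role;
    }

    // Called for every quorum request that has been applied. Observers never
    // serve learners, so they skip the in-memory copy entirely.
    void onQuorumRequest(String request) {
        if (role != Role.OBSERVER) {
            committed.addLast(request);
        }
    }

    int size() {
        return committed.size();
    }
}
```

Followers still keep the copy in this sketch, since they may be elected leader later.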





[jira] [Commented] (ZOOKEEPER-1795) unable to build c client on ubuntu

2013-10-14 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794400#comment-13794400
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1795:
---

fwiw, happens in Fedora 19 too. this patch works for me:

{noformat}
diff --git a/src/c/tests/TestReconfigServer.cc b/src/c/tests/TestReconfigServer.cc
index 90bf6f6..a847b37 100644
--- a/src/c/tests/TestReconfigServer.cc
+++ b/src/c/tests/TestReconfigServer.cc
@@ -16,6 +16,7 @@
  */
 #include <algorithm>
 #include <cppunit/extensions/HelperMacros.h>
+#include <unistd.h>
 #include "zookeeper.h"
 
 #include "Util.h"
diff --git a/src/c/tests/ZooKeeperQuorumServer.cc b/src/c/tests/ZooKeeperQuorumServer.cc
index f8049d2..23392cd 100644
--- a/src/c/tests/ZooKeeperQuorumServer.cc
+++ b/src/c/tests/ZooKeeperQuorumServer.cc
@@ -21,6 +21,7 @@
 #include <cstdlib>
 #include <fstream>
 #include <sstream>
+#include <unistd.h>
 
 ZooKeeperQuorumServer::
 ZooKeeperQuorumServer(uint32_t id, uint32_t numServers) :
{noformat}

 unable to build c client on ubuntu
 --

 Key: ZOOKEEPER-1795
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1795
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.5.0
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.5.0


 Seems there is an issue for Ubuntu (I'm on 13.04), however I'm only seeing it 
 on trunk and not branch 34
 {noformat}
 make check
 make  zktest-st zktest-mt
 make[1]: Entering directory `/home/phunt/dev/svn/svn-zookeeper/src/c'
 g++ -DHAVE_CONFIG_H -I.  -I./include -I./tests -I./generated  
 -DUSE_STATIC_LIB -DZKSERVER_CMD=\./tests/zkServer.sh\ -DZOO_IPV6_ENABLED 
 -g -O2 -MT zktest_st-TestReconfigServer.o -MD -MP -MF 
 .deps/zktest_st-TestReconfigServer.Tpo -c -o zktest_st-TestReconfigServer.o 
 `test -f 'tests/TestReconfigServer.cc' || echo 
 './'`tests/TestReconfigServer.cc
 tests/TestReconfigServer.cc: In member function 'bool 
 TestReconfigServer::waitForConnected(zhandle_t*, uint32_t)':
 tests/TestReconfigServer.cc:128:16: error: 'sleep' was not declared in this 
 scope
 make[1]: *** [zktest_st-TestReconfigServer.o] Error 1
 make[1]: Leaving directory `/home/phunt/dev/svn/svn-zookeeper/src/c'
 make: *** [check-am] Error 2
 {noformat}
 I have 
 {noformat}
 g++ --version
 g++ (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3
 Copyright (C) 2012 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 {noformat}




[jira] [Created] (ZOOKEEPER-1796) Move common code from {Follower, Observer}ZooKeeperServer into LearnerZooKeeperServer

2013-10-14 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1796:
-

 Summary: Move common code from {Follower, Observer}ZooKeeperServer 
into LearnerZooKeeperServer
 Key: ZOOKEEPER-1796
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1796
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Raul Gutierrez Segales
Priority: Trivial


Since ZOOKEEPER-1552 we are enabling syncProcessor in Observers, so we should 
have a proper shutdown() method there. Since FollowerZooKeeperServer already 
has one, which does the same thing that we need, move that to 
LearnerZooKeeperServer along with some related instance variables. 
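
The shape of the refactoring, as a simplified sketch (the class names carry a "Sketch" suffix and the fields are invented for illustration; the attached patch is the reference, not this snippet):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pulling shutdown logic up into the shared base
// class. These are not the real ZooKeeper server classes.
abstract class LearnerServerSketch {
    final List<String> events = new ArrayList<>();
    boolean syncProcessorRunning = true;

    // Shared by followers and observers: stop the sync processor exactly
    // once, even if shutdown() is called repeatedly.
    void shutdown() {
        if (syncProcessorRunning) {
            syncProcessorRunning = false;
            events.add("syncProcessor.shutdown");
        }
    }
}

// Both learner types inherit the same shutdown() without duplicating it.
final class FollowerServerSketch extends LearnerServerSketch { }

final class ObserverServerSketch extends LearnerServerSketch { }
```

With the method on the base class, an observer that now runs a syncProcessor gets the same cleanup as a follower for free.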




[jira] [Updated] (ZOOKEEPER-1796) Move common code from {Follower, Observer}ZooKeeperServer into LearnerZooKeeperServer

2013-10-14 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1796:
--

Attachment: ZOOKEEPER-1796.patch


[jira] [Updated] (ZOOKEEPER-1795) unable to build c client on ubuntu

2013-10-14 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1795:
--

Attachment: ZOOKEEPER-1795.patch

unistd.h is needed for sleep().


[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-10-17 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: (was: persistent-read-only-for-observers.patch)

Read-only Observer
--


[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-10-17 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: ZOOKEEPER-1607.patch

Chatted with Thawan about this and this still probably has to change but I
wanted to go ahead and post this for any interested passerby (since local
sessions support has been merged).

Read-only Observer
--

Attachments: ZOOKEEPER-1607.patch


[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-10-17 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: (was: ZOOKEEPER-1607.patch)

Read-only Observer
--

Attachments: ZOOKEEPER-1607.patch


[jira] [Updated] (ZOOKEEPER-1607) Read-only Observer

2013-10-17 Thread Raul Gutierrez Segales (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Raul Gutierrez Segales updated ZOOKEEPER-1607:
--

Attachment: ZOOKEEPER-1607.patch

The prev patch had remaining bits and pieces of an internal patch to keep stats
using Twitter's stats-util - soz.

Read-only Observer
--

Attachments: ZOOKEEPER-1607.patch


[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble

2013-10-28 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807102#comment-13807102
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---

[~fpj], [~abranzyck]: did you guys test this patch when joining a cluster of 
servers running without this patch (i.e.: trunk, only without this patch)?

After rolling the first 2 followers - in a 5 member ensemble - the 3rd follower 
fails to join with this:

{noformat}
2013-10-28 18:43:18,134 - INFO  [WorkerReceiver[myid=4]] - Notification: 4 (n.leader), 0x890415 (n.zxid), 0x6 (n.round), LOOKING (n.state), 4 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,134 - INFO  [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x (n.round), FOLLOWING (n.state), 0 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO  [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO  [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,136 - INFO  [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x88002c (n.zxid), 0x (n.round), FOLLOWING (n.state), 1 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
{noformat}

I am guessing IGNOREVALUE (0x) as the round value is causing 
issues? What was the expected behavior here (i.e.: when dealing with cluster 
members without this patch during an upgrade)?

 ZooKeeper server unable to join established ensemble
 

 Key: ZOOKEEPER-1732
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
 Environment: Windows 7, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, 
 ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, 
 ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch, 
 ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, 
 ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch


 I have a test in which I do a rolling restart of three ZooKeeper servers and 
 it was failing from time to time.
 I ran the tests in a loop until the failure came out and it seems that at 
 some point one of the servers is unable to join the ensemble formed by the 
 other two.




[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble

2013-10-28 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807565#comment-13807565
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---

What's wrong with the round values? i.e.: the two new servers have IGNOREVALUE 
(sounds correct right?) and the older followers have the current round value 
(i.e.: 0x6). I thought the problem would be here:

{noformat}
         * @see https://issues.apache.org/jira/browse/ZOOKEEPER-1732
         */
        outofelection.put(n.sid, new Vote(n.leader,
                IGNOREVALUE, IGNOREVALUE, n.peerEpoch, n.state));
        if (termPredicate(outofelection, new Vote(n.leader,
                IGNOREVALUE, IGNOREVALUE, n.peerEpoch, n.state))
                && checkLeader(outofelection, n.leader, IGNOREVALUE)) {
{noformat}

IGNOREVALUE doesn't work here, because we are talking to un-patched cluster 
members.

Sorry if I am completely misleading you :) That's as far as I got with my 
analysis today. 


[jira] [Updated] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server

2013-10-29 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1804:
--

Attachment: ZOOKEEPER-1804.patch

We've been using this patch, so I guess something along these lines could work. 

 Stat the realtime tps of zookeepr server
 

 Key: ZOOKEEPER-1804
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Leader Ni
Assignee: Leader Ni
 Attachments: ZOOKEEPER-1804.patch


 At present, when we assess whether ZooKeeper can support a given business 
 scenario, we always use the number of subscribers or the number of clients.
 However, sometimes many clients are connected to ZooKeeper but do nothing, 
 while others run complex business logic.
 So we need to measure the real-time TPS of the ZooKeeper server.
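
For illustration only, a per-interval TPS counter can be as small as the sketch below. It is not the attached patch; the class and method names are invented, and timestamps are passed in explicitly (in nanoseconds, for example from System.nanoTime()) so the logic is easy to follow.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a requests-per-second counter; not the patch.
final class TpsSketch {
    private final AtomicLong count = new AtomicLong();
    private long windowStartNanos;

    TpsSketch(long nowNanos) {
        this.windowStartNanos = nowNanos;
    }

    // Called once per processed request; cheap and thread-safe.
    void onRequest() {
        count.incrementAndGet();
    }

    // Returns requests per second since the previous sample and starts a
    // new measurement window.
    synchronized double sample(long nowNanos) {
        long n = count.getAndSet(0);
        long elapsed = nowNanos - windowStartNanos;
        windowStartNanos = nowNanos;
        return elapsed <= 0 ? 0.0 : n * 1e9 / elapsed;
    }
}
```

A value like this could be reported next to the existing connection counts, which would distinguish idle clients from busy ones.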




[jira] [Commented] (ZOOKEEPER-1732) ZooKeeper server unable to join established ensemble

2013-10-29 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808194#comment-13808194
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---

Hmm, I still think this could confuse people rolling a cluster. Sounds like we 
should revert this for the next release unless we have a fix for it. Smooth 
upgrades through rolling restarts are an expectation that ZooKeeper has always 
maintained. 

 ZooKeeper server unable to join established ensemble
 

 Key: ZOOKEEPER-1732
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.5
 Environment: Windows 7, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, 
 ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, 
 ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch, 
 ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, 
 ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch


 I have a test in which I do a rolling restart of three ZooKeeper servers and 
 it was failing from time to time.
 I ran the tests in a loop until the failure came out and it seems that at 
 some point one of the servers is unable to join the ensemble formed by the 
 other two.




[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-10-30 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809386#comment-13809386
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

Testing this now - thanks for the patch [~fpj].

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Priority: Blocker
 Attachments: ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch


 This is an issue that has been originally reported in ZOOKEEPER-1732.




[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-10-30 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809419#comment-13809419
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

[~fpj]: did a quick test and followers joined nicely - thanks!

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Priority: Blocker
 Attachments: ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch


 This is an issue that has been originally reported in ZOOKEEPER-1732.




[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-10-31 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809981#comment-13809981
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

Patch looks correct to me - thanks for the swift response [~fpj].

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Attachments: ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch


 This is an issue that has been originally reported in ZOOKEEPER-1732.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-10-31 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810441#comment-13810441
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

2nd test looks fine to me as well. I guess you could DRY it up a bit by having:

{noformat}
HashMap<Long, Vote> getVotes(long a, long b) {
    HashMap<Long, Vote> votes = new HashMap<Long, Vote>();
    votes.put(0L, new Vote(4L, Vote.DONTCARE, Vote.DONTCARE, a, 
        ServerState.FOLLOWING));
    votes.put(1L, new Vote(4L, Vote.DONTCARE, Vote.DONTCARE, a, 
        ServerState.FOLLOWING));
    votes.put(3L, new Vote(4L, 10L, 10L, b, ServerState.FOLLOWING));
    votes.put(4L, new Vote(4L, 10L, 10L, b, ServerState.LEADING));
    return votes;
}
{noformat}

but I guess copy/pasta is alright in test cases for readability (though I'd 
rather DRY it).  Thanks [~fpj].

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch


 This is an issue that has been originally reported in ZOOKEEPER-1732.




[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811092#comment-13811092
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1805:
---

+1. 

 Don't care value in ZooKeeper election breaks rolling upgrades
 

 Key: ZOOKEEPER-1805
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, 
 ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch


 This is an issue that has been originally reported in ZOOKEEPER-1732.




[jira] [Created] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)

Raul Gutierrez Segales created ZOOKEEPER-1807:
-

 Summary: Observers spam each other creating connections to the 
election addr
 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales


Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open 
connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection 
already for server 14
{noformat}

and so on and so on, ad nauseam. 

Now, looking around I found this inside FastLeaderElection.java from when you 
committed ZOOKEEPER-107:

{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to 
connect to each other (as opposed to just connecting to participants). I'll 
give it a try now and let you know. (Also, we use observer ids that are > 0, 
and I saw some parts of the code that might not deal with that assumption - so 
it could be that too.)
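
To make the difference between the two loops concrete, here is a small self-contained sketch. It does not use the real QuorumPeer: ViewSketch, votingIds() and allKnownIds() are hypothetical stand-ins for self.getVotingView() and self.getAllKnownServerIds(), and the server ids are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a peer's view of the ensemble; not ZooKeeper's QuorumPeer.
final class ViewSketch {
    enum LearnerType { PARTICIPANT, OBSERVER }

    private final Map<Long, LearnerType> servers = new LinkedHashMap<>();

    void add(long sid, LearnerType type) {
        servers.put(sid, type);
    }

    // Analogue of the old loop's targets: participants only.
    Set<Long> votingIds() {
        Set<Long> ids = new TreeSet<>();
        for (Map.Entry<Long, LearnerType> e : servers.entrySet()) {
            if (e.getValue() == LearnerType.PARTICIPANT) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Analogue of the new loop's targets: participants and observers alike.
    Set<Long> allKnownIds() {
        return new TreeSet<>(servers.keySet());
    }

    // Three participants and three observers, ids chosen arbitrarily.
    static ViewSketch example() {
        ViewSketch v = new ViewSketch();
        for (long sid = 1; sid <= 3; sid++) {
            v.add(sid, LearnerType.PARTICIPANT);
        }
        for (long sid : new long[] {6, 9, 10}) {
            v.add(sid, LearnerType.OBSERVER);
        }
        return v;
    }

    public static void main(String[] args) {
        ViewSketch v = example();
        System.out.println("old loop targets: " + v.votingIds());
        System.out.println("new loop targets: " + v.allKnownIds());
    }
}
```

With the new loop, every observer id ends up in the set of notification targets, which the old loop never did.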




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811782#comment-13811782
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Oh - fair enough. So I suspect QuorumCnxManager isn't doing the right thing 
then. Will take a look. Thanks for the quick reply!

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales


[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811798#comment-13811798
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Actually - my initial assessment was wrong (the spammy "There is a connection 
already" message confused me). I am seeing excess traffic between Observers 
through the election port, but it's not due to connection attempts. 
I'll come back with the actual messages. Sorry if this isn't actually related 
to ZOOKEEPER-107, [~shralex].

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales


[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811802#comment-13811802
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Yes - absolutely [~fpj]. The amount of traffic that I am seeing between 
Observers through the election port is... scary. I am still trying to figure 
out what is going on. Will be back in a bit when I have a proper analysis. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811849#comment-13811849
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Okay - this does seem to be related to ZOOKEEPER-107, [~shralex]. I added 
some debug logging and I can see that the spam, sent to all Observers, is the 
notifications:

{noformat}
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, 
peerEpoch = 130, configData = [B@5a0c0ce6
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, 
peerEpoch = 130, configData = [B@4d22fe39
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, 
peerEpoch = 130, configData = [B@346077bf
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, 
peerEpoch = 130, configData = [B@2955b776
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, 
peerEpoch = 130, configData = [B@3a7fb92d
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, 
peerEpoch = 130, configData = [B@1756575c
2013-11-02 02:33:21,341 - INFO  [WorkerSender[myid=13]] - will send: leader = 
3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, 
peerEpoch = 130, configData = [B@258164fc
{noformat}

As you can see, it's sending tons of notifications per second. Not good :)

With this diff in FastLeaderElection.java (i.e.: a revert of part of your 
change):

{noformat}
 private void sendNotifications() {
-for (long sid : self.getAllKnownServerIds()) {
+for (QuorumServer server : self.getVotingView().values()) {
+long sid = server.id;
{noformat}

observers, of course, don't get spammed. I am guessing some condition is 
failing for Observers that assumes the notifications are fresh and sends them 
repeatedly?

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811850#comment-13811850
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

[~fpj]: I think this is 3.5.0 specific since it goes away when reverting 
those bits from ZOOKEEPER-107 (there is a chance I am overlooking something, of 
course, and it's something else). Either way, this is most likely a blocker 
for the 3.5.0 release. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811851#comment-13811851
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

[~thawan]: should omitting the Observers from zoo.cfg actually make any 
difference? If so, we should document it somewhere (unless it already is). In 
my case, where I do explicitly enumerate them, I don't get 
observer-to-observer connections on the election port once I remove the bits 
I mentioned above in FLE (so it seems to me it isn't). 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811852#comment-13811852
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1790:
---

[~fpj]: I don't think it's related - my initial assessment was wrong. It isn't 
connection attempts that generate the extra traffic I am seeing but the 
Notifications (as commented in ZOOKEEPER-1807). 

 Deal with special ObserverId in QuorumCnxManager.receiveConnection
 --

 Key: ZOOKEEPER-1790
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1790
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.4.6, 3.5.0


 QuorumCnxManager.receiveConnection assumes that a negative sid means that 
 this is a 3.5.0 server, which has a different communication protocol. This 
 doesn't account for the fact that ObserverId = -1 is a special id that may be 
 used by observers and is also negative. 
 This requires a fix to trunk and a separate fix to 3.4 branch, where this 
 function is different (see ZOOKEEPER-1633)
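
The ambiguity described above can be sketched as follows. This is only an illustration: SidSketch and both methods are hypothetical, the marker value passed in the example is arbitrary, and the second check is one possible way to disambiguate, not the committed fix.

```java
// Illustrates the ambiguity only; not the code in QuorumCnxManager.
final class SidSketch {
    // The special observer id named in the description above.
    static final long OBSERVER_ID = -1L;

    // The check as described: any negative first value is read as a protocol marker.
    static boolean looksLikeNewProtocolNaive(long firstLong) {
        return firstLong < 0;
    }

    // One possible disambiguation (an assumption): treat the observer id as a sid.
    static boolean looksLikeNewProtocol(long firstLong) {
        return firstLong < 0 && firstLong != OBSERVER_ID;
    }

    public static void main(String[] args) {
        System.out.println("naive, observer id:   " + looksLikeNewProtocolNaive(OBSERVER_ID));
        System.out.println("guarded, observer id: " + looksLikeNewProtocol(OBSERVER_ID));
        System.out.println("guarded, marker:      " + looksLikeNewProtocol(-65536L));
    }
}
```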




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811858#comment-13811858
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

I think what's happening is that when we send the initial notifications to all 
members, as opposed to just voting members as it was before, we set off a 
self-replicating cascade of notifications. Each Observer gets the notification 
and then by virtue of:

{noformat}
/*
 * If it is from a non-voting server (such as an observer or
 * a non-voting follower), respond right away.
 */
if(!self.getVotingView().containsKey(response.sid)){
   .
}
{noformat}

it replies back to each Observer and so on. So it sounds to me like this needs 
to match what we have in sendNotifications and actually check response.sid 
against self.getAllKnownServerIds() to avoid the endless echoing of 
notifications that I am seeing.
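
A toy model of the suspected echo, for illustration only. This is not ZooKeeper code: EchoSketch and run() are made-up names, a message is reduced to a {from, to} pair, and all servers in the model are observers, so the sender is never in the voting view.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of observers answering each other's notifications; not ZooKeeper code.
final class EchoSketch {
    // Returns how many messages were delivered before the queue drained,
    // or maxDeliveries if it never drained.
    static int run(int observers, boolean replyToNonVoting, int maxDeliveries) {
        Deque<int[]> inFlight = new ArrayDeque<>();
        // Observer 0 sends its initial notification to every other observer.
        for (int to = 1; to < observers; to++) {
            inFlight.add(new int[] {0, to});
        }

        int delivered = 0;
        while (!inFlight.isEmpty() && delivered < maxDeliveries) {
            int[] msg = inFlight.poll();
            delivered++;
            // The sender is an observer, hence not in the voting view, so the
            // "respond right away" branch is taken every time it is enabled.
            if (replyToNonVoting) {
                inFlight.add(new int[] {msg[1], msg[0]});
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("with echo:    " + run(5, true, 10_000));
        System.out.println("without echo: " + run(5, false, 10_000));
    }
}
```

In this model the queue only drains when the immediate reply is switched off; with it on, every delivery produces another message and the run stops only at the delivery cap.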

Thoughts [~shralex], [~fpj] ?

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-01 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: ZOOKEEPER-1807.patch

The attached patch prevents sending replies back when we are an Observer. Since 
ZOOKEEPER-107 we send notifications to Observers because they can be promoted 
to Participants. But to avoid replicating replies forever (i.e.: an observer 
sends a notification and the receiving observer then sends another one and so 
on) we don't have to send notifications when we are a LearnerType.OBSERVER. 
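
Roughly, the guard amounts to the decision below. This is a simplified sketch and not the attached patch: ReplyGuardSketch and respondRightAway() are hypothetical names, and the senderInVotingView flag stands in for self.getVotingView().containsKey(response.sid).

```java
// Simplified sketch of the guard described above; not the attached patch.
final class ReplyGuardSketch {
    enum LearnerType { PARTICIPANT, OBSERVER }

    // Decides whether a notification gets an immediate reply.
    static boolean respondRightAway(LearnerType self, boolean senderInVotingView) {
        if (senderInVotingView) {
            // Simplification: voting senders are left to the regular handling.
            return false;
        }
        // Guard: an observer never echoes a notification back to a non-voting sender.
        return self != LearnerType.OBSERVER;
    }

    public static void main(String[] args) {
        System.out.println("observer, non-voting sender:    "
                + respondRightAway(LearnerType.OBSERVER, false));
        System.out.println("participant, non-voting sender: "
                + respondRightAway(LearnerType.PARTICIPANT, false));
    }
}
```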

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-02 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811993#comment-13811993
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Well, if we really need observer-to-observer responses (for reconfig purposes, 
I presume), then should we be sending them to observers not in LOOKING state? 
See the conditions that apply when responding to participants in the lines 
below my patch.

But even if that is correct, it might be too much overhead for large 
Observer deployments. Should this be optional?

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch


 Hey [~shralex],
 I noticed today that my Observers are spamming each other trying to open 
 connections to the election port. I've got tons of these:
 {noformat}
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 9
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 10
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 6
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 12
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 14
 {noformat}
 and so on and so on, ad nauseam. 
 Now, looking around I found this inside FastLeaderElection.java from when you 
 committed ZOOKEEPER-107:
 {noformat}
  private void sendNotifications() {
 -for (QuorumServer server : self.getVotingView().values()) {
 -long sid = server.id;
 -
 +for (long sid : self.getAllKnownServerIds()) {
 +QuorumVerifier qv = self.getQuorumVerifier();
 {noformat}
 Is that really desired? I suspect that is what's causing Observers to try to 
 connect to each other (as opposed to just connecting to participants). I'll 
 give it a try now and let you know. (Also, we use observer ids that are  0, 
 and I saw some parts of the code that might not deal with that assumption - 
 so it could be that too..). 
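
To get a feel for the fan-out, here is a rough counting sketch. It is a toy model, not ZooKeeper code: the class and every method in it are hypothetical, and it assumes each restarted observer sends exactly one notification per target (itself included when iterating over all known ids).

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these helpers are hypothetical and are not ZooKeeper APIs.
final class NotificationFanout {

    // Targets when iterating over the voting view (the loop before the change).
    static Set<Long> votingTargets(Set<Long> participants) {
        return new TreeSet<>(participants);
    }

    // Targets when iterating over every known server id (the loop after the change).
    static Set<Long> allKnownTargets(Set<Long> participants, Set<Long> observers) {
        Set<Long> all = new TreeSet<>(participants);
        all.addAll(observers);
        return all;
    }

    // Messages sent in one round if every sending observer notifies each target once.
    static long messagesPerRound(int sendingObservers, Set<Long> targets) {
        return (long) sendingObservers * targets.size();
    }

    // Builds the id range [from, to], inclusive.
    static Set<Long> ids(long from, long to) {
        Set<Long> s = new TreeSet<>();
        for (long id = from; id <= to; id++) {
            s.add(id);
        }
        return s;
    }

    public static void main(String[] args) {
        Set<Long> participants = ids(1, 5);   // 5 voting members
        Set<Long> observers = ids(6, 105);    // 100 observers
        System.out.println(messagesPerRound(observers.size(), votingTargets(participants)));
        System.out.println(messagesPerRound(observers.size(), allKnownTargets(participants, observers)));
    }
}
```

Under those assumptions, 5 participants and 100 observers give 500 messages per round with the voting view and 10500 with all known server ids, so the growth is quadratic in the number of observers.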




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-02 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812100#comment-13812100
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Yeah - I think you are right. In this ZOOKEEPER-107 world, in which observers 
can be promoted, etc., the initial if() doesn't make sense anymore. I'll submit a 
new patch so we can think about it a bit more.

With regard to the overhead and making all of this optional: if you have > 100 
observers restarted at once, you'll have a large amount of notification traffic. 
But I guess that is within tolerable limits.

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-02 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: (was: ZOOKEEPER-1807.patch)

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-02 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: ZOOKEEPER-1807.patch

As discussed with [~shralex], we now need to apply the same response logic for 
voting and non-voting members.

[~fpj], [~thawan] - thoughts?

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-04 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Issue Type: Bug  (was: New Feature)

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-04 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: (was: ZOOKEEPER-1807.patch)

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-04 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813077#comment-13813077
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

Thanks for the quick comment, Alex. Yeah, it sounds to me like that might be 
acceptable. Again, for huge deployments it might be a bit of a concern, since 
you'll be putting extra pressure on the cluster after, say, a big network 
partition. Thoughts? Cc: [~thawan], [~fpj]. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch



[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-04 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
--

Attachment: notifications-loop.png

Here's how notification traffic (on election port 3888 in my case) goes down 
with the patch (i.e.: without the notifications loop). It's pretty dramatic so 
I'd say this is definitely a blocker for 3.5.0. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Germán Blanco
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807.patch, notifications-loop.png



[jira] [Commented] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server

2013-11-07 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816056#comment-13816056
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1804:
---

Small styling nits, in things like:

{noformat}
+   if ( zkServer == null ) {
+   pw.println( ZK_NOT_SERVING );
{noformat}

the spaces after ( and before ) aren't used in the rest of the code.

Also - at the cost of introducing another dependency though - you might want to 
check Twitter's stats package which has convenience classes/methods for keeping 
stats (also useful for the case of write/read latency to keep p99, etc):

http://twitter.github.io/commons/apidocs/#com.twitter.common.stats.Stats

We use this in an internal branch atm. 
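
If pulling in the extra dependency is not an option, a p99 can be approximated with plain JDK classes. The sketch below is only an illustration: LatencySamples is a hypothetical class, it is not the Twitter commons Stats API, and it keeps every sample in memory, which would need bounding in a real server.

```java
import java.util.Arrays;

// Hypothetical helper; this is not the Twitter commons Stats API.
final class LatencySamples {
    private long[] samples = new long[16];
    private int count = 0;

    // Stores one latency sample, growing the backing array when it is full.
    void record(long latencyMs) {
        if (count == samples.length) {
            samples = Arrays.copyOf(samples, count * 2);
        }
        samples[count++] = latencyMs;
    }

    // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
    long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0);
        return sorted[rank - 1];
    }
}
```

Sorting on every query is fine for an occasional four-letter-word command, but a histogram-based approach is probably a better fit for hot paths.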


 Stat the realtime tps of zookeepr server
 

 Key: ZOOKEEPER-1804
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Leader Ni
Assignee: Leader Ni
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1804.patch, ZOOKEEPER-1804.patch


 At present, when we assess whether zookeeper can support a given business 
 scenario, we always use the number of subscribers or the number of clients.
 But sometimes many clients are connected to zookeeper and do nothing, 
 while the others run complex business logic.
 So we must stat the realtime tps of zookeeper.
 [-Solution---]
 Solution1: 
 If you only want to know the real time transactions processed, you can use the 
 patch ZOOKEEPER-1804.patch.
 Solution2:
 If you also want to know how clients use zookeeper, and the real time r/w ps 
 of each zookeeper client, you can use the patch ZOOKEEPER-1804-2.patch.
 Use the java property -Dserver_process_stats=true to enable the function.
 Sample:
 $ echo rwps | nc localhost 2181
 RealTime R/W Statistics:
 getChildren2:   0.5994005994005994
 createSession:  1.6983016983016983
 closeSession:   0.999000999000999
 setData:        110.18981018981019
 setWatches:     129.17082917082917
 getChildren:    68.83116883116884
 delete:         19.980019980019982
 create:         22.27772227772228
 exists:         1806.2937062937062
 getDate:        729.5704295704296




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-07 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816152#comment-13816152
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

[~shralex]: do you think that, perhaps, adding a comment elaborating a bit more 
on the rationale of the notifications and the state of the new/old config would 
be worthwhile? I am thinking the comment should go next to sendNotifications().

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, 
 ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807.patch, notifications-loop.png



[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-07 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816157#comment-13816157
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

(could we get a reviewboard for this? some inline comments below)

For:

{noformat}
+// start server 3 with new config
+zk[2] = new ZooKeeper("127.0.0.1:" + ports[2][2], ClientBase.CONNECTION_TIMEOUT, this);
{noformat}

I think the zk[2] assignment goes before the comment. 

For:

{noformat}
+for (int i=2; i<3; i++) {
+    Assert.assertTrue("waiting for server " + i + " being up",
+        ClientBase.waitForServerUp("127.0.0.1:" + ports[i][2],
+        CONNECTION_TIMEOUT * 2));
+    ReconfigTest.testServerHasConfig(zk[i], allServersNext, null);
+}
{noformat}

i <= 3? Or no loop if you only want it to loop one time, I guess.

Also, the ports assignment loop and the currentQuorumCfgSection creation are 
repeated in testObserverConvertedToParticipantDuringFLE and 
testCurrentObserverIsParticipantInNewConfig; mind DRY-ing this up a bit by 
putting those in private methods (e.g. generatePorts() and 
generateInitialConfig(), or some such)? 
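
Something along these lines is what I have in mind for the ports part. This is only a sketch: PortTable and the signature of generatePorts() are made up for illustration, and the port allocator is passed in as a plain IntSupplier so the snippet does not depend on how the tests actually hand out ports.

```java
import java.util.function.IntSupplier;

// Hypothetical test helper; names and shape are assumptions, not the patch's code.
final class PortTable {

    // Builds ports[server][slot], drawing each port from the supplied allocator.
    static int[][] generatePorts(int numServers, int portsPerServer, IntSupplier nextPort) {
        int[][] ports = new int[numServers][portsPerServer];
        for (int i = 0; i < numServers; i++) {
            for (int j = 0; j < portsPerServer; j++) {
                ports[i][j] = nextPort.getAsInt();
            }
        }
        return ports;
    }
}
```

Both tests could then call the helper once instead of repeating the nested loop.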



 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, 
 ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807.patch, notifications-loop.png



[jira] [Commented] (ZOOKEEPER-1808) Add version to FLE notifications for 3.4 branch

2013-11-12 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820599#comment-13820599
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1808:
---

Some stylistic nits:

{noformat}
+requestBuffer.putLong(epoch);
+requestBuffer.putInt( Notification.CURRENTVERSION );
{noformat}

no spaces between parentheses and parameters. 

{noformat}
+if(response.buffer.remaining() >= 4) {
+    n.version = response.buffer.getInt();
+} else {
+    n.version = 0x0;
+}
 {noformat}

More succinct:
{noformat}
+ n.version ? response.buffer.remaining() >= 4 : 0x0;
{noformat}

Nit:
{noformat}
 private void printNotification(Notification n){
-LOG.info("Notification: " + n.leader + " (n.leader), 0x"
+LOG.info("Notification: " + Long.toHexString(n.version) + " (message format version), "
...
{noformat}

Maybe that belongs as toString inside Notification?

Super nit: there are two extra newlines in 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java. 


 Add version to FLE notifications for 3.4 branch
 ---

 Key: ZOOKEEPER-1808
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1808
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch


 Add version to notification messages so that we can differentiate messages 
 during rolling upgrades. This task is for the 3.4 branch only. 




[jira] [Commented] (ZOOKEEPER-1808) Add version to FLE notifications for 3.4 branch

2013-11-12 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820600#comment-13820600
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1808:
---

Sorry meant:

{noformat}
n.version = response.buffer.remaining() >= 4 ? response.buffer.getInt() : 0x0;
{noformat}
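
For reference, a standalone sketch of the same idea using only java.nio.ByteBuffer. VersionField and readVersion() are hypothetical names, and the message layout (a long followed by an optional int) is an assumption made purely for the demo, not the actual notification format.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the backward-compatible read of a trailing field.
final class VersionField {

    // Returns the trailing 4-byte version if present, else 0x0 for old-format messages.
    static int readVersion(ByteBuffer buffer) {
        return buffer.remaining() >= 4 ? buffer.getInt() : 0x0;
    }

    public static void main(String[] args) {
        // New-format message: 8-byte payload followed by a 4-byte version.
        ByteBuffer withVersion = ByteBuffer.allocate(12);
        withVersion.putLong(7L).putInt(1);
        withVersion.flip();
        withVersion.getLong(); // consume the payload first
        System.out.println(readVersion(withVersion));

        // Old-format message: payload only, so the version falls back to 0.
        ByteBuffer legacy = ByteBuffer.allocate(8);
        legacy.putLong(7L);
        legacy.flip();
        legacy.getLong();
        System.out.println(readVersion(legacy));
    }
}
```

The check has to be on remaining() rather than capacity(), since only the unread bytes tell you whether the sender appended a version.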


 Add version to FLE notifications for 3.4 branch
 ---

 Key: ZOOKEEPER-1808
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1808
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch


 Add version to notification messages so that we can differentiate messages 
 during rolling upgrades. This task is for the 3.4 branch only. 




[jira] [Commented] (ZOOKEEPER-1810) Add version to FLE notifications for trunk

2013-11-14 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822797#comment-13822797
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1810:
---

I think that:

{noformat}
+if(LOG.isInfoEnabled()){
+    LOG.info("Backward compatibility mode (36 bits), server id: " + response.sid);
+}
{noformat}

can do without the LOG.isInfoEnabled since it's already called by LOG.info and 
response.sid isn't computed (just a value accessed, so no savings).
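
One nuance: the argument expression is still evaluated before the call, so a guard (or a lazy form) only pays off when building the message is expensive. The sketch below shows the lazy form with java.util.logging, which is swapped in here only because it ships with the JDK; it is not the logging API used in the patch, and LazyLogDemo is a hypothetical class.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; uses java.util.logging rather than the patch's logger.
final class LazyLogDemo {

    // Returns how many times the message was actually built at the given logger level.
    static int buildsWhenLevelIs(Level level) {
        Logger log = Logger.getLogger("LazyLogDemo");
        log.setUseParentHandlers(false); // keep the demo quiet
        log.setLevel(level);
        AtomicInteger builds = new AtomicInteger();
        // The Supplier overload defers building the message until INFO is known to be enabled.
        log.info(() -> {
            builds.incrementAndGet();
            return "Backward compatibility mode, server id: " + 42;
        });
        return builds.get();
    }
}
```

With the level at WARNING the supplier is never invoked, and at INFO it runs once, which is the saving an explicit isInfoEnabled() guard would otherwise provide.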


 Add version to FLE notifications for trunk
 --

 Key: ZOOKEEPER-1810
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1810
 Project: ZooKeeper
  Issue Type: Sub-task
Affects Versions: 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1810.patch, ZOOKEEPER-1810.patch


 The same as ZOOKEEPER-1808 but for trunk.




[jira] [Commented] (ZOOKEEPER-1810) Add version to FLE notifications for trunk

2013-11-14 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823302#comment-13823302
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1810:
---

Yeah - sorry, that was a bit confusing. I guess - if it isn't too much of a 
hassle - reviewboards would make things a bit easier.

 Add version to FLE notifications for trunk
 --

 Key: ZOOKEEPER-1810
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1810
 Project: ZooKeeper
  Issue Type: Sub-task
Affects Versions: 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1810.patch, ZOOKEEPER-1810.patch


 The same as ZOOKEEPER-1808 but for trunk.




[jira] [Commented] (ZOOKEEPER-1810) Add version to FLE notifications for trunk

2013-11-14 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823303#comment-13823303
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1810:
---

(I meant for future patches - we can keep on going with this one inside the 
ticket if it's easier.)

 Add version to FLE notifications for trunk
 --

 Key: ZOOKEEPER-1810
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1810
 Project: ZooKeeper
  Issue Type: Sub-task
Affects Versions: 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1810.patch, ZOOKEEPER-1810.patch


 The same as ZOOKEEPER-1808 but for trunk.




[jira] [Commented] (ZOOKEEPER-1808) Add version to FLE notifications for 3.4 branch

2013-11-15 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823732#comment-13823732
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1808:
---

(Meant FLETestUtils.createMsg() is called again and again with the same 
params).

 Add version to FLE notifications for 3.4 branch
 ---

 Key: ZOOKEEPER-1808
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1808
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, 
 ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch, ZOOKEEPER-1808.patch


 Add version to notification messages so that we can differentiate messages 
 during rolling upgrades. This task is for the 3.4 branch only. 




[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

2013-11-15 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13823924#comment-13823924
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

I am happy to give the RB a shipit but I would prefer to have more 
feedback/reviews from [~thawan] and [~fpj] since they are more familiar with 
the internals of FLE. 

 Observers spam each other creating connections to the election addr
 ---

 Key: ZOOKEEPER-1807
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, 
 ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch, 
 ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807.patch, notifications-loop.png


 Hey [~shralex],
 I noticed today that my Observers are spamming each other trying to open 
 connections to the election port. I've got tons of these:
 {noformat}
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 9
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 10
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 6
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 12
 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
 connection already for server 14
 {noformat}
 and so on and so on, ad nauseam. 
 Now, looking around I found this inside FastLeaderElection.java from when you 
 committed ZOOKEEPER-107:
 {noformat}
  private void sendNotifications() {
 -for (QuorumServer server : self.getVotingView().values()) {
 -long sid = server.id;
 -
 +for (long sid : self.getAllKnownServerIds()) {
 +QuorumVerifier qv = self.getQuorumVerifier();
 {noformat}
 Is that really desired? I suspect that is what's causing Observers to try to 
 connect to each other (as opposed to just connecting to participants). I'll 
 give it a try now and let you know. (Also, we use observer ids that are  0, 
 and I saw some parts of the code that might not deal with that assumption - 
 so it could be that too..). 




[jira] [Commented] (ZOOKEEPER-1810) Add version to FLE notifications for trunk

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824620#comment-13824620
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1810:
---

Doesn't look like https://reviews.apache.org/r/15568/ was updated? Or should we 
continue the review (and give the +1s) here?

 Add version to FLE notifications for trunk
 --

 Key: ZOOKEEPER-1810
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1810
 Project: ZooKeeper
  Issue Type: Sub-task
Affects Versions: 3.5.0
Reporter: Flavio Junqueira
Assignee: Germán Blanco
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1810.patch, ZOOKEEPER-1810.patch, 
 ZOOKEEPER-1810.patch, ZOOKEEPER-1810.patch


 The same as ZOOKEEPER-1808 but for trunk.




[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824622#comment-13824622
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

With the mix of inline and reviewboard reviews I am not sure where we should 
review this one :) Is there a reviewboard for this one as well or just inline? 
If there is, would you mind adding the link here for posterity? Thanks [~fpj].

 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch


 See umbrella jira.




[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824625#comment-13824625
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

Ah - the rb is https://reviews.apache.org/r/15625/. Though it's having issues - 
maybe try reloading? I guess reviewboard applies against the git mirrors and 
there was a lag in Apache's git-svn sync yesterday (I think). 

 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch


 See umbrella jira.




[jira] [Commented] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824632#comment-13824632
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1653:
---

I take back the last comment; I carelessly overlooked the inheriting class. 

 zookeeper fails to start because of inconsistent epoch
 --

 Key: ZOOKEEPER-1653
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.3.4.patch, 
 ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch


 It looks like QuorumPeer.loadDataBase() could fail if the server was 
 restarted after zk.takeSnapshot() but before finishing 
 self.setCurrentEpoch(newEpoch) in Learner.java.
 {code:java}
 case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
 zk.takeSnapshot();
 self.setCurrentEpoch(newEpoch); //  got restarted here
 snapshotTaken = true;
 writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), 
 true);
 break;
 {code}
 The server fails to start because currentEpoch is still 1 but the last 
 processed zkid from the snapshot has been updated.
 {noformat}
 2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR 
 org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on 
 disk
 java.io.IOException: The current epoch, 1, is older than the last zxid, 
 8589934592
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
 ...
 {noformat}
 {noformat}
 $ find datadir 
 datadir
 datadir/version-2
 datadir/version-2/currentEpoch.tmp
 datadir/version-2/acceptedEpoch
 datadir/version-2/snapshot.0
 datadir/version-2/currentEpoch
 datadir/version-2/snapshot.2
 $ cat datadir/version-2/currentEpoch.tmp
 2%
 $ cat datadir/version-2/acceptedEpoch
 2%
 $ cat datadir/version-2/currentEpoch
 1%
 {noformat}
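
 The numbers in the exception above can be checked by hand: ZooKeeper keeps the epoch in the high 32 bits of a zxid and a per-epoch counter in the low 32 bits, and 8589934592 is 2 * 2^32. A minimal sketch of that arithmetic follows; the class name and the epochIsStale helper are made up for illustration and are not part of the patch.

```java
// Sketch only: mirrors the epoch/counter split that ZooKeeper applies to zxids.
final class ZxidSketch {

    // The epoch lives in the high 32 bits of a zxid.
    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    // The per-epoch counter lives in the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Hypothetical helper: true when the stored currentEpoch lags the snapshot's zxid.
    static boolean epochIsStale(long currentEpoch, long lastProcessedZxid) {
        return epochOf(lastProcessedZxid) > currentEpoch;
    }

    public static void main(String[] args) {
        long lastZxid = 8589934592L; // value taken from the log above
        System.out.println("epoch=" + epochOf(lastZxid) + " counter=" + counterOf(lastZxid));
        System.out.println("stale=" + epochIsStale(1, lastZxid));
    }
}
```

 So the snapshot already belongs to epoch 2 while the currentEpoch file still says 1, which matches the contents of the files listed above.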




[jira] [Commented] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824630#comment-13824630
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1653:
---

Nit in:

{noformat}
+static void writeLongToFile(File file, long value) throws IOException {
+AtomicFileOutputStream out = new AtomicFileOutputStream(file);
+BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out));
+boolean aborted = false;
+try {
+bw.write(Long.toString(value));
+bw.flush();
+out.flush();
+out.close();
+} catch (IOException e) {
+LOG.error("Failed to write new file " + file, e);
+out.abort();
+throw e;
+}
+}
{noformat}

aborted is not used. 

Nit in:

{noformat}
+LOG.info("Validating current epoch: " + servers.mt[i].dataDir);
{noformat}

use {} instead of concatenating. 

Nit:

{noformat}
+// Shut down the cluster
{noformat}

should be "Shutdown the cluster".

In src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java:

{noformat}
+CountDownLatch mainFailed;
{noformat}

is assigned and modified but never asserted or checked?

 zookeeper fails to start because of inconsistent epoch
 --

 Key: ZOOKEEPER-1653
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.3.4.patch, 
 ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch



[jira] [Commented] (ZOOKEEPER-1653) zookeeper fails to start because of inconsistent epoch

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824634#comment-13824634
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1653:
---

One more nit in 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java:

{noformat}
+currentEpochFile = new File(new File(follower.dataDir, "version-2"),
+"currentEpoch");

+File updatingEpochFile = new File(
+new File(follower.dataDir, "version-2"),
+QuorumPeer.UPDATING_EPOCH_FILENAME);
{noformat}

could be abbreviated with:

{noformat}
+File followerDataDir = new File(follower.dataDir, "version-2");
+currentEpochFile = new File(followerDataDir, "currentEpoch");

+File updatingEpochFile = new File(followerDataDir, QuorumPeer.UPDATING_EPOCH_FILENAME);
{noformat}

Also - should there be a constant for "currentEpoch" too?

 zookeeper fails to start because of inconsistent epoch
 --

 Key: ZOOKEEPER-1653
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.3.4.patch, 
 ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch



[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node

2013-11-16 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824644#comment-13824644
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1573:
---

Nit - maybe this:

{noformat}
+ * Snapshots are lazily created. So when the snapshot was in progress
+ * there is a chance that some of the later transactions can go into
+ * snapshot. While restoring same transactions NONODE/NODEEXISTS errors
+ * can come. Basically we can ignore all errors during the restore.
{noformat}

could be more clear like this:

{noformat}
+ * Snapshots are lazily created. So when a snapshot is in progress,
+ * there is a chance for later transactions to make to into the 
snapshot.
+ * Then when the snapshot is restored,  NONODE/NODEEXISTS errors
+ * could occur. It should be safe to ignore these.
{noformat}

Nit:

{noformat}
+LOG.warn("Intrrupted");
{noformat}

typo.

Nit:
{noformat}
+LOG.debug("Ignoring processTxn failure hdr: " + hdr.getType() + " : error: " + rc.err + " path: " + rc.path);
{noformat}

use parameterized logging with {} placeholders instead of string concatenation. 

Nit:
{noformat}
+/**
+ * Test we can restore a snapshot that has delete txns ahead of the zxid 
of the snapshot file. ZOOKEEPER-1573
+ */
{noformat}

make it:

{noformat}
+/**
+ * ZOOKEEPER-1573: test restoring a snapshot with deleted txns ahead of 
the snapshot file's zxid. 
+ */
{noformat}

Nit:
{noformat}
+LOG.info("Set lastProcessedZxid to " + zks.getZKDatabase().getDataTreeLastProcessedZxid());
{noformat}

ditto wrt parameterized logging via {}.



 Unable to load database due to missing parent node
 --

 Key: ZOOKEEPER-1573
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Thawan Kooburat
 Attachments: ZOOKEEPER-1573.patch


 While replaying the txnlog on the data tree, the server has code to detect a 
 missing parent node. This code block was last modified as part of 
 ZOOKEEPER-1333. In production, we found a case where this check returns a 
 false positive.
 The sequence of txns is as follows:
 zxid 1:  create /prefix/a
 zxid 2:  create /prefix/a/b
 zxid 3:  delete /prefix/a/b
 zxid 4:  delete /prefix/a
 The server starts capturing a snapshot at zxid 1. However, by the time it 
 has traversed the data tree down to /prefix, txn 4 has already been applied 
 and /prefix has no children. 
 When the server restores from the snapshot, it processes the txnlog starting 
 from zxid 2. This txn generates a missing-parent error and the server refuses 
 to start up.
 The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't 
 know if we have any option besides removing this check to solve this issue.  
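
 To make the sequence above concrete, here is a toy replay, assuming the data tree can be modelled as a plain set of paths. None of these names come from ZooKeeper; the sketch only shows that replaying zxid 2 to 4 on top of a snapshot that already reflects txn 4 fails on every txn, and that tolerating those failures still ends in the state the ensemble had after zxid 4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a data tree is just a set of paths, and a txn is {op, path}.
final class FuzzySnapshotReplaySketch {

    // Applies one txn; returns false where a real server would see NONODE/NODEEXISTS.
    static boolean apply(Set<String> tree, String op, String path) {
        if (op.equals("create")) {
            String parent = path.substring(0, path.lastIndexOf('/'));
            if (!parent.isEmpty() && !tree.contains(parent)) {
                return false; // missing parent (NONODE)
            }
            return tree.add(path); // false means the node already exists
        }
        return tree.remove(path); // delete: false means the node is already gone
    }

    // Replays txns on top of a snapshot, ignoring (but reporting) failed ones.
    static Set<String> replay(Set<String> snapshot, List<String[]> txns) {
        Set<String> tree = new TreeSet<>(snapshot);
        for (String[] txn : txns) {
            if (!apply(tree, txn[0], txn[1])) {
                System.out.println("ignored during restore: " + txn[0] + " " + txn[1]);
            }
        }
        return tree;
    }

    // The scenario from the description: the fuzzy snapshot already reflects txn 4.
    static Set<String> replayDescriptionScenario() {
        Set<String> snapshot = new TreeSet<>(Arrays.asList("/prefix"));
        List<String[]> txns = Arrays.asList(
                new String[] {"create", "/prefix/a/b"},  // zxid 2
                new String[] {"delete", "/prefix/a/b"},  // zxid 3
                new String[] {"delete", "/prefix/a"});   // zxid 4
        return replay(snapshot, txns);
    }

    public static void main(String[] args) {
        System.out.println(replayDescriptionScenario());
    }
}
```

 In this model the first replayed txn (create /prefix/a/b) is the one that trips the missing-parent check, since /prefix/a was never captured in the snapshot.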




[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node

2013-11-17 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824934#comment-13824934
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1573:
---

Thanks for the quick update Vinay. A few more nits:

In:

{noformat}
+ * Snapshots are lazily created. So when a snapshot is in progress,
+ * there is a chance for later transactions to make to into the 
snapshot.
+ * Then when the snapshot is restored,  NONODE/NODEEXISTS errors
+ * could occur. It should be safe to ignore these.
{noformat}

I had a typo in my suggestion (sorry) - it should be: "...transactions to make 
it into the snapshot."

In:

{noformat}
+LOG.debug("Ignoring processTxn failure hdr: {}, error: {}, path: {}", hdr.getType(), rc.err, rc.path);
{noformat}

the line is too long, maybe:
{noformat}
+LOG.debug("Ignoring processTxn failure hdr: {}, error: {}, path: {}",
+   hdr.getType(), rc.err, rc.path);
{noformat}

In:

{noformat}
+Assert.assertTrue("waiting for server being up ", ClientBase.waitForServerUp(HOSTPORT, CONNECTION_TIMEOUT));
{noformat}

line is too long. And this line appears again later on. 

 Unable to load database due to missing parent node
 --

 Key: ZOOKEEPER-1573
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Thawan Kooburat
 Attachments: ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch



[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-17 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824970#comment-13824970
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

One more nit (sorry [~fpj]) in:

{noformat}
-return "(" + id + ", " + Long.toHexString(zxid) + ", " + Long.toHexString(peerEpoch) + ")";
+return "(" + id + ", " 
+   + Long.toHexString(zxid) 
+   + ", " + Long.toHexString(peerEpoch) 
+   + ")";
{noformat}

should we encourage String.format instead of concatenation (as we do in LOG 
statements with {})? I think this is more readable:

{noformat}
-return "(" + id + ", " + Long.toHexString(zxid) + ", " + Long.toHexString(peerEpoch) + ")";
+return String.format("(%d, %s, %s)", id, Long.toHexString(zxid), Long.toHexString(peerEpoch));
{noformat}

What do you think?
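
For reference, a small standalone sketch comparing the two styles. The class and method names are hypothetical; only the two return expressions mirror the snippets above. Both should render the same string, so the choice is purely about readability.

```java
// Sketch only: the two toString styles discussed above, side by side.
final class VoteToStringSketch {

    // Concatenation, as in the current patch.
    static String viaConcatenation(long id, long zxid, long peerEpoch) {
        return "(" + id + ", " + Long.toHexString(zxid) + ", "
                + Long.toHexString(peerEpoch) + ")";
    }

    // String.format, as suggested in this comment.
    static String viaFormat(long id, long zxid, long peerEpoch) {
        return String.format("(%d, %s, %s)", id,
                Long.toHexString(zxid), Long.toHexString(peerEpoch));
    }

    public static void main(String[] args) {
        // 0x100000005L is epoch 1, counter 5.
        System.out.println(viaConcatenation(3, 0x100000005L, 1));
        System.out.println(viaFormat(3, 0x100000005L, 1));
    }
}
```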


 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch


 See umbrella jira.




[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-17 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824972#comment-13824972
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

(of course, with the proper line wrap for  80 chars). 

 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch


 See umbrella jira.




[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node

2013-11-17 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825014#comment-13825014
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1573:
---

Last nit (though feel free to ignore it since it refers to improving old code 
as well):

{noformat}
+
+long start = System.currentTimeMillis();
+while (!connected) {
+long end = System.currentTimeMillis();
+if (end - start > 5000) {
+Assert.assertTrue("Could not connect with server in 5 seconds",
+false);
+}
+try {
+Thread.sleep(200);
+} catch (Exception e) {
+LOG.warn("Interrupted");
+}
+}
{noformat}

this is copy/pasted for two other tests as well - can we move it to a method 
called waitConnected and call that instead? It'll make tests shorter and more 
readable I think. 
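
Something along these lines is what I have in mind. This is only a sketch: the class name, the signature and the use of a BooleanSupplier (which assumes Java 8) are my own choices for illustration, not code from the patch.

```java
import java.util.function.BooleanSupplier;

// Sketch only: one shared polling helper instead of three copies of the loop.
final class WaitConnectedSketch {

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true in time, false otherwise.
    static boolean waitConnected(BooleanSupplier connected, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!connected.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Each test would then shrink to a single assertion on the return value, e.g. asserting that waitConnected(() -> connected, 5000, 200) is true, with the same 5 second limit and 200 ms sleep as the loop above.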


 Unable to load database due to missing parent node
 --

 Key: ZOOKEEPER-1573
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Thawan Kooburat
 Attachments: ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, 
 ZOOKEEPER-1573.patch



[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-19 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826775#comment-13826775
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

Sorry guys I couldn't test this - don't have a 3.4 setup handy. Will do proper 
testing with trunk though (and of course, the nits in both cases ;-). And 
thanks for testing [~abranzyck].

 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, logs.tar.gz, logs2.tar.gz


 See umbrella jira.




[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-19 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827146#comment-13827146
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

(as said before, I have only been testing the upstream version of this ticket)

Some nits:

{noformat}
+if ((state == ServerState.LOOKING) ||
+(other.state == ServerState.LOOKING)) {
+return (id == other.id
+ && zxid == other.zxid
+ && electionEpoch == other.electionEpoch
+ && peerEpoch == other.peerEpoch);
+} else {
+if (version == other.version) {
+return (id == other.id
+ && peerEpoch == other.peerEpoch);
+} else {
+return id == other.id;
+}
+} 
+}
{noformat}

could be simplified to:

{noformat}
+if ((state == ServerState.LOOKING) ||
+(other.state == ServerState.LOOKING)) {
+return (id == other.id
+ && zxid == other.zxid
+ && electionEpoch == other.electionEpoch
+ && peerEpoch == other.peerEpoch);
+} else if (version == other.version) {
+return id == other.id && peerEpoch == other.peerEpoch;
+}
+
+return id == other.id;
+}
{noformat}
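
To double-check that the flattened version behaves as intended, here is a self-contained sketch of the comparison. VoteSketch, its constructor and the ServerState enum are stand-ins I made up for this comment, not the real classes; only the branch structure follows the snippet above.

```java
// Sketch only: a stand-in vote class with the flattened equals() from above.
final class VoteSketch {
    enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

    final int version;
    final long id;
    final long zxid;
    final long electionEpoch;
    final long peerEpoch;
    final ServerState state;

    VoteSketch(int version, long id, long zxid, long electionEpoch,
               long peerEpoch, ServerState state) {
        this.version = version;
        this.id = id;
        this.zxid = zxid;
        this.electionEpoch = electionEpoch;
        this.peerEpoch = peerEpoch;
        this.state = state;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VoteSketch)) {
            return false;
        }
        VoteSketch other = (VoteSketch) o;
        if (state == ServerState.LOOKING || other.state == ServerState.LOOKING) {
            // While either side is still looking, every field has to match.
            return id == other.id
                    && zxid == other.zxid
                    && electionEpoch == other.electionEpoch
                    && peerEpoch == other.peerEpoch;
        } else if (version == other.version) {
            // Same notification version: compare leader id and peer epoch.
            return id == other.id && peerEpoch == other.peerEpoch;
        }
        // Different versions: only the leader id is comparable.
        return id == other.id;
    }

    @Override
    public int hashCode() {
        // Every branch of equals() requires equal ids, so hashing on id is consistent.
        return Long.hashCode(id);
    }
}
```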

In src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java:

{noformat}
+//import org.apache.zookeeper.server.quorum.QuorumCnxManager;
{noformat}

just delete that line?

In src/java/test/org/apache/zookeeper/server/quorum/FLEDontCareTest.java I 
guess testOutofElection is still a work in progress because of the commented code?


 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 logs.tar.gz, logs2.tar.gz


 See umbrella jira.




[jira] [Updated] (ZOOKEEPER-1787) Add support for enabling local session in rolling upgrade

2013-11-19 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1787:
--

Summary: Add support for enabling local session in rolling upgrade  (was: 
Add support enabling local session in rolling upgrade)

 Add support for enabling local session in rolling upgrade
 -

 Key: ZOOKEEPER-1787
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1787
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Thawan Kooburat
Priority: Minor

 Currently, local sessions need to be enabled by stopping the entire ensemble. 
 If a rolling upgrade is used, all write requests from a local session will 
 fail with session moved until local sessions are enabled on the leader.




[jira] [Updated] (ZOOKEEPER-1787) Add support for enabling local session in rolling upgrade

2013-11-19 Thread Raul Gutierrez Segales (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raul Gutierrez Segales updated ZOOKEEPER-1787:
--

Attachment: ZOOKEEPER-1787.patch

With this patch you can use -Dzookeeper.skipSessionValidation=yes to enable 
local sessions with a rolling upgrade. We should probably add some 
documentation as well - but let's agree on this patch first. 

 Add support for enabling local session in rolling upgrade
 -

 Key: ZOOKEEPER-1787
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1787
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.0
Reporter: Thawan Kooburat
Priority: Minor
 Attachments: ZOOKEEPER-1787.patch


 Currently, local sessions need to be enabled by stopping the entire ensemble. 
 If a rolling upgrade is used, all write requests from a local session will 
 fail with session moved until local sessions are enabled on the leader.




[jira] [Commented] (ZOOKEEPER-602) log all exceptions not caught by ZK threads

2013-11-21 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829232#comment-13829232
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-602:
--

Two nits:

{noformat}
+class LearnerCnxAcceptor extends ZooKeeperThread{
{noformat}

space missing ("ZooKeeperThread {")

Typo:
{noformat}
+// When there is no worker thread pool, do the work directly 
+// and waiting for its completion
{noformat}

"and wait for its completion"



 log all exceptions not caught by ZK threads
 ---

 Key: ZOOKEEPER-602
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-602
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.2.1
Reporter: Patrick Hunt
Assignee: Rakesh R
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, 
 ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, 
 ZOOKEEPER-602.patch


 The Java code should add a ThreadGroup exception handler that logs at ERROR 
 level any uncaught exceptions thrown by Thread run methods.
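
 A rough sketch of the idea, with some substitutions that should be read as assumptions: it installs a per-thread handler via Thread.setUncaughtExceptionHandler rather than a ThreadGroup-level one, it uses java.util.logging (SEVERE standing in for ERROR) so that it runs without extra dependencies, and LoggingThreadSketch is a made-up name rather than the class from the patch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a ZK thread base class; not the class from the patch.
class LoggingThreadSketch extends Thread {
    private static final Logger LOG =
            Logger.getLogger(LoggingThreadSketch.class.getName());

    // Remembers the last uncaught throwable so callers (and tests) can inspect it.
    static final AtomicReference<Throwable> LAST_UNCAUGHT = new AtomicReference<>();

    LoggingThreadSketch(Runnable task, String name) {
        super(task, name);
        // Runs in the dying thread when run() exits with an exception.
        setUncaughtExceptionHandler((thread, error) -> {
            LAST_UNCAUGHT.set(error);
            LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new LoggingThreadSketch(() -> {
            throw new IllegalStateException("boom");
        }, "sketch-worker");
        worker.start();
        worker.join();
    }
}
```

 Threads that extend such a base class get the logging for free, which is presumably why the patch routes ZK threads through a common parent.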




[jira] [Commented] (ZOOKEEPER-1817) Fix don't care for b3.4

2013-11-22 Thread Raul Gutierrez Segales (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830155#comment-13830155
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1817:
---

Even though I didn't actually test this it looks correct to me (and nit-less 
;-) ) - so +1. Will do proper testing for 
https://issues.apache.org/jira/browse/ZOOKEEPER-1818 once 
https://issues.apache.org/jira/browse/ZOOKEEPER-1810 lands. 

 Fix don't care for b3.4
 ---

 Key: ZOOKEEPER-1817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1817
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, ZOOKEEPER-1817.patch, 
 logs.tar.gz, logs2.tar.gz


 See umbrella jira.



