
[jira] [Commented] (ZOOKEEPER-3188) Improve resilience to network

2018-12-05 Thread Ted Dunning (JIRA)



[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710015#comment-16710015
 ] 

Ted Dunning commented on ZOOKEEPER-3188:


Taking the questions in order:


bq. Did we consider the compatibility requirement here? Will the new 
configuration format be backward compatible? One concrete use case is if a 
customer upgrades to new version with this multiple address per server 
capability but wants to roll back without rewriting the config files to older 
version.

Yes. We considered this.

The compatibility is such that old configurations will work with the new 
version. New configurations will likely not work with older versions (this is 
life). Upgrading without configuration changes will allow transparent roll 
back. Upgrading and then changing the configuration to take advantage of 
multiple addresses will require a configuration change to roll back. I think 
that this is unavoidable with the current configuration format. A better 
JSON-ish format would be much easier to future-proof.

If the upgrade is done using multiple DNS A records for each host instead of 
configuration changes, then transparent roll back should be possible because 
the old code just takes the first address while the new code accepts all 
addresses.
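
A minimal sketch of that difference, written in Python rather than taken from the ZooKeeper code base. The names `resolve_all`, `old_style`, and `new_style` are hypothetical and only model the behavior described above: one host name with several A records yields several addresses, the old code keeps the first, and the new code keeps them all.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:  # keep resolver order, drop duplicates
            seen.append(ip)
    return seen


def old_style(addresses):
    """Model of the old behavior (assumption): use only the first address."""
    return addresses[:1]


def new_style(addresses):
    """Model of the proposed behavior (assumption): keep every address."""
    return list(addresses)
```

Because the configuration file itself is unchanged in this scenario, either version can read it, which is why roll back should stay transparent.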

bq. Did we evaluate the impact of this feature on existing server to server 
mutual authentication and authorization feature (e.g. ZOOKEEPER-1045 for 
Kerberos, ZOOKEEPER-236 for SSL), and also the impact on operations? For 
example how to configure Kerberos principals and / or SSL certs per host given 
multiple potential IP address and / or FQDN names per server?

Yes. This was considered.

There are two important cases to consider. The first is the one that arises due 
to multiple DNS records for the same host name. In this case, binding and 
authenticating against the same host name should be transparent. We will test 
this as much as feasible. 

The second case is where there are multiple host names embedded in the 
configuration. This case should also work but each separate address must be 
separately authenticated. Again, we will test this as much as possible.

bq. Could we provide more details on expected level of support with regards to 
dynamic reconfiguration feature? 

I don't understand the question. Dynamic reconfiguration involves changing the 
dynamic part of the configuration file. That can involve addresses, for 
instance. Such changes should be handled exactly the way they are now and there 
should be no interactions with the changes to the networking stack. A commit of 
a new config is a commit.

bq. Examples would be great - for example: we would support adding, removing, 
or updating server address that's appertained to a given server via dynamic 
reconfiguration, and also the expected behavior in each case. For example, 
adding a new address to an existing ensemble member should not cause any 
disconnect / reconnect but removing an in use address of a server should cause 
a disconnect. Likely the dynamic reconfig API / CLI / doc should be updated 
because of this.

I don't really see how this pertains other than the desire not to lose a live 
connection. The documentation, in particular, should be essentially identical 
except that an example of adding an address might be nice (but kind of 
redundant).


> Improve resilience to network
> -
>
> Key: ZOOKEEPER-3188
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>

[jira] [Comment Edited] (ZOOKEEPER-3188) Improve resilience to network

2018-12-05 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710015#comment-16710015
]

Ted Dunning edited comment on ZOOKEEPER-3188 at 12/5/18 12:37 PM:
--

Taking the questions in order:

bq. Did we consider the compatibility requirement here? Will the new
configuration format be backward compatible? One concrete use case is if a
customer upgrades to new version with this multiple address per server
capability but wants to roll back without rewriting the config files to older
version.

Yes. We considered this.

The compatibility is such that old configurations will work with the new
version. New configurations will likely not work with older versions (this is
life). Upgrading without configuration changes will allow transparent roll
back. Upgrading and then changing the configuration to take advantage of
multiple addresses will require a configuration change to roll back. I think
that this is unavoidable with the current configuration format. A better
JSON-ish format would be much easier to future-proof.

If the upgrade is done using multiple DNS A records for each host instead of
configuration changes, then transparent roll back should be possible because
the old code just takes the first address while the new code accepts all
addresses.

bq. Did we evaluate the impact of this feature on existing server to server
mutual authentication and authorization feature (e.g. ZOOKEEPER-1045 for
Kerberos, ZOOKEEPER-236 for SSL), and also the impact on operations? For
example how to configure Kerberos principals and / or SSL certs per host given
multiple potential IP address and / or FQDN names per server?

Yes. This was considered.

There are two important cases to consider. The first is the one that arises due
to multiple DNS records for the same host name. In this case, binding and
authenticating against the same host name should be transparent. We will test
this as much as feasible.

The second case is where there are multiple host names embedded in the
configuration. This case should also work but each separate address must be
separately authenticated. Again, we will test this as much as possible.

bq. Could we provide more details on expected level of support with regards to
dynamic reconfiguration feature?

I don't understand the question. Dynamic reconfiguration involves changing the
dynamic part of the configuration file. That can involve addresses, for
instance. Such changes should be handled exactly the way they are now and there
should be no interactions with the changes to the networking stack. A commit of
a new config is a commit.

bq. Examples would be great - for example: we would support adding, removing,
or updating server address that's appertained to a given server via dynamic
reconfiguration, and also the expected behavior in each case. For example,
adding a new address to an existing ensemble member should not cause any
disconnect / reconnect but removing an in use address of a server should cause
a disconnect. Likely the dynamic reconfig API / CLI / doc should be updated
because of this.

I don't really see how this pertains other than the desire not to lose a live
connection. The documentation, in particular, should be essentially identical
except that an example of adding an address might be nice (but kind of
redundant).

was (Author: tdunning):
Taking the questions in order:

Yes. We considered this.

.bq Did we evaluate the impact of this feature on existing server to server
mutual authentication and authorization feature (e.g. ZOOKEEPER-1045 for
Kerberos, ZOOKEEPER-236 for SSL), and also the impact on operations? For
example how to configure Kerberos principals and / or SSL certs per host given
multiple potential IP address and / or FQDN names per server?

Yes. This was

[jira] [Commented] (ZOOKEEPER-3189) Support new configuration syntax for resilient network feature

2018-11-12 Thread Ted Dunning (JIRA)



[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684812#comment-16684812
 ] 

Ted Dunning commented on ZOOKEEPER-3189:


Supporting any syntax requires having the semantics available.

> Support new configuration syntax for resilient network feature
> --
>
> Key: ZOOKEEPER-3189
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3189
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>
>  
> Two efforts are under way at the same time: network resilience (3188, 
> blocking this issue) and a new configuration syntax (3166, also blocking 
> this issue).
> This issue captures the fact that the new syntax will need to be supported by 
> the network resilience code, but both features are pre-requisites for that 
> support.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (ZOOKEEPER-3189) Support new configuration syntax for resilient network feature

2018-11-12 Thread Ted Dunning (JIRA)

Ted Dunning created ZOOKEEPER-3189:
--

 Summary: Support new configuration syntax for resilient network 
feature
 Key: ZOOKEEPER-3189
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3189
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning


 

Two efforts are under way at the same time: network resilience (3188, 
blocking this issue) and a new configuration syntax (3166, also blocking 
this issue).

This issue captures the fact that the new syntax will need to be supported by 
the network resilience code, but both features are pre-requisites for that 
support.

 




[jira] [Created] (ZOOKEEPER-3188) Improve resilience to network

2018-11-12 Thread Ted Dunning (JIRA)

Ted Dunning created ZOOKEEPER-3188:
--

Summary: Improve resilience to network
Key: ZOOKEEPER-3188
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
Project: ZooKeeper
Issue Type: Bug
Reporter: Ted Dunning

We propose to add network level resiliency to Zookeeper. The ideas that we have
on the topic have been discussed on the mailing list and via a specification
document that is located at
[https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]

That document is copied to this issue which is being created to report the
results of experimental implementations.
h1. Zookeeper Network Resilience
h2. Background

Zookeeper is designed to help in building distributed systems. It provides a
variety of operations for doing this and all of these operations have rather
strict guarantees on semantics. Zookeeper itself is a distributed system made
up of a cluster containing a leader and a number of followers. The leader is
designated in a process known as leader election in which a majority of all
nodes in the cluster must agree on a leader. All subsequent operations are
initiated by the leader and completed when a majority of nodes have confirmed
the operation. Whenever an operation cannot be confirmed by a majority or
whenever the leader goes missing for a time, a new leader election is conducted
and normal operations proceed once a new leader is confirmed.

The details of this are not important relative to this discussion. What is
important is that the semantics of the operations conducted by a Zookeeper
cluster and the semantics of how client processes communicate with the cluster
depend only on the basic fact that messages sent over TCP connections will
never appear out of order or missing. Central to the design of ZK is that a
server to server network connection is used for as long as it works and a
new connection is made when it appears that the old connection isn't working.

As currently implemented, however, each member of a Zookeeper cluster can have
only a single address as viewed from some other process. This means, absent
network link bonding, that the loss of a single switch or a few network
connections could completely stop the operations of the Zookeeper cluster. It
is the goal of this work to address this issue by allowing each server to
listen on multiple network interfaces and to connect to other servers at any of
several addresses. The effect will be to allow servers to communicate over
redundant network paths to improve resiliency to network failures without
changing any core algorithms.
h2. Proposed Change

Interestingly, the correct operations of a Zookeeper cluster do not depend on
_how_ a TCP connection was made. There is no reason at all not to advertise
multiple addresses for members of a Zookeeper cluster.

Connections between members of a Zookeeper cluster and between a client and a
cluster member are established by referencing a configuration file (for cluster
members) that specifies the address of all of the nodes in a cluster or by
using a connection string containing possible addresses of Zookeeper cluster
members. As soon as a connection is made, any desired authentication or
encryption layers are added and the connection is handed off to the client
communications layer or the server to server logic.

This means that only two things actually need to change to allow Zookeeper
servers to be accessible on multiple networks: the server configuration file
format must allow multiple addresses to be specified, and the code that
establishes the TCP connection must be updated to make use of these multiple
addresses. No code changes are actually needed on the client since we
can simply supply all possible server addresses. The client already has logic
for selecting a server address at random and it doesn’t really matter if these
addresses represent synonyms for the same server. All that matters is that
_some_ connection to a server is established.
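
The selection logic can be sketched roughly as follows. This is an illustration in Python, not the actual ZooKeeper client code; `connect_any` is a hypothetical helper, and the `connect` parameter is injectable only so the logic can be exercised without a network.

```python
import random
import socket


def connect_any(addresses, connect=socket.create_connection, timeout=2.0):
    """Try each (host, port) in random order; return the first connection.

    Raises ConnectionError if no address is reachable.
    """
    candidates = list(addresses)
    random.shuffle(candidates)
    last_error = None
    for address in candidates:
        try:
            return connect(address, timeout)
        except OSError as err:  # refused, unreachable, timed out, ...
            last_error = err
    raise ConnectionError(f"no address reachable: {candidates}") from last_error
```

Nothing in the loop cares whether two entries name the same server over different networks, which is the point made above: only the fact that some connection succeeds matters.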
h2. Configuration File Syntax Change

The current Zookeeper syntax looks like this:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

The only lines that matter for this discussion are the last three. These
specify the addresses for each of the servers that are part of the Zookeeper
cluster as well as the port numbers used for the servers to talk to each other.

I propose that the current syntax of these lines be augmented to allow a comma
delimited list of addresses. For the current example, we might have this:

server.1=zoo1-net1:2888:3888,zoo1-net2:2888:3888
server.2=zoo2-net1:2888:3888,zoo2-net2:2888:3888
server.3=zoo3-net1:2888:3888

The first two servers are available via two different
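
As a rough illustration of how the comma delimited form proposed above might be read, here is a small Python sketch. It is not the ZooKeeper parser; `parse_server_line` is a hypothetical name, the port labels `quorum_port` and `election_port` are my own naming for the two numbers, and only the exact `host:port:port` shape shown in the example is handled (no IPv6 literals, no extra fields).

```python
def parse_server_line(line):
    """Parse 'server.N=host:port:port[,host:port:port...]' into (id, addresses)."""
    key, _, value = line.strip().partition("=")
    prefix, _, server_id = key.partition(".")
    if prefix != "server" or not server_id.isdigit():
        raise ValueError(f"not a server line: {line!r}")
    addresses = []
    for entry in value.split(","):
        # Each entry is host:port:port, exactly as in the example above.
        host, quorum_port, election_port = entry.strip().split(":")
        addresses.append((host, int(quorum_port), int(election_port)))
    return int(server_id), addresses
```

A line in the current single address syntax parses to a list of length one, so old configuration files remain valid input under this reading.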

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-10-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218937#comment-16218937
 ] 

Ted Dunning commented on ZOOKEEPER-2770:



There is a pull request just now that has 95 files changed and 45 commits.

What is this?!?

https://github.com/apache/zookeeper/pull/307/commits

It looks like some wires got seriously crossed here. There is no way that this 
feature should have so many commits.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-26 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102749#comment-16102749
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


The typical approach is to set a limit on number of messages per unit time
(say one every 10 minutes). Each message that is printed sets a coalescence
time during which no further messages are printed, but a counter is
updated. At the end of the coalescence time a modified message is printed
which mentions that n additional events were detected, and the coalescence
time is disabled.

This way if the warnings are rare, you get normal behavior. If the warnings
are frequent, you get at most one message per 10 minutes (or whatever
coalescence period you choose). You get instant notification of a problem
and limited log output.
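
A minimal sketch of that scheme in Python, under some assumptions of mine: the class name `CoalescingLog` and its methods are hypothetical, the clock and output function are injectable for testing, and the summary is emitted lazily (on the next `warn` or `tick` call after the window expires) rather than by a timer.

```python
import time


class CoalescingLog:
    """Emit a message at once, then suppress repeats for `period` seconds."""

    def __init__(self, period, emit=print, clock=time.monotonic):
        self.period = period
        self.emit = emit
        self.clock = clock
        self.window_end = None  # None means no coalescence window is open
        self.suppressed = 0

    def _close_expired_window(self, now):
        if self.window_end is not None and now >= self.window_end:
            if self.suppressed:
                self.emit(f"{self.suppressed} additional events were detected")
            self.window_end = None
            self.suppressed = 0

    def tick(self):
        """Call periodically so the summary appears even when events stop."""
        self._close_expired_window(self.clock())

    def warn(self, message):
        now = self.clock()
        self._close_expired_window(now)
        if self.window_end is None:
            self.emit(message)  # instant notification
            self.window_end = now + self.period
        else:
            self.suppressed += 1  # counted, not printed
```

With `period=600` this gives the ten minute behavior described above: a rare warning prints immediately, and a flood of warnings produces one message plus one summary per window.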


On Wed, Jul 26, 2017 at 10:05 PM, Karan Mehta (JIRA) 




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101161#comment-16101161
 ] 

Ted Dunning commented on ZOOKEEPER-2770:



Btw I note that there is no metering on this logging.

That raises an obligatory question. Is there a plausible circumstance where 
thousands of nearly identical messages might be logged?




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101158#comment-16101158
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


{quote}
With that said, is 300 ms a good value or even less is better?
{quote}

I would suggest that getting a real time varying histogram is the right answer. 
I suggested that early on for just this kind of reason.




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100945#comment-16100945
 ] 

Ted Dunning commented on ZOOKEEPER-2770:



On second thought, I could imagine that startup transients could cause a long 
operation. Once you have your quorum in a groove, however, >1 second is very 
bad, especially if you don't have something like a quorum leader change 
happening.



[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100942#comment-16100942
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


To put some color on Camille's surprise, I would consider any operation over a 
second to be indicative of gross failure in the quorum. Operations over 100ms 
should be vanishingly rare, but I wouldn't leap up to find out what is 
happening. I would be fairly unhappy, though, and would start checking.



[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100603#comment-16100603
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


[~fournc],

I am not so sure that *I* agree with me at this point.

It is fair to say that on occasion there are slow operations in ZK and it would 
be good to know about them. 

This kind of problem is almost always due, in my own vicarious experience,  to 
bad configuration. Often the bad configuration is simply collocation with a 
noisy neighbor on a deficient storage layer.  There might be situations where 
an operation is slow due to the content of the query itself, but I cannot 
imagine what those situations might be.  Writing a large value (but that is 
strictly limited in size), or even doing a huge multi-op (which has the same 
limited size in aggregate) should never take very long.

As such, I would expect that the highest diagnostic value would not be 
something that dumped the contents of slow queries, but rather a capability 
that characterizes the entire distribution of query times. The frequency of 
slow queries is a diagnostic of sorts, but is one that could be inferred from 
the time-varying distributional information I was suggesting.
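
To make the idea concrete, here is a small Python sketch of time-varying distributional information. It is only an illustration: `LatencyHistogram`, the power-of-two millisecond buckets, and the one minute window are all assumptions of mine, not anything proposed in the patch.

```python
import math
from collections import defaultdict


class LatencyHistogram:
    """Counts of latencies in power-of-two millisecond buckets, per time window."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        # window start time -> {bucket upper bound in ms -> count}
        self.windows = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def bucket(latency_ms):
        """Smallest power of two (in ms) that is >= the latency; minimum 1."""
        if latency_ms <= 1:
            return 1
        return 2 ** math.ceil(math.log2(latency_ms))

    def record(self, timestamp, latency_ms):
        window = int(timestamp // self.window_seconds) * self.window_seconds
        self.windows[window][self.bucket(latency_ms)] += 1

    def fraction_slower_than(self, window, threshold_ms):
        """Fraction of operations in a window whose bucket bound exceeds threshold_ms.

        Exact only when threshold_ms is itself a bucket bound (a power of two).
        """
        counts = self.windows.get(window, {})
        total = sum(counts.values())
        slow = sum(n for bound, n in counts.items() if bound > threshold_ms)
        return slow / total if total else 0.0
```

The frequency of slow queries falls out of `fraction_slower_than` for any threshold chosen after the fact, whereas a slow query log fixes the threshold in advance.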

That said, I don't think that a slow query log is a BAD thing (except a bit bad 
in terms of security if it logs the actual query). And I wouldn't want the BEST 
thing (a distribution log) to stop somebody contributing something.





[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-10 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081514#comment-16081514
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


Who are we?

This kind of feature needs to be discussed on the dev@zookeeper mailing list. I 
hate to be a prig about this, but one of the truisms at Apache is that if it 
didn't happen on the list, it didn't happen.



> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-06 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077254#comment-16077254
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


Would it be a good idea to extend the idea of this patch to include more 
generalized latency monitoring?




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-06 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077221#comment-16077221
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


I don't see any discussion of this on the mailing list. Also, the patch was 
posted to this bug 2 hours after it was filed.

Is the problem you are trying to solve being discussed somewhere off list?




[jira] [Updated] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-11 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-2509:
---
Attachment: 0001-Updated-patch-for-Netty-leak-testing-to-trunk.patch

Updated patch (which was for 3.5.1) to trunk

> Secure mode leaks memory
> 
>
> Key: ZOOKEEPER-2509
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2509
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Ted Dunning
> Fix For: 3.5.3
>
> Attachments: 
> 0001-Updated-patch-for-Netty-leak-testing-to-trunk.patch, leak-patch.patch
>
>
> The Netty connection handling logic fails to clean up watches on connection 
> close. This causes memory to leak.
> I will have a repro script available soon and a fix. I am not sure how to 
> build a unit test since we would need to build an entire server and generate 
> keys and such. Advice on that appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-10 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416280#comment-15416280
 ] 

Ted Dunning commented on ZOOKEEPER-2509:


[~rgs] Thanks for the pointer. I got some hints from that other test that 
helped get rid of some spurious exceptions.

> Secure mode leaks memory
> 
>
> Key: ZOOKEEPER-2509
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2509
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1, 3.5.2
>Reporter: Ted Dunning
> Fix For: 3.5.3
>
> Attachments: leak-patch.patch
>
>
> The Netty connection handling logic fails to clean up watches on connection 
> close. This causes memory to leak.
> I will have a repro script available soon and a fix. I am not sure how to 
> build a unit test since we would need to build an entire server and generate 
> keys and such. Advice on that appreciated.




[jira] [Updated] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-10 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-2509:
---
Attachment: leak-patch.patch

This is a fix for the problem along with a unit test.


[jira] [Updated] (ZOOKEEPER-2510) org.apache.zookeeper.server.NettyServerCnxnTest uses wrong import for junit

2016-08-10 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-2510:
---
Attachment: fix-import.patch

Trivial fix. Impossible to test, really.

> org.apache.zookeeper.server.NettyServerCnxnTest uses wrong import for junit
> ---
>
> Key: ZOOKEEPER-2510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
> Attachments: fix-import.patch
>
>
> junit.framework.Assert is deprecated. The code should use org.junit.Assert 
> instead.
> Patch coming shortly.




[jira] [Created] (ZOOKEEPER-2510) org.apache.zookeeper.server.NettyServerCnxnTest uses wrong import for junit

2016-08-10 Thread Ted Dunning (JIRA)

Ted Dunning created ZOOKEEPER-2510:
--

 Summary: org.apache.zookeeper.server.NettyServerCnxnTest uses 
wrong import for junit
 Key: ZOOKEEPER-2510
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2510
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning


junit.framework.Assert is deprecated. The code should use org.junit.Assert 
instead.

Patch coming shortly.




[jira] [Commented] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-10 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416190#comment-15416190
 ] 

Ted Dunning commented on ZOOKEEPER-2509:


[~rgs] It is possible that would work. I have a working test now that exhibits 
the problem and the fix, but I don't yet like how I had to get the test system 
to instantiate a Netty connection factory. What I did was to change the system 
property controlling this. That is, of course, a recipe for non-deterministic 
disaster if multiple tests run at the same time.

Once I got the test to run at all, I ran into a 
java.nio.channels.ClosedChannelException as the server shuts down. That 
probably indicates something is wrong with the test framework or the netty code 
itself, but it is very hard to see what.



> Secure mode leaks memory
> 
>
> Key: ZOOKEEPER-2509
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2509
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Ted Dunning
>
> The Netty connection handling logic fails to clean up watches on connection 
> close. This causes memory to leak.
> I will have a repro script available soon and a fix. I am not sure how to 
> build a unit test since we would need to build an entire server and generate 
> keys and such. Advice on that appreciated.




[jira] [Updated] (ZOOKEEPER-2317) Non-OSGi compatible version

2016-08-10 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-2317:
---
Attachment: 0001-Put-a-different-version-string-into-the-jar-meta-dat.patch


Here is a patch with a proposed fix. The idea is to segregate out an 
OSGi-compatible version string that gets put into the metadata. I don't have an 
OSGi container handy for testing, but will be able to get a customer to test 
shortly.

> Non-OSGi compatible version
> ---
>
> Key: ZOOKEEPER-2317
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2317
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.5.1
> Environment: Karaf OSGi container
>Reporter: Markus Tippmann
>Assignee: Sachin
>Priority: Blocker
> Fix For: 3.5.3
>
> Attachments: 
> 0001-Put-a-different-version-string-into-the-jar-meta-dat.patch
>
>
> Bundle cannot be deployed to OSGi container.
> Manifest version is not OSGi compatible.
> Instead of using 3.5.1-alpha, manifest needs to contain 3.5.1.alpha
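
The version rewrite the issue asks for can be illustrated with a few lines of Java. This is only a sketch of the string transformation, not the attached patch (which changes the build); the class and method names are made up, and it assumes the version has the form major.minor.micro optionally followed by a dash and a qualifier.

```java
// Hypothetical helper illustrating the transformation only.
class OsgiVersionSketch {
    /**
     * Turns "3.5.1-alpha" into "3.5.1.alpha" by replacing the first dash with
     * a dot, so the suffix becomes the fourth (qualifier) segment of the version.
     * Versions without a dash are returned unchanged.
     */
    static String toOsgiVersion(String version) {
        int dash = version.indexOf('-');
        if (dash < 0) {
            return version;
        }
        return version.substring(0, dash) + "." + version.substring(dash + 1);
    }
}
```

Only the manifest would carry the rewritten string; the jar name and the Maven version could keep the dash form.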




[jira] [Updated] (ZOOKEEPER-2509) Secure mode leaks memory

2016-08-09 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-2509:
---
Affects Version/s: 3.5.1


[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2015-02-01 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14300826#comment-14300826
]

Ted Dunning commented on ZOOKEEPER-1366:

Actually, no. I hadn't considered them. In fact, I had pretty much given up
on this patch since it was ignored for two years after it caused a major P1
outage at multiple sites. I figured it was just me that considered it
important enough to try to fix.

And at this point, I don't have time to update the patch. Check with Patrick
or Michi.

(I am only vaguely sorry about being snippy here. This was an egregious case
of ignoring a fairly valid patch on an important issue).

Zookeeper should be tolerant of clock adjustments
-

Key: ZOOKEEPER-1366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
Project: ZooKeeper
Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
Priority: Critical
Fix For: 3.5.1

Attachments: ZOOKEEPER-1366-3.3.3.patch, ZOOKEEPER-1366.patch,
ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch,
ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch,
ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch,
zookeeper-3.4.5-ZK1366-SC01.patch

If you want to wreak havoc on a ZK based system just do [date -s +1hour]
and watch the mayhem as all sessions expire at once.
This shouldn't happen. Zookeeper could easily handle elapsed times as
elapsed times rather than as differences between absolute times. The
absolute times are subject to adjustment when the clock is set while a timer
is not subject to this problem. In Java, System.currentTimeMillis() gives
you absolute time while System.nanoTime() gives you time based on a timer
from an arbitrary epoch.
I have done this and have been running tests now for some tens of minutes
with no failures. I will set up a test machine to redo the build again on
Ubuntu and post a patch here for discussion.
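
As a rough illustration of the difference described above, a timeout can be tracked with System.nanoTime() instead of System.currentTimeMillis(). The class below is a hypothetical sketch, not code from any of the attached patches.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timeout tracker based on a monotonic timer.
class MonotonicDeadlineSketch {
    private final long startNanos;
    private final long timeoutNanos;

    MonotonicDeadlineSketch(long startNanos, long timeoutMillis) {
        this.startNanos = startNanos;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** Starts the timer now; setting the wall clock later has no effect on it. */
    static MonotonicDeadlineSketch startingNow(long timeoutMillis) {
        return new MonotonicDeadlineSketch(System.nanoTime(), timeoutMillis);
    }

    /**
     * Compares an elapsed time against the timeout. Subtracting first and then
     * comparing stays correct even if the nanoTime() counter overflows.
     */
    boolean isExpired(long nowNanos) {
        return nowNanos - startNanos >= timeoutNanos;
    }
}
```

A session tracker built this way would call isExpired(System.nanoTime()) on each tick, so running [date -s +1hour] would not expire anything early.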


[jira] [Commented] (ZOOKEEPER-2025) Single-node ejection caused apparent reconnection storm, leading to cluster unresponsiveness

2014-10-16 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174009#comment-14174009
 ] 

Ted Dunning commented on ZOOKEEPER-2025:



Has anybody looked at this?



 Single-node ejection caused apparent reconnection storm, leading to cluster 
 unresponsiveness
 

 Key: ZOOKEEPER-2025
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2025
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client, server
Affects Versions: 3.4.5
Reporter: Stephen Tyree
 Attachments: zookeeper_issues.pdf


 Description will be included in an attached PDF.
 The two main questions we have are:
 1: What would be the cause of the Unreasonable Length error in our context, 
 and how might we prevent it from occurring?
 2: What can we do to prevent the reconnection storm that led to the cluster 
 becoming unresponsive?




[jira] [Commented] (ZOOKEEPER-2025) Single-node ejection caused apparent reconnection storm, leading to cluster unresponsiveness

2014-09-16 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135510#comment-14135510
 ] 

Ted Dunning commented on ZOOKEEPER-2025:


Hongchao,

I don't see how that really addresses the problems here.  The throttling code 
is still very naive.  And the complete throttling of clients with no client 
back off is also not a great solution since there is no source quench under 
pretty reasonable scenarios.

It also doesn't address the question of why one server would send an 
unreasonably large packet to another.

Flavio,

Have you had a chance to look at this?




[jira] [Commented] (ZOOKEEPER-2025) Single-node ejection caused apparent reconnection storm, leading to cluster unresponsiveness

2014-09-12 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132297#comment-14132297
 ] 

Ted Dunning commented on ZOOKEEPER-2025:


Brett,

That is a pretty interesting scenario.

Flavio,

What do you think about that?





[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

2014-03-23 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944466#comment-13944466
]

Ted Dunning commented on ZOOKEEPER-1502:

File locking is notoriously unreliable. There is a history of it not even
working correctly on local physical volumes and there are many examples of it
not working correctly on SAN and NAS devices.

Depending on it is not a great idea. At the very least, you should make sure
that the code works if flock is not available.
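
One way to follow that advice is to treat the lock as a best-effort check and to distinguish "someone else holds the lock" from "locking could not be attempted at all". The sketch below uses the standard java.nio FileChannel.tryLock() call; the class name, the lock file name, and the three-way result are assumptions for illustration and are not taken from the attached patch.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical best-effort data directory lock.
class DataDirLockSketch {
    enum Result { ACQUIRED, HELD_BY_OTHER, LOCKING_UNAVAILABLE }

    // Kept open for the life of the server; closing the channel releases the lock.
    private FileChannel channel;

    Result acquire(Path dataDir) {
        Path lockFile = dataDir.resolve("datadir.lock"); // hypothetical file name
        FileChannel ch = null;
        try {
            ch = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = ch.tryLock();
            if (lock == null) {
                ch.close();
                return Result.HELD_BY_OTHER; // another process holds the lock
            }
            channel = ch;
            return Result.ACQUIRED;
        } catch (OverlappingFileLockException e) {
            closeQuietly(ch);
            return Result.HELD_BY_OTHER; // already locked within this JVM
        } catch (IOException | UnsupportedOperationException e) {
            closeQuietly(ch);
            // Locking could not be attempted; the caller decides whether to
            // warn and continue rather than refuse to start.
            return Result.LOCKING_UNAVAILABLE;
        }
    }

    private static void closeQuietly(FileChannel ch) {
        if (ch != null) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }
}
```

Note that ACQUIRED is only weak evidence on file systems where locks are not enforced, which is the concern raised above; the code cannot detect a lock that is silently ignored.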

Prevent multiple zookeeper servers from using the same data directory
-

Key: ZOOKEEPER-1502
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1502
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3
Reporter: Will Johnson
Assignee: Rakesh R
Fix For: 3.5.0

Attachments: ZOOKEEPER-1502.patch

We recently ran into an issue where two ZooKeeper servers which were part
of two separate quorums were configured to use the same data directory.
Interestingly, the ZooKeeper servers did not seem to complain and both seemed
to work fine until one of them was restarted. Once that happened, all sorts of
chaos ensued. I understand that this is a misconfiguration, but should ZooKeeper
complain about this, or do users need to protect themselves in some external
fashion? Is a simple file lock enough, or are there other things I should
take into consideration if it’s up to me to handle?

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2013-09-04 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757489#comment-13757489
 ] 

Ted Dunning commented on ZOOKEEPER-1366:


Why do you say this?

What scenario would cause a breakage?



 Zookeeper should be tolerant of clock adjustments
 -

 Key: ZOOKEEPER-1366
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1366-3.3.3.patch, ZOOKEEPER-1366.patch, 
 ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, 
 ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, 
 zookeeper-3.4.5-ZK1366-SC01.patch


 If you want to wreak havoc on a ZK based system just do [date -s +1hour] 
 and watch the mayhem as all sessions expire at once.
 This shouldn't happen.  Zookeeper could easily handle elapsed times as 
 elapsed times rather than as differences between absolute times.  The 
 absolute times are subject to adjustment when the clock is set while a timer 
 is not subject to this problem.  In Java, System.currentTimeMillis() gives 
 you absolute time while System.nanoTime() gives you time based on a timer 
 from an arbitrary epoch.
 I have done this and have been running tests now for some tens of minutes 
 with no failures.  I will set up a test machine to redo the build again on 
 Ubuntu and post a patch here for discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message

2013-04-02 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619771#comment-13619771
]

Ted Dunning commented on ZOOKEEPER-1633:

Alex,

Sorry to come in late here.

I have a few comments.

1) Is a version number really what you want here? Shouldn't it be done more
like modern protocols such as protobufs, which introduce a mechanism of optional
fields? Strict versioning of protocols is very unpopular these days because of
the brittleness it introduces.

2) It is possible that there is an irreconcilable conflict in version between
correspondents. In such cases, it is important to signal this clearly. As
such, it is good to add not only versioning information to the original
request, but also a very stable reply that indicates that there is an
irreconcilable version mismatch. Is there a way that you can do this in your
proposal?

Introduce a protocol version to connection initiation message
-

Key: ZOOKEEPER-1633
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Fix For: 3.4.6

Attachments: ZOOKEEPER-1633.patch, ZOOKEEPER-1633-v4.patch,
ZOOKEEPER-1633-v4.patch, ZOOKEEPER-1633-ver2.patch, ZOOKEEPER-1633-ver3.patch

Currently the first message a server sends to another server includes just
one field - the server's id (long). This is in QuorumCnxManager.java. This
makes changes to the information passed during this initial connection very
difficult. This patch will change the first field of the message to be a
protocol version (a negative number that can't be a server id). The second
field will be the server id. The third field is number of bytes in the
remainder of the message. A 3.4 server will read the first field as before,
but if this is a negative number it will read the second field to find the
server id, and then remove the remainder of the message from the stream. This
will not affect 3.4 since 3.4 and earlier servers send just the server id (so
the code in the patch will not run unless there is a server newer than 3.4
trying to connect). This will, however, provide the necessary flexibility for
future releases as well as an upgrade path from 3.4.
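
The read path described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the class name is made up, the length field is assumed to be a 4-byte int, and the MAX_REMAINDER sanity limit is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical reader for the first message of a server-to-server connection.
class InitialMessageSketch {
    // Hypothetical sanity limit; the description does not specify one.
    static final int MAX_REMAINDER = 2048;

    /** Returns the sender's server id for both the old and the proposed format. */
    static long readServerId(DataInputStream in) throws IOException {
        long first = in.readLong();
        if (first >= 0) {
            // Old format: the only field is the server id.
            return first;
        }
        // Proposed format: a negative first field is a protocol version,
        // followed by the server id and the length of the rest of the message.
        long serverId = in.readLong();
        int remainder = in.readInt(); // assumed to be a 4-byte int
        if (remainder < 0 || remainder > MAX_REMAINDER) {
            throw new IOException("Unreasonable remainder length: " + remainder);
        }
        in.readFully(new byte[remainder]); // discard fields this reader does not understand
        return serverId;
    }
}
```

Because the remainder is skipped by length rather than parsed, a reader like this keeps working when later releases append fields, which is close in spirit to the optional-fields approach raised in the comment.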


[jira] [Commented] (ZOOKEEPER-1629) testTrancationLogCorruption occasionally fails

2013-03-27 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13615024#comment-13615024
 ] 

Ted Dunning commented on ZOOKEEPER-1629:


While you are at it, could you fix the spelling of the test case name?

 testTrancationLogCorruption occasionally fails
 --

 Key: ZOOKEEPER-1629
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Alexander Shraer
 Attachments: TruncateCorruptionTest-patch.patch


 It seems that testTransactionLogCorruption is very flaky; for example, it fails 
 here:
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
 https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
 also fails for older builds (no longer on the website), for example all 
 builds from 381 to 399.


[jira] [Commented] (ZOOKEEPER-1640) dynamically load class objects instead of hard code

2013-02-18 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581012#comment-13581012
 ] 

Ted Dunning commented on ZOOKEEPER-1640:


Why is this important?  Why does it matter for a library to load code on the 
client side?

Isn't that what the caller of the library is supposed to do?

 dynamically load class objects instead of hard code
 ---

 Key: ZOOKEEPER-1640
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1640
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: Tian Hong Wang
Assignee: Tian Hong Wang
  Labels: patch
 Fix For: 3.4.5

 Attachments: zookeeper.patch


 Class org.apache.zookeeper.ZooKeeperMain.java hard-codes the loading of command 
 objects on the client side. 
 It needs to load command objects dynamically to support command extension.


[jira] [Commented] (ZOOKEEPER-1640) dynamically load command objects in zk

2013-02-18 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581035#comment-13581035
 ] 

Ted Dunning commented on ZOOKEEPER-1640:


So why is this important on the server side?  Do you really imagine that 
anybody would normally do this?

Isn't this a massive security hole waiting to happen?



 dynamically load command objects in zk
 --

 Key: ZOOKEEPER-1640
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1640
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: Tian Hong Wang
Assignee: Tian Hong Wang
Priority: Minor
  Labels: patch
 Fix For: 3.4.5

 Attachments: zookeeper.patch


 In class org.apache.zookeeper.ZooKeeperMain.java,
 new CloseCommand().addToMap(commandMapCli);
 new CreateCommand().addToMap(commandMapCli);
 new DeleteCommand().addToMap(commandMapCli);
 new DeleteAllCommand().addToMap(commandMapCli);
 // Deprecated: rmr
 new DeleteAllCommand("rmr").addToMap(commandMapCli);
 new SetCommand().addToMap(commandMapCli);
 new GetCommand().addToMap(commandMapCli);
 new LsCommand().addToMap(commandMapCli);
 new Ls2Command().addToMap(commandMapCli);
 new GetAclCommand().addToMap(commandMapCli);
 new SetAclCommand().addToMap(commandMapCli);
 new StatCommand().addToMap(commandMapCli);
 new SyncCommand().addToMap(commandMapCli);
 new SetQuotaCommand().addToMap(commandMapCli);
 new ListQuotaCommand().addToMap(commandMapCli);
 new DelQuotaCommand().addToMap(commandMapCli);
 new AddAuthCommand().addToMap(commandMapCli);
 The above code is not flexible for command object scalability. It's better to 
 refine the code to load and create the command objects dynamically.


[jira] [Commented] (ZOOKEEPER-1616) time calculations should use a monotonic clock

2013-01-09 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13548949#comment-13548949
 ] 

Ted Dunning commented on ZOOKEEPER-1616:


OK,

So can we get ZOOKEEPER-1366 patches reviewed and committed?  I can bring them 
up to date on whatever branch anybody would like.  Just let me know which 
branches you are willing to review on.

 time calculations should use a monotonic clock
 --

 Key: ZOOKEEPER-1616
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1616
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Todd Lipcon

 We recently had an issue with ZooKeeper sessions acting strangely due to a 
 bad NTP setup on a set of hosts. Looking at the code, ZK seems to use 
 System.currentTimeMillis to measure durations or intervals in many places. 
 This is bad since that time can move backwards or skip ahead by several 
 minutes. Instead, it should use System.nanoTime (or a wrapper such as Guava's 
 Stopwatch class)


[jira] [Commented] (ZOOKEEPER-1616) time calculations should use a monotonic clock

2013-01-08 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547649#comment-13547649
 ] 

Ted Dunning commented on ZOOKEEPER-1616:


Uh...

I thought we patched this ages ago.

See ZOOKEEPER-1366




[jira] [Commented] (ZOOKEEPER-1616) time calculations should use a monotonic clock

2013-01-08 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547651#comment-13547651
 ] 

Ted Dunning commented on ZOOKEEPER-1616:


Todd,

Can you fill in the versions for which you observed this problem?


[jira] [Commented] (ZOOKEEPER-1599) 3.3 server cannot join 3.4 quorum

2012-12-14 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532801#comment-13532801
 ] 

Ted Dunning commented on ZOOKEEPER-1599:


I think that Flavio is simply pointing out that if 3.4 can follow 3.3, but 3.3 
cannot follow 3.4, then a rolling upgrade works.  If you have A,B,C then 
upgrading A will force B or C to lead.  Assume B WLOG.  When you take down B, A 
might want to become leader, but will fail so C will lead with A and C in the 
quorum.  When you take down C after B is back, either A or B can become a 3.4 
leader.

Surprisingly, this also works for downgrades.  If you downgrade C, A or B can 
remain leader.  When you downgrade B, however, A cannot be leader anymore so C 
will take over while B is down.  Downgrading A now gives you a pure 3.3 cluster.

The question of how a 3.4 leader can be sure not to allow a 3.3 follower while 
staying otherwise compatible is a different kettle of fish, but obviously 
critical to the scenario.
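
The argument above can be checked with a toy model. This is only a sketch under
one assumption taken from the comment: a server may follow a leader of the same
or an older version, never a newer one. The class RollingUpgradeModel and the
encoding of versions as 33 and 34 (0 meaning "down") are made up for
illustration and are not ZooKeeper code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (an assumption, not ZooKeeper code) of leader eligibility in a mixed ensemble. */
final class RollingUpgradeModel {
    /**
     * Returns the indices of live servers that could lead. versions[i] is the
     * version of server i (33 or 34 here), or 0 if that server is down.
     */
    static List<Integer> possibleLeaders(int[] versions) {
        int majority = versions.length / 2 + 1;
        List<Integer> leaders = new ArrayList<>();
        for (int i = 0; i < versions.length; i++) {
            if (versions[i] == 0) continue;               // a down server cannot lead
            int quorum = 0;
            for (int v : versions) {
                // Live servers at least as new as the candidate can follow it;
                // this count includes the candidate itself.
                if (v != 0 && v >= versions[i]) quorum++;
            }
            if (quorum >= majority) leaders.add(i);
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Servers A, B, C are indices 0, 1, 2.
        System.out.println(possibleLeaders(new int[] {34, 33, 33})); // A upgraded: prints [1, 2]
        System.out.println(possibleLeaders(new int[] {34, 0, 33}));  // B down: prints [2]
        System.out.println(possibleLeaders(new int[] {34, 34, 0}));  // B back, C down: prints [0, 1]
    }
}
```

The same function gives [2] for the downgrade step where C is already on 3.3
and B is down, which matches C taking over from A.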


[jira] [Commented] (ZOOKEEPER-1599) 3.3 server cannot join 3.4 quorum

2012-12-14 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532825#comment-13532825
 ] 

Ted Dunning commented on ZOOKEEPER-1599:


Ultimately, if somebody else gets a quorum, the 3.4 node should give up trying 
to become leader.

 3.3 server cannot join 3.4 quorum
 -

 Key: ZOOKEEPER-1599
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1599
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.3.6, 3.4.5
Reporter: Skye Wanderman-Milne
Assignee: Skye Wanderman-Milne
Priority: Blocker
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1599.patch


 When a 3.3 server attempts to join an existing quorum led by a 3.4 server, 
 the 3.3 server is disconnected while trying to download the leader's 
 snapshot. The 3.3 server restarts and starts the process over again, but is 
 never able to join the quorum.
 3.3 server log:
 {code}
 2012-12-07 10:44:34,582 - INFO  
 [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from 
 leader
 2012-12-07 10:44:34,582 - INFO  
 [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
 2012-12-07 10:44:54,604 - WARN  
 [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the 
 leader
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
 at 
 org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
 at 
 org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
 2012-12-07 10:44:54,605 - INFO  
 [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
 java.lang.Exception: shutdown Follower
 at 
 org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
 {code}
 3.4 leader log:
 {code}
 2012-12-07 10:51:35,178 [myid:2] - INFO  
 [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - 
 Backward compatibility mode, server id=3
 2012-12-07 10:51:35,178 [myid:2] - INFO  
 [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 
 0x11 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 
 (n.peerEPoch), LEADING (my state)
 2012-12-07 10:51:35,182 [myid:2] - INFO  
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info 
 : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
 2012-12-07 10:51:35,182 [myid:2] - INFO  
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with 
 Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 
 peerLastZxid=0x11
 2012-12-07 10:51:35,182 [myid:2] - INFO  
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
 2012-12-07 10:51:35,183 [myid:2] - INFO  
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last 
 zxid of peer is 0x11  zxid of leader is 0x12sent zxid of db 
 as 0x12
 2012-12-07 10:51:55,204 [myid:2] - ERROR 
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception 
 causing shutdown while sock still open
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:150)
 at java.net.SocketInputStream.read(SocketInputStream.java:121)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
 at java.io.DataInputStream.readInt(DataInputStream.java:387)
 at 
 org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
 at 
 org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
 at 
 org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
 at 
 org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
 2012-12-07 10:51:55,205 [myid:2] - WARN  
 [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - *** GOODBYE 
 /127.0.0.1:37654 
 {code}


[jira] [Commented] (ZOOKEEPER-1346) Handle 4lws and monitoring on separate port

2012-12-03 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509551#comment-13509551
 ] 

Ted Dunning commented on ZOOKEEPER-1346:


Regarding REST-iness, I think that this interface is a real candidate for not 
using the GET/PUT/POST/DELETE verbs at all but simply appending a verb to the 
URL of the appropriate report.  For that matter, the available verbs could be 
added to each report for convenience if there is an HTML view.

This allows mumble/connection/close which is much more clearly 
self-documenting than a guess about what different operations might do to the 
connection in question.
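
As a rough illustration of this idea, here is a sketch that treats the last
path segment as the verb and everything before it as the report. The class
VerbPath and the example path /connection/42/close are hypothetical; they are
not taken from any of the attached patches.

```java
/** Hypothetical parser (not part of any ZooKeeper patch): the last path segment is the verb. */
final class VerbPath {
    final String resource;
    final String verb;

    private VerbPath(String resource, String verb) {
        this.resource = resource;
        this.verb = verb;
    }

    /** Splits "/connection/42/close" into resource "/connection/42" and verb "close". */
    static VerbPath parse(String path) {
        int slash = path.lastIndexOf('/');
        // Reject paths with no resource part ("/close") or an empty verb ("/connection/").
        if (slash <= 0 || slash == path.length() - 1) {
            throw new IllegalArgumentException("expected /<resource>/<verb>: " + path);
        }
        return new VerbPath(path.substring(0, slash), path.substring(slash + 1));
    }
}
```

A handler could then dispatch on the verb string regardless of the HTTP method
used for the request.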

 Handle 4lws and monitoring on separate port
 ---

 Key: ZOOKEEPER-1346
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1346
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Skye Wanderman-Milne
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1346.2.patch, ZOOKEEPER-1346_jetty.patch, 
 ZOOKEEPER-1346.patch


 Move the 4lws to their own port, off of the client port, and support them 
 properly via long-lived sessions instead of polling. Deprecate the 4lw 
 support on the client port. Will enable us to enhance the functionality of 
 the commands via extended command syntax, address security concerns and fix 
 bugs involving the socket close being received before all of the data on the 
 client end.


[jira] [Commented] (ZOOKEEPER-1366) Zookeeper should be tolerant of clock adjustments

2012-11-19 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500520#comment-13500520
]

Ted Dunning commented on ZOOKEEPER-1366:

This patch never got committed. It should have been committed long ago.

I am happy to resurrect it. Which branches should I generate patches for?

Zookeeper should be tolerant of clock adjustments
-

Key: ZOOKEEPER-1366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1366
Project: ZooKeeper
Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.5.0

Attachments: ZOOKEEPER-1366-3.3.3.patch, ZOOKEEPER-1366.patch,
ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch, ZOOKEEPER-1366.patch

[jira] [Commented] (ZOOKEEPER-1346) Handle 4lws and monitoring on separate port

2012-11-16 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499264#comment-13499264
]

Ted Dunning commented on ZOOKEEPER-1346:

Skye,

Nice work. Is there documentation about what URL structure you used?

Is there a way to make it a bit more REST-y?

Are there commands that are natural to decompose a bit using a hierarchical URL
style? For instance, if we have something that lists live quorum members, it
would be nice to be able to append the quorum member id to the URL to get more
details on that member alone. Likewise for live sessions.

Further afield, are there metrics that should be reported/set using this
mechanism? I know that we report a fair number of metrics using JMX, but would
it be good to have a different mechanism?

Handle 4lws and monitoring on separate port
---

Key: ZOOKEEPER-1346
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1346
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Camille Fournier
Assignee: Skye Wanderman-Milne
Fix For: 3.5.0

Attachments: ZOOKEEPER-1346_jetty.patch, ZOOKEEPER-1346.patch

Move the 4lws to their own port, off of the client port, and support them
properly via long-lived sessions instead of polling. Deprecate the 4lw
support on the client port. Will enable us to enhance the functionality of
the commands via extended command syntax, address security concerns and fix
bugs involving the socket close being received before all of the data on the
client end.

[jira] [Commented] (ZOOKEEPER-1502) Prevent multiple zookeeper servers from using the same data directory

2012-07-31 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426248#comment-13426248
]

Ted Dunning commented on ZOOKEEPER-1502:

Another, more robust option is to put the PID of the ZK process into the lock
file. If that process doesn't exist or isn't a ZK process, then the lock is an
orphan and can be removed. Touching the file every minute or so also makes
identification of an orphan very easy.

Other systems that use a similar approach include MySQL, Solr and MongoDB.
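
A minimal sketch of the PID-in-lock-file idea, assuming Java 9 or later for
ProcessHandle. The class PidLock and its method names are hypothetical. The
sketch only checks that the recorded process is alive; it does not verify that
the process is actually a ZK server, and it does not do the periodic touch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a PID-stamped lock file; not the ZooKeeper implementation. */
final class PidLock {
    /** Writes this JVM's PID into lockFile, failing if a live owner already holds it. */
    static void acquire(Path lockFile) throws IOException {
        if (Files.exists(lockFile) && !isOrphan(lockFile)) {
            throw new IOException("data directory already in use: " + lockFile);
        }
        long pid = ProcessHandle.current().pid();
        Files.write(lockFile, Long.toString(pid).getBytes(StandardCharsets.UTF_8));
    }

    /** A lock is an orphan if its PID is unreadable or names no live process. */
    static boolean isOrphan(Path lockFile) throws IOException {
        String text = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
        try {
            long pid = Long.parseLong(text);
            return !ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        } catch (NumberFormatException e) {
            return true; // unparseable contents: treat the lock as stale
        }
    }
}
```

Note that the exists-then-write sequence here is not atomic, so two servers
starting at the same instant could still race; a real implementation would
need to close that window.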

Prevent multiple zookeeper servers from using the same data directory
-

Key: ZOOKEEPER-1502
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1502
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.3
Reporter: Will Johnson


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-19 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108010#comment-13108010
]

Ted Dunning commented on ZOOKEEPER-1174:

OK.

Can you say specifically which branches you mean?

FD leak when network unreachable

Key: ZOOKEEPER-1174
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Priority: Critical
Fix For: 3.3.4, 3.4.0, 3.5.0

Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch,
ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch,
zk-fd-leak.tgz

In the socket connection logic there are several errors that result in bad
behavior. The basic problem is that a socket is registered with a selector
unconditionally when there are nuances that should be dealt with. First, the
socket may connect immediately. Secondly, the connect may throw an
exception. In either of these two cases, I don't think that the socket
should be registered.
I will attach a test case that demonstrates the problem. I have been unable
to create a unit test that exhibits the problem because I would have to mock
the low level socket libraries to do so. It would still be good to do so if
somebody can figure out a good way.


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-09 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101642#comment-13101642
]

Ted Dunning commented on ZOOKEEPER-1174:

Mocking the test sounds great. Using this bug to bring in a mocking technology
that can mock static methods is a little more ambitious than I wanted it to be.

I see that jmockit and powermock both claim the ability to do this. Powermock
requires another mocking technology underneath. Jmockit has the problem that it
isn't available in an official maven repo.

My tendency is to suggest that we commit this without the unit test and open
another JIRA to address the testing problem in general.

If I can get sign-off on that, then I will produce a final patch to verify.
The code right now stands like this:
{code}
try {
    boolean immediateConnect = sock.connect(addr);
    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    if (immediateConnect) {
        sendThread.primeConnection();
    }
} catch (IOException e) {
    sock.close();
}
initialized = false;
{code}

FD leak when network unreachable

Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch,
ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, zk-fd-leak.tgz


[jira] [Created] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)

FD leak when network unreachable


 Key: ZOOKEEPER-1174
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Priority: Critical
 Fix For: 3.3.4


In the socket connection logic there are several errors that result in bad 
behavior.  The basic problem is that a socket is registered with a selector 
unconditionally when there are nuances that should be dealt with.  First, the 
socket may connect immediately.  Secondly, the connect may throw an exception.  
In either of these two cases, I don't think that the socket should be 
registered.

I will attach a test case that demonstrates the problem.  I have been unable to 
create a unit test that exhibits the problem because I would have to mock the 
low level socket libraries to do so.  It would still be good to do so if 
somebody can figure out a good way.


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-1174:
---

Attachment: zk-fd-leak.tgz

Here is a program that demonstrates the problem.  It includes a README and 
sample output with and without the fix.


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-1174:
---

Attachment: ZOOKEEPER-1174.patch

Here is a proposed patch. There are a few considerations here that merit
review.

First, is it safe to register sockets with a selector after the connect call?
I assert yes because select is level based rather than transition based.

Secondly, is it safe to not register sockets that connect immediately? I
think, but am not sure, that the answer is yes because we have clearly already
called primeConnection().

Thirdly, is it OK to not rethrow the io exception from the connect call? I am
not sure here. The immediate effect is that connection is only attempted at
the timeout rate rather than the faster rate specified by some of the delays in
the code. This seems OK at first glance, but other opinions would be nice to
have.

FD leak when network unreachable

Attachments: ZOOKEEPER-1174.patch, zk-fd-leak.tgz


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100635#comment-13100635
]

Ted Dunning commented on ZOOKEEPER-1174:

Camille,

Thanks for looking at this. I am not sure if my assertion is true either,
but it does seem correct to me. (happily, I expressed some doubt)

The documentation for sock.connect is exactly what I base my (current) position
on. The idea is that if connect returns true, then you don't need to use
select to wait for the connection and can proceed immediately with the
primeConnection and light up the connection for prime time. It is only if
connect returns false that deferred actions are required.
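
One reading of this position, written out as a sketch. The class
ConnectSketch is hypothetical and is not the attached patch: it registers the
channel for OP_CONNECT only while the connect is still pending, and it closes
the channel on any failure so that no file descriptor is left behind. Unlike
the patch under discussion, it rethrows the exception to the caller, which is
a choice made only for this sketch.

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical sketch of the connect logic discussed here; not the committed patch. */
final class ConnectSketch {
    /**
     * Starts a non-blocking connect and returns the channel. If connect()
     * returned true the channel is already usable (isConnected() is true) and
     * is left unregistered; otherwise it is registered for OP_CONNECT.
     */
    static SocketChannel open(Selector selector, SocketAddress addr) throws IOException {
        SocketChannel sock = SocketChannel.open();
        try {
            sock.configureBlocking(false);
            boolean immediateConnect = sock.connect(addr);
            if (!immediateConnect) {
                // Still pending: wait for OP_CONNECT, then call finishConnect().
                sock.register(selector, SelectionKey.OP_CONNECT);
            }
            return sock;
        } catch (IOException | RuntimeException e) {
            sock.close(); // release the descriptor on any failure
            throw e;
        }
    }
}
```

A caller would check sock.isConnected() on return and, if true, go straight to
the equivalent of primeConnection().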

FD leak when network unreachable

Attachments: ZOOKEEPER-1174.patch, zk-fd-leak.tgz


[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100714#comment-13100714
 ] 

Ted Dunning commented on ZOOKEEPER-1174:


Correct.  Is sockKey needed if we don't register with the selector?


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-1174:
---

Attachment: ZOOKEEPER-1174.patch

Here is an updated patch that maintains the sockKey even for immediate loads.
My guess is that this didn't matter in testing so far because it is rare for an
async socket to connect instantly.

This addresses Camille's eagle-eyed comments.

I have added a few javadoc fixes and one weakening of a catch from Exception to
Throwable in the general spirit of making things better when I see them. They
are unrelated to this JIRA, but are very minor so do not warrant their own bug
report.

FD leak when network unreachable

Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch,
zk-fd-leak.tgz


[jira] [Updated] (ZOOKEEPER-1174) FD leak when network unreachable

2011-09-08 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-1174:
---

Attachment: ZOOKEEPER-1174.patch

Here is a cheesy test. The idea is that I injected an explicit throw of the
same exception that a downed internet connection causes.

Is this just too cheesy to stomach?

FD leak when network unreachable

Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch,
ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, zk-fd-leak.tgz


[jira] [Commented] (ZOOKEEPER-1124) Multiop submitted to non-leader always fails due to timeout

2011-07-13 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064684#comment-13064684
]

Ted Dunning commented on ZOOKEEPER-1124:

All,

Is there a way to restructure the code so that naive implementors don't run
into this situation? Essentially, the code as it stands is default-fail and it
would be nice to make it be default-succeed in the presence of new ops.

Or is the addition of new ops rare enough that this doesn't matter?
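
To make the default-succeed idea concrete, here is a sketch with made-up names
(RoutingSketch, Op, Route); it is not the code in FollowerRequestProcessor.
Only ops known to be read-only are listed explicitly, and anything else,
including an op added later, falls through to the leader.

```java
/** Hypothetical op routing (names are illustrative, not ZooKeeper's request processors). */
final class RoutingSketch {
    enum Op { GET_DATA, EXISTS, GET_CHILDREN, CREATE, DELETE, SET_DATA, MULTI }

    enum Route { SERVE_LOCALLY, FORWARD_TO_LEADER }

    /** Only known read-only ops are served locally; everything else goes to the leader. */
    static Route route(Op op) {
        switch (op) {
            case GET_DATA:
            case EXISTS:
            case GET_CHILDREN:
                return Route.SERVE_LOCALLY;
            default:
                // A newly added op lands here and is forwarded without extra code.
                return Route.FORWARD_TO_LEADER;
        }
    }
}
```

The cost of this shape is that a new read-only op is forwarded needlessly
until someone lists it, which is slower but still correct.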

Multiop submitted to non-leader always fails due to timeout
---

Key: ZOOKEEPER-1124
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1124
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Environment: all
Reporter: Marshall McMullen
Priority: Critical
Fix For: 3.4.0

Attachments: multi-non-observer.patch

The new Multiop support added under zookeeper-965 fails every single time if
the multiop is submitted to a non-leader in quorum mode. In standalone mode
it always works properly and this bug only presents itself in quorum mode
(with 2 or more nodes). After 12 hours of debugging (*sigh*) it turns out to
be a really simple fix. There are a couple of missing case statements inside
FollowerRequestProcessor.java and ObserverRequestProcessor.java to ensure
that multiop is forwarded to the leader for commit. I've attached a patch
that fixes this problem.
It's probably worth noting that zookeeper-965 has already been committed to
trunk. But this is a fatal flaw that will prevent multiop support from
working properly and as such needs to get committed to 3.4.0 as well. Is
there a way to tie these two cases together in some way?


[jira] [Commented] (ZOOKEEPER-1124) Multiop submitted to non-leader always fails due to timeout

2011-07-13 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064683#comment-13064683
]

Ted Dunning commented on ZOOKEEPER-1124:

Marshall,

This fix is clearly important. Do you have any tests?

The role of these tests is not just to verify this bug, but also to provide a
prototype for any later implementors of new operations.

Multiop submitted to non-leader always fails due to timeout
---

Attachments: multi-non-observer.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-30 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Final patch for committing

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.
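
 The relationship between the two API styles can be sketched with a toy
 in-memory model. Every name below (MultiSketch, Op, Kind, Node, Transaction)
 is hypothetical and this is not the ZooKeeper client API; the 1MB limit and
 the wire protocol are not modeled. The point is only that the builder
 collects ops into a list and hands them to the list-based primitive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model (all names hypothetical) of a list-based multi plus builder sugar. */
final class MultiSketch {
    enum Kind { CREATE, DELETE, UPDATE, CHECK }

    static final class Op {
        final Kind kind; final String path; final String data; final int version;
        Op(Kind kind, String path, String data, int version) {
            this.kind = kind; this.path = path; this.data = data; this.version = version;
        }
    }

    static final class Node {
        final String data; final int version;
        Node(String data, int version) { this.data = data; this.version = version; }
    }

    final Map<String, Node> nodes = new HashMap<>();

    /** Primitive API: applies every op or none. Returns false if any constraint fails. */
    boolean multi(List<Op> ops) {
        Map<String, Node> scratch = new HashMap<>(nodes);    // work on a copy
        for (Op op : ops) {
            Node current = scratch.get(op.path);
            if (op.kind == Kind.CREATE) {
                if (current != null) return false;           // node must not exist yet
                scratch.put(op.path, new Node(op.data, 0));
                continue;
            }
            // DELETE, UPDATE and CHECK all require the expected version.
            if (current == null || current.version != op.version) return false;
            if (op.kind == Kind.DELETE) scratch.remove(op.path);
            if (op.kind == Kind.UPDATE) scratch.put(op.path, new Node(op.data, current.version + 1));
        }
        nodes.clear();
        nodes.putAll(scratch);                               // commit all changes together
        return true;
    }

    /** Builder-style sugar that reduces to the list-based primitive. */
    final class Transaction {
        private final List<Op> ops = new ArrayList<>();
        Transaction create(String path, String data) { ops.add(new Op(Kind.CREATE, path, data, -1)); return this; }
        Transaction delete(String path, int version) { ops.add(new Op(Kind.DELETE, path, null, version)); return this; }
        Transaction update(String path, String data, int version) { ops.add(new Op(Kind.UPDATE, path, data, version)); return this; }
        Transaction check(String path, int version) { ops.add(new Op(Kind.CHECK, path, null, version)); return this; }
        boolean commit() { return multi(ops); }
    }

    Transaction transaction() { return new Transaction(); }
}
```

 With this model, transaction().check("/b", 0).update("/a", "x", 0).commit()
 either applies the update or leaves every node untouched.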


[jira] [Commented] (ZOOKEEPER-1064) Startup script needs more LSB compatability

2011-06-22 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053563#comment-13053563
 ] 

Ted Dunning commented on ZOOKEEPER-1064:


Awesome!



 Startup script needs more LSB compatability
 ---

 Key: ZOOKEEPER-1064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1064
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.2
Reporter: Ted Dunning
 Fix For: 3.2.3, 3.3.3, 3.3.4


 The zkServer.sh script kind of sort of implements the standard init.d style 
 of interaction.
 It lacks
 - nice return codes
 - status method
 - standard output messages
 See 
 http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
 and
 http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html
 and
 http://wiki.debian.org/LSBInitScripts
 It is an open question how much zkServer should use these LSB scripts because 
 that may impair portability.  I
 think it should produce similar messages, however, and should return 
 standardized error codes.  If lsb functions
 are available, I think that they should be used so that ZK works as a first 
 class citizen.
 I will produce a proposed patch.


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-21 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052703#comment-13052703
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

Pushed to github.




 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.
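
The semantics described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the ZooKeeper implementation: the class MultiSketch, its Op, Node and Transaction types, and the use of -1 as an "any version" wildcard are all hypothetical names chosen for this sketch. It shows the list-based call as the primitive, the builder style layered on top of it, the all-or-nothing behavior, and the 1MB cap on total data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of an atomic multi; not ZooKeeper's real classes. */
final class MultiSketch {
    static final int MAX_BYTES = 1024 * 1024; // 1MB cap taken from the issue text
    static final int ANY = -1;                // assumption: -1 matches any version

    enum Kind { CREATE, DELETE, UPDATE, CHECK }

    static final class Op {
        final Kind kind; final String path; final byte[] data; final int version;
        Op(Kind kind, String path, byte[] data, int version) {
            this.kind = kind; this.path = path; this.data = data; this.version = version;
        }
    }

    static final class Node {
        final byte[] data; final int version;
        Node(byte[] data, int version) { this.data = data; this.version = version; }
    }

    final Map<String, Node> tree = new HashMap<>();

    /** List-based primitive: applies every op or none; false if any constraint fails. */
    boolean multi(List<Op> ops) {
        long bytes = 0;
        for (Op op : ops) {
            if (op.data != null) bytes += op.data.length;
        }
        if (bytes > MAX_BYTES) return false;

        // Work on a copy so later ops see earlier ones, and so a failure
        // leaves the real tree untouched.
        Map<String, Node> draft = new HashMap<>(tree);
        for (Op op : ops) {
            Node cur = draft.get(op.path);
            switch (op.kind) {
                case CREATE:
                    if (cur != null) return false;            // must not exist yet
                    draft.put(op.path, new Node(op.data, 0));
                    break;
                case DELETE:
                    if (!matches(cur, op.version)) return false;
                    draft.remove(op.path);
                    break;
                case UPDATE:
                    if (!matches(cur, op.version)) return false;
                    draft.put(op.path, new Node(op.data, cur.version + 1));
                    break;
                case CHECK:
                    if (!matches(cur, op.version)) return false;
                    break;
            }
        }
        tree.clear();
        tree.putAll(draft);                                   // commit all changes at once
        return true;
    }

    private static boolean matches(Node n, int v) {
        return n != null && (v == ANY || n.version == v);
    }

    /** Builder style: syntactic sugar that only collects ops for multi(). */
    final class Transaction {
        private final List<Op> ops = new ArrayList<>();
        Transaction create(String p, byte[] d) { ops.add(new Op(Kind.CREATE, p, d, ANY)); return this; }
        Transaction delete(String p, int v) { ops.add(new Op(Kind.DELETE, p, null, v)); return this; }
        Transaction update(String p, byte[] d, int v) { ops.add(new Op(Kind.UPDATE, p, d, v)); return this; }
        Transaction check(String p, int v) { ops.add(new Op(Kind.CHECK, p, null, v)); return this; }
        boolean commit() { return multi(ops); }
    }

    Transaction transaction() { return new Transaction(); }
}
```

In this sketch a failed version check anywhere in the list leaves every znode as it was, which is the property the issue asks for. A real server would also have to log and replicate the combined transaction, which the model ignores.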


[jira] [Created] (ZOOKEEPER-1102) Need update for programmer manual to cover multi operation

2011-06-21 Thread Ted Dunning (JIRA)

Need update for programmer manual to cover multi operation
--

 Key: ZOOKEEPER-1102
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1102
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Dunning


The new multi operation is undocumented as yet.  Clearly it needs some doc to 
cover:

1) the basic syntax

2) java code sample

3) C code sample


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-21 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052715#comment-13052715
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

Done.  See ZOOKEEPER-1102

On Tue, Jun 21, 2011 at 7:49 PM, Marshall McMullen (JIRA)



 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch



[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-21 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052730#comment-13052730
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

My tests completed successfully.

On Tue, Jun 21, 2011 at 7:57 PM, Marshall McMullen (JIRA)



 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch



[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-15 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049669#comment-13049669
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

I can take a quick look, but I am having trouble getting reliable net access.

On Wed, Jun 15, 2011 at 6:29 AM, Marshall McMullen (JIRA)


 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch



[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049670#comment-13049670
]

Ted Dunning commented on ZOOKEEPER-965:
---

OK. As a first step, I rebased our changes to current trunk.

This will require the usual recheckout due to non-fast-forward operations.

Now to the problems you are seeing.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049673#comment-13049673
]

Ted Dunning commented on ZOOKEEPER-965:
---

I see a clean compile on my mac. Looks like I don't understand the problem.
I can't run all the tests just now, but last time I looked they ran.

BuzzBook-Pro:zookeeper[trunk*]$ git checkout multi
Switched to branch 'multi'
BuzzBook-Pro:zookeeper[multi*]$ ant clean
Buildfile: /Users/tdunning/Apache/zookeeper/build.xml
...
clean:
BUILD SUCCESSFUL
Total time: 0 seconds
BuzzBook-Pro:zookeeper[multi*]$ ant compile
...
version-info:
[java] Unknown REVISION number, using -1
...
[javac] Compiling 52 source files to /Users/tdunning/Apache/zookeeper/build/classes
...
[javac] Compiling 134 source files to /Users/tdunning/Apache/zookeeper/build/classes
BUILD SUCCESSFUL
Total time: 11 seconds
BuzzBook-Pro:zookeeper[multi*]$


Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049953#comment-13049953
]

Ted Dunning commented on ZOOKEEPER-965:
---

Ahhh... no.

I didn't notice that. I will take a look.

On Wed, Jun 15, 2011 at 4:45 PM, Marshall McMullen (JIRA)

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049955#comment-13049955
]

Ted Dunning commented on ZOOKEEPER-965:
---

Marshall,

I just tried with and without your patch. It compiles either way.

My feeling is that excessive throws declarations are bad juju anyway, so the
current state (with your change) is better than the previous state (with the
extra throws in processTxn).

I would leave it as is.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-07 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045799#comment-13045799
]

Ted Dunning commented on ZOOKEEPER-965:
---

I just integrated changes to trunk (ZOOKEEPER-1069) and rebased the github.
This also picked up some small changes I made to the API to get results from
the exception.

Marshall, can you say if you are happy about the API change in 5ce6043f4?

I have pushed the rebase to github so everybody will have to pull from scratch
to make sure that they get the right history.

I will also attach the patch for this right now.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-06-06 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044991#comment-13044991
]

Ted Dunning commented on ZOOKEEPER-965:
---

The jute-ly issue can only be corrected by cut-and-paste coding on the jute
definitions. Putting the message into jute will involve replicating a union
of the fields in the sub-fields. With ZK's history of wire compatibility,
that probably isn't a big deal.

On Mon, Jun 6, 2011 at 10:56 AM, Marshall McMullen (JIRA)

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-31 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041733#comment-13041733
]

Ted Dunning commented on ZOOKEEPER-965:
---

{quote}
I don't see any corresponding changes in FileTxnSnapLog
{quote}

Of course, you are correct here. The individual operations in a multi work
exactly like their standalone counterparts. Unless we somehow inherit the
right behavior, your change will need to be replicated.

I won't get to this for a day or two. Would it be possible for you to drop
this change in on the github version?

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-30 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Updated to track trunk

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch



[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-30 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Rebased patch to current trunk. Note that there were several conflicts in the
merge. The most important was in

src/java/main/org/apache/zookeeper/server/DataTree.java around line 891.

The conflict was with changes introduced by ZOOKEEPER-1046.

Camille, if you can, could you check that my merge preserved your fix?

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-30 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Cut and paste error in resolving conflict.

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch



[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-29 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Updated patch with Marshall's latest.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-24 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038656#comment-13038656
]

Ted Dunning commented on ZOOKEEPER-965:
---

Marshall,

After logging in as Patrick says, you also need to click on the number on the
far left. That will pop up an overlay window that will let you type in your
response. Or riposte, as the case may be.

If the replies are general rather than line by line, you can also just put
them here. That can even be better.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-18 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035792#comment-13035792
]

Ted Dunning commented on ZOOKEEPER-965:
---

Patch will be updated shortly. I will also update reviewboard if I hear
from somebody that this will not damage ongoing reviews.

Also, the private branch mentioned below is available to anyone at
https://github.com/tdunning/zookeeper

On Wed, May 18, 2011 at 12:56 PM, Marshall McMullen (JIRA)

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Created] (ZOOKEEPER-1064) Startup script needs more LSB compatibility

2011-05-18 Thread Ted Dunning (JIRA)

Startup script needs more LSB compatibility
---

 Key: ZOOKEEPER-1064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1064
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.2
Reporter: Ted Dunning
 Fix For: 3.2.3, 3.3.3, 3.3.4


The zkServer.sh script only loosely implements the standard init.d style of
interaction.

It lacks

- nice return codes

- status method

- standard output messages

See 

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

and

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html

and

http://wiki.debian.org/LSBInitScripts

It is an open question how much zkServer should use these LSB scripts because
that may impair portability. I think it should produce similar messages,
however, and should return standardized error codes. If LSB functions are
available, I think that they should be used so that ZK works as a first-class
citizen.


I will produce a proposed patch.
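
To make the missing status method and return codes concrete, here is a minimal sketch of what an LSB-style status check might look like. It is not the proposed patch: the function name `zk_status` and its PID file argument are hypothetical, and only the exit codes (0 running, 1 dead but PID file exists, 3 not running) come from the LSB description of the status action.

```shell
#!/bin/sh
# Hypothetical sketch, not the zkServer.sh patch: report service status
# using the exit codes the LSB spec defines for the "status" action.
#   0 = running, 1 = dead but PID file exists, 3 = not running

# zk_status PIDFILE
zk_status() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "zookeeper is not running"
        return 3
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists
    # and can be signalled by the current user.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "zookeeper is running as process $pid"
        return 0
    fi
    echo "zookeeper is dead but PID file $pidfile exists"
    return 1
}
```

A script could call this from a `status)` branch of its argument handling and exit with the function's return value, so that tools that rely on LSB exit codes see a consistent answer.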


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-18 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Updated to trunk. Includes Marshall's latest naming tweaks.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-18 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035919#comment-13035919
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

Updated review board

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-18 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Added some javadocs per request.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033654#comment-13033654
]

Ted Dunning commented on ZOOKEEPER-965:
---

Great.

I will use this as an excuse to jump on the problem with the patch again.

On Sat, May 14, 2011 at 8:09 PM, Marshall McMullen (JIRA)

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-15 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Name changes in the C code only.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-13 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033163#comment-13033163
]

Ted Dunning commented on ZOOKEEPER-965:
---

I will take a look. It is likely the prefix issue.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0


[jira] [Commented] (ZOOKEEPER-1061) Zookeeper stop fails if start called twice

2011-05-13 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033178#comment-13033178
 ] 

Ted Dunning commented on ZOOKEEPER-1061:


Any hope for a commit on this?

 Zookeeper stop fails if start called twice
 --

 Key: ZOOKEEPER-1061
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1061
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.2
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.3.3, 3.4.0

 Attachments: ZOOKEEPER-1061.patch


 The zkServer.sh script doesn't check properly to see if a previously started
 server is still running.  If you call start twice, the second invocation
 will over-write the PID file with a process that then fails due to port
 occupancy.
 This means that stop will subsequently fail.
 Here is a reference that describes how init scripts should normally work:
 http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-12 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032647#comment-13032647
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

Stephen,

Nice work on the code review. Marshall will have to comment on your points,
but it is clear that you read the code well and deserve a stroke for that!

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.


[jira] [Assigned] (ZOOKEEPER-1061) Zookeeper stop fails if start called twice

2011-05-10 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning reassigned ZOOKEEPER-1061:
--

Assignee: Ted Dunning

 Zookeeper stop fails if start called twice
 --

 Key: ZOOKEEPER-1061
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1061
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.2
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.3.3


 The zkServer.sh script doesn't check properly to see if a previously started
 server is still running.  If you call start twice, the second invocation
 will over-write the PID file with a process that then fails due to port
 occupancy.
 This means that stop will subsequently fail.
 Here is a reference that describes how init scripts should normally work:
 http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html


[jira] [Updated] (ZOOKEEPER-1061) Zookeeper stop fails if start called twice

2011-05-10 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-1061:
---

Attachment: ZOOKEEPER-1061.patch

Here is a patch that handles the double start and fixes up some exit values.

 Zookeeper stop fails if start called twice
 --

 Key: ZOOKEEPER-1061
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1061
 Project: ZooKeeper
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.3.2
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.3.3

 Attachments: ZOOKEEPER-1061.patch


 The zkServer.sh script doesn't check properly to see if a previously started
 server is still running.  If you call start twice, the second invocation
 will over-write the PID file with a process that then fails due to port
 occupancy.
 This means that stop will subsequently fail.
 Here is a reference that describes how init scripts should normally work:
 http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html


[jira] [Commented] (ZOOKEEPER-1061) Zookeeper stop fails if start called twice

2011-05-10 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031401#comment-13031401
]

Ted Dunning commented on ZOOKEEPER-1061:

No unit tests are reasonable for these script-only changes. Here is a manual
test. Without the fix, we see this misbehavior:

{code}
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ...
STARTED
tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
17610 QuorumPeerMain
17646 Jps
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ...
STARTED
tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
17685 Jps
17610 QuorumPeerMain
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ...
kill: 160: No such process

STOPPED
tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
17730 Jps
17610 QuorumPeerMain
{code}

With the fix, I get this.
{code}
tdunning@ted-desk:~/Apache/zookeeper$ patch < ZOOKEEPER-1061.patch
patching file zkServer.sh

# first start works
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... STARTED

# second start fails with good message
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... already running as process 17928.

# and this is persistent behavior
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... already running as process 17928.

# stop now works
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ... STOPPED

# repeated stop works correctly
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ... error: could not find file
/var/zookeeper/zookeeper_server.pid

# and start works again
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... STARTED

# but can't be repeated
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... already running as process 18155.

# running without proper permissions gives a different error
tdunning@ted-desk:~/Apache/zookeeper$ bin/zkServer.sh start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ... bin/zkServer.sh: 169: cannot create
/var/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Stopping zookeeper ... STOPPED
{code}
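
For readers who want to see the shape of the check, here is a minimal sketch of a double-start guard that produces messages like the ones in the transcript above. It is an illustration under stated assumptions, not the contents of ZOOKEEPER-1061.patch: the function name `start_once` and its arguments are hypothetical, and returning success for an already-running service follows the LSB convention rather than anything confirmed about the patch.

```shell
#!/bin/sh
# Hypothetical sketch of a double-start guard; the function name and
# arguments are illustrative, not taken from ZOOKEEPER-1061.patch.

# start_once PIDFILE COMMAND [ARGS...]
# Refuses to start when PIDFILE names a live process, so the PID file is
# never overwritten by a second instance that is about to fail.
start_once() {
    pidfile=$1
    shift
    printf 'Starting zookeeper ... '
    if [ -f "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
        # kill -0 only tests whether the process exists and is signallable.
        if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
            echo "already running as process $oldpid."
            # LSB treats "start" on a running service as success.
            return 0
        fi
    fi
    # Launch the server in the background and record its PID.
    "$@" >/dev/null 2>&1 &
    newpid=$!
    if echo "$newpid" > "$pidfile"; then
        echo STARTED
        return 0
    fi
    # Could not record the PID: stop the child rather than leave it untracked.
    kill "$newpid" 2>/dev/null
    echo "FAILED TO WRITE PID"
    return 1
}
```

The important design point is that the PID file is only written after the liveness check, so a later stop always finds the PID of the process that is actually running.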

Zookeeper stop fails if start called twice
--

Key: ZOOKEEPER-1061
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1061
Project: ZooKeeper
Issue Type: Bug
Components: scripts
Affects Versions: 3.3.2
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.3.3, 3.4.0

Attachments: ZOOKEEPER-1061.patch

The zkServer.sh script doesn't check properly to see if a previously started
server is still running. If you call start twice, the second invocation
will over-write the PID file with a process that then fails due to port
occupancy.
This means that stop will subsequently fail.
Here is a reference that describes how init scripts should normally work:
http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-09 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

This is a WIP patch. Marshall says that he still has a setData issue that is
giving him a bit of pause.

This is still a very good reference for review.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-09 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Here is Marshall's code in the form of an updated patch against trunk

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028622#comment-13028622
 ] 

Ted Dunning commented on ZOOKEEPER-965:
---

Marshall,

Do these C unit tests have a corresponding set of Java unit tests?

 Need a multi-update command to allow multiple znodes to be updated safely
 -

 Key: ZOOKEEPER-965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, 
 ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


 The basic idea is to have a single method called multi that will accept a 
 list of create, delete, update or check objects each of which has a desired 
 version or file state in the case of create.  If all of the version and 
 existence constraints can be satisfied, then all updates will be done 
 atomically.
 Two API styles have been suggested.  One has a list as above and the other 
 style has a Transaction that allows builder-like methods to build a set of 
 updates and a commit method to finalize the transaction.  This can trivially 
 be reduced to the first kind of API so the list based API style should be 
 considered the primitive and the builder style should be implemented as 
 syntactic sugar.
 The total size of all the data in all updates and creates in a single 
 transaction should be limited to 1MB.
 Implementation-wise this capability can be done using standard ZK internals.  
 The changes include:
 - update to ZK clients to allow the new call
 - additional wire level request
 - on the server, in the code that converts transactions to idempotent form, 
 the code should be slightly extended to convert a list of operations to 
 idempotent form.
 - on the client, a down-rev server that rejects the multi-update should be 
 detected gracefully and an informative exception should be thrown.
 To facilitate shared development, I have established a github repository at 
 https://github.com/tdunning/zookeeper  and am happy to extend committer 
 status to anyone who agrees to donate their code back to Apache.  The final 
 patch will be attached to this bug as normal.


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Latest repairs from Marshall.

This should be ready for review, but not commit.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028802#comment-13028802
]

Ted Dunning commented on ZOOKEEPER-965:
---

Camille is correct.

I don't think that we can get rid of these warnings because passing the buffer
around here is of the essence. The warning is reasonable except for the fact
that the data structure is really just a wrapper around the byte array.

Mahadev, Ben,

Is it OK to ultimately commit code that has these two warnings?

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028847#comment-13028847
]

Ted Dunning commented on ZOOKEEPER-965:
---

Pushed fix to github.

Note that I also reworded the commit messages to include the JIRA.

This will require that you force the pull.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028850#comment-13028850
]

Ted Dunning commented on ZOOKEEPER-965:
---

I made the tweaks. I will post another patch and see what jenkins has to say.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

This should fix the niggling warnings

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-04 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028875#comment-13028875
]

Ted Dunning commented on ZOOKEEPER-965:
---

This latest Hudson report appears to be a Jenkins problem rather than a real
test failure.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch, ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch


[jira] [Commented] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-03 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028051#comment-13028051
]

Ted Dunning commented on ZOOKEEPER-965:
---

Mahadev,

There is a patch on this JIRA.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-03 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Updated to trunk and Marshall's changes

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-03 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Hopefully this makes Jenkins happier.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.3
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch, ZOOKEEPER-965.patch,
ZOOKEEPER-965.patch


[jira] [Updated] (ZOOKEEPER-965) Need a multi-update command to allow multiple znodes to be updated safely

2011-05-02 Thread Ted Dunning (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Dunning updated ZOOKEEPER-965:
--

Attachment: ZOOKEEPER-965.patch

Patch against current trunk. Commit messages are reworded relative to the GitHub version.

Need a multi-update command to allow multiple znodes to be updated safely
-

Key: ZOOKEEPER-965
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-965
Project: ZooKeeper
Issue Type: Bug
Reporter: Ted Dunning
Assignee: Ted Dunning
Fix For: 3.4.0

Attachments: ZOOKEEPER-965.patch


[jira] [Resolved] (ZOOKEEPER-966) Client side for multi

2011-05-02 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning resolved ZOOKEEPER-966.
---

Resolution: Fixed

Consolidated patch for all sub-tasks is found on ZOOKEEPER-965

 Client side for multi
 -

 Key: ZOOKEEPER-966
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-966
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Ted Dunning
 Fix For: 3.4.0


 This is just the client side of the code, up to and including the serialization 
 of requests.


[jira] [Resolved] (ZOOKEEPER-967) Server side decoding and function dispatch

2011-05-02 Thread Ted Dunning (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning resolved ZOOKEEPER-967.
---

Resolution: Fixed

Consolidated patch attached to ZOOKEEPER-965

 Server side decoding and function dispatch
 --

 Key: ZOOKEEPER-967
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-967
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Ted Dunning
 Fix For: 3.4.0


 This would include making the server catch the request and hand it down to 
 the actual transaction code.

