I'm thinking of hacking it through the ConnectResponse session timeout
(similar to the way we detect a rejected session). I wrote up a prototype
that worked OK this way. We might be able to extend this hack to other
things, using that field as an encoded error message. Thoughts?
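
A minimal sketch of the idea, not the actual prototype: it assumes clients
already treat a non-positive negotiated timeout as a rejected session, so
distinct non-positive values could carry a reason. The class name, the enum,
and every code value below are hypothetical and are not part of ZooKeeper.

```java
/**
 * Sketch only: packs a rejection reason into the session timeout that the
 * server returns in its connect response. All names and code values here
 * are hypothetical, not part of ZooKeeper.
 */
final class ConnectTimeoutCodes {

    /** Hypothetical reasons, each mapped to a non-positive timeout value. */
    enum RejectReason {
        NONE(1),                      // placeholder: any positive timeout means "accepted"
        SESSION_REJECTED(0),          // generic rejection with no further detail
        ZXID_AHEAD_OF_SERVER(-1),     // client has seen a newer zxid than this server
        TOO_MANY_CONNECTIONS(-2),     // e.g. a per-client connection limit was hit
        UNKNOWN(Integer.MIN_VALUE);   // a code this client does not understand

        final int code;

        RejectReason(int code) {
            this.code = code;
        }
    }

    /** Server side: the value to place in the timeout field when rejecting. */
    static int encode(RejectReason reason) {
        if (reason == RejectReason.NONE) {
            throw new IllegalArgumentException("NONE is not a rejection");
        }
        return reason.code;
    }

    /** Client side: interpret the timeout received in the connect response. */
    static RejectReason decode(int negotiatedTimeout) {
        if (negotiatedTimeout > 0) {
            return RejectReason.NONE; // a real timeout, session accepted
        }
        for (RejectReason r : RejectReason.values()) {
            if (r != RejectReason.NONE && r.code == negotiatedTimeout) {
                return r;
            }
        }
        return RejectReason.UNKNOWN;
    }
}
```

Under that assumption, a client that does not know the codes would still see
a non-positive timeout and handle it as a plain rejection, which is what
keeps the scheme backward compatible.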

C
On Aug 4, 2011 6:10 PM, "Patrick Hunt" <[email protected]> wrote:
> Our error reporting server->client has always been weak. It's a PITA
> to debug in production because a lot of times when the client gets
> bounced it's not clear from the client side why (you end up having to
> search the server log - for example when maxClientCount is exceeded).
> It would be great to fix this, esp if the server could provide insight
> to the client about why (an error code/message perhaps). Doing it in a
> b/w compatible way might be tough though...
>
> Patrick
>
> On Thu, Aug 4, 2011 at 2:45 PM, Ted Dunning <[email protected]> wrote:
>> This is used normally to guarantee in-order data views.  If you get
>> disconnected from one host in an advanced state and then connect to an
>> out of date slave, ZK automatically disconnects you to avoid letting you
>> see time go backwards.  Your situation is different of course.
>>
>>
>>
>> On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F. <
>> [email protected]> wrote:
>>
>>> Right now the server just detects that the zxid is wrong, and calls
>>> close on the client. The client logs:
>>> 15:01:47,593 - INFO
>>>  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1159] - Unable
>>> to read additional data from server sessionid 0x131962b00540000, likely
>>> server has closed socket, closing socket connection and attempting
>>> reconnect
>>> (branch 3.3.3)
>>>
>>> I will poke around and see if I can figure out a nicer way to indicate
>>> this condition. The expired state is perfectly fine for me in my use
>>> case.
>>>
>>> C
>>>
>>>
>>> -----Original Message-----
>>> From: Patrick Hunt [mailto:[email protected]]
>>> Sent: Thursday, August 04, 2011 1:51 PM
>>> To: [email protected]
>>> Subject: Re: devops/admin/client question: What do you do when you
>>> rollback?
>>>
>>> On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F.
>>> <[email protected]> wrote:
>>> > We had an issue here the other day where the ZK servers were running
>>> > poorly, and in an effort to get them healthy again we ended up rolling
>>> > back the cluster state. While this was, in retrospect, not the right
>>> > solution to the problem we were facing, it brought up another problem.
>>> > Namely, that many of our clients couldn't reconnect with their
>>> > sessions because their zxid was too high (expected), but that the
>>> > error they got when trying to do that reconnection was just a vanilla
>>> > disconnected error. The result was that most of our clients had to be
>>> > bounced.
>>>
>>> Hi Camille, there's a long standing jira on this:
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-523
>>>
>>> > Aside from trying hard to avoid ever rolling back the cluster state,
>>> > does anyone have a way they deal with this situation if it occurs?
>>> > Should we consider enhancing the error message to the client so we
>>> > could track the fact that we were ahead of the quorum zxid and react
>>> > sensibly? Alternately, since we were sending a sessionId along with
>>> > the zxid, perhaps it would be nice to check to see if the sessionId
>>> > exists before checking the zxid, which would send an expired state
>>> > signal which my client code could handle cleanly.
>>>
>>> It seems reasonable that if the client connects to all servers in the
>>> ensemble (that it knows about) and sees that it's ahead of each one,
>>> it should consider the session expired (we could add a new state, but
>>> seems like just treating as expired with a good log message would be
>>> better from b/w compat standpoint).
>>>
>>> I can't recall, does the client have sufficient information to make
>>> this determination, or is the server just disconnecting?
>>>
>>> Patrick
>>>
>>
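
The suggestion quoted above, treating the session as expired once the client
has tried every server it knows about and found itself ahead of each one,
could be sketched roughly as follows. This assumes the client can learn that
a given server turned it away for being ahead, which the thread leaves as an
open question. The class and method names are hypothetical and do not exist
in the ZooKeeper client.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch only: remembers which known servers have turned the client away
 * for having a zxid ahead of theirs. Hypothetical helper, not a ZooKeeper
 * class.
 */
final class AheadOfEnsembleTracker {
    private final Set<String> knownServers;
    private final Set<String> serversBehind = new HashSet<>();

    AheadOfEnsembleTracker(Set<String> knownServers) {
        this.knownServers = new HashSet<>(knownServers);
    }

    /** Call when a server is found to be behind the client's last zxid. */
    void recordServerBehind(String server) {
        if (knownServers.contains(server)) {
            serversBehind.add(server);
        }
    }

    /** Call after any successful reconnect; the earlier refusals no longer matter. */
    void recordSuccessfulConnect() {
        serversBehind.clear();
    }

    /** True once every known server has been found to be behind the client. */
    boolean shouldTreatAsExpired() {
        return !knownServers.isEmpty() && serversBehind.containsAll(knownServers);
    }
}
```

With this shape the existing expired-session handling would be reused rather
than a new client state added, in line with the backward-compatibility
concern raised in the thread.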
