Camille,

 well done! I think thats probably the bug. You are right. There is a race 
condition on the outstanding changes and the ephemeral nodes.

Great catch!

thanks
mahadev

On Sep 27, 2011, at 5:42 PM, Camille Fournier wrote:

> This:
> 
> HashSet<String> es = zks.getZKDatabase()
> .getEphemerals(request.sessionId);
> 
> Is outside the synchronized block that deletes the ephemerals:
> 
>                 synchronized (zks.outstandingChanges) {
>     for (ChangeRecord c : zks.outstandingChanges) {
>     if (c.stat == null) {                            // Doing a delete
>                           es.remove(c.path);                        }
> else if (c.stat.getEphemeralOwner() == request.sessionId) {
>                 es.add(c.path);                        }
>     }                    for (String path2Delete : es) {
>           addChangeRecord(new ChangeRecord(txnHeader.getZxid(),
>                         path2Delete, null, 0, null));
>   }                }
> On Tue, Sep 27, 2011 at 7:26 PM, Mahadev Konar <[email protected]> 
> wrote:
>> Camille,
>>  I am a little confused on the explanation/text. Probably might want
>> to update the jira (if kishore opens one) with a little more detail.
>> 
>> thanks
>> mahadev
>> 
>> On Tue, Sep 27, 2011 at 2:20 PM, Fournier, Camille F.
>> <[email protected]> wrote:
>>> So, the node was created by 0x13220b93e610550 at 12:17:56, then that 
>>> session closed at 12:17:57, the node did not delete, and a bunch of other 
>>> sessions later tried to create the node. These sessions got nodeexists 
>>> failures I presume?
>>> 
>>> 
>>> Forgive the block of text I'm going to write instead of code:
>>> 
>>> I'm going to bet that the problem lies in PrepRequestProcessor. If we get 
>>> the ephemerals for the session while an ephemeral is still in 
>>> outstandingChanges and has not been committed, then another thread commits 
>>> that ephemeral and removes it from outstanding changes before synchronizing 
>>> in the outstandingChagnes block, we could never put it in the ephemeral set 
>>> that we are using to reflect ephemerals to delete. I think we need to move 
>>> the synchronized block up before we get the ephemerals from the database. 
>>> But this is a bit of speculation at the moment. Can you create a JIRA 
>>> tracker for me to look at this?
>>> 
>>> Thanks,
>>> C
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: kishore g [mailto:[email protected]]
>>> Sent: Monday, September 26, 2011 1:07 AM
>>> To: [email protected]
>>> Subject: Re: ephemeral node not removed after the client session is long 
>>> gone
>>> 
>>> Hi,
>>> 
>>> I got the following information from the logs.
>>> 
>>> The node that still exists is
>>> /kafka-tracking/consumers/UserPerformanceEvent-<host>/owners/UserPerformanceEvent/529-7
>>> 
>>> I saw that the ephemeral owner is 86167322861045079 which is session id
>>> 0x13220b93e610550.
>>> 
>>> After searching in the transaction log of one of the ZK servers found that
>>> session expired
>>> 
>>> 9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7
>>> closeSession null
>>> 
>>> On digging further into the logs I found that there were multiple sessions
>>> created in quick succession and every session tried to create the same node.
>>> But i verified that the sessions were closed and opened in order
>>> 9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x0 zxid 0x601bd36b5
>>> createSession 6000
>>> 9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7
>>> closeSession null
>>> 9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x0 zxid 0x601bd36f8
>>> createSession 6000
>>> 9/22/11 12:17:59 PM PDT session 0x13220b93e610551 cxid 0x74 zxid 0x601bd373a
>>> closeSession null
>>> 9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x0 zxid 0x601bd373e
>>> createSession 6000
>>> 9/22/11 12:18:01 PM PDT session 0x13220b93e610552 cxid 0x6c zxid 0x601bd37a0
>>> closeSession null
>>> 9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x0 zxid 0x601bd37e9
>>> createSession 6000
>>> 9/22/11 12:18:03 PM PDT session 0x13220b93e610553 cxid 0x74 zxid 0x601bd382b
>>> closeSession null
>>> 9/22/11 12:18:04 PM PDT session 0x13220b93e610554 cxid 0x0 zxid 0x601bd383c
>>> createSession 6000
>>> 9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x6a zxid 0x601bd388f
>>> closeSession null
>>> 9/22/11 12:18:06 PM PDT session 0x13220b93e610555 cxid 0x0 zxid 0x601bd3895
>>> createSession 6000
>>> 9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x6a zxid 0x601bd38cd
>>> closeSession null
>>> 9/22/11 12:18:10 PM PDT session 0x13220b93e610556 cxid 0x0 zxid 0x601bd38d1
>>> createSession 6000
>>> 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x0 zxid 0x601bd38f2
>>> createSession 6000
>>> 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x51 zxid 0x601bd396a
>>> closeSession null
>>> 
>>> Here is the log output for the sessions that tried creating the same node
>>> 
>>> 9/22/11 12:17:54 PM PDT session 0x13220b93e61054f cxid 0x42 zxid 0x601bd366b
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x42 zxid 0x601bd36ce
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x42 zxid 0x601bd3711
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x42 zxid 0x601bd3777
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x42 zxid 0x601bd3802
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x44 zxid 0x601bd385d
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x44 zxid 0x601bd38b0
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x52 zxid 0x601bd396b
>>> create
>>> '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
>>> 
>>> Let me know if you need additional information.
>>> 
>>> Thanks,
>>> Kishore G
>>> 
>>> On Sun, Sep 25, 2011 at 12:56 PM, Camille Fournier 
>>> <[email protected]>wrote:
>>> 
>>>> Can you give us more details, like the information on the client that
>>>> created it, any logs around the time they created it/disconnected,
>>>> etc?
>>>> 
>>>> On Sun, Sep 25, 2011 at 2:54 PM, Jun Rao <[email protected]> wrote:
>>>>> Hi,
>>>>> 
>>>>> We found our ZK server in a state where an ephemeral node still exists
>>>> after
>>>>> a client session is long gone. I used the cons command on each ZK host to
>>>>> list all connections and couldn't find the ephemeralOwner id. We are
>>>> using
>>>>> ZK 3.3.3. Has anyone seen this problem?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>> 
>>> 
>> 

Reply via email to