Do you have any monitoring of memory, GC, disk, etc. that might give some
additional insight? Perhaps the disks were loaded and that was slowing
things down? Or the JVM was swapping or doing heavy GC? You might be able
to tune to resolve some of that.
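
If GC is a suspect, turning on GC logging is a cheap first step. Something
like this in conf/java.env (which zkEnv.sh should pick up if it exists) would
do it; the heap sizes and log path below are just placeholders:

  # conf/java.env: extra JVM flags for zkServer.sh (placeholder values)
  export JVMFLAGS="-Xms4g -Xmx4g \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:/var/log/zookeeper/gc.log"

That plus iostat/vmstat on the hosts should show whether GC pauses or disk
load line up with the slow syncs.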

One thing you can try is copying the snapshot file to an empty
datadir on a separate machine and starting a 2-node cluster
(where the second node starts with an empty datadir).
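
Roughly like this; the hostnames and paths are made up, and the key point is
that only the first node's datadir gets the snapshot while the second starts
empty:

  # node 1: seed a fresh datadir with just the newest snapshot
  mkdir -p /tmp/zktest/version-2
  scp prod-host:/path/to/datadir/version-2/snapshot.3a000139a0 /tmp/zktest/version-2/
  echo 1 > /tmp/zktest/myid

  # node 2: an empty datadir containing only its myid file
  # zoo.cfg on both machines points dataDir at the test dir and lists both servers:
  #   server.1=test-host1:2888:3888
  #   server.2=test-host2:2888:3888

  bin/zkServer.sh start

That isolates the snapshot load/serialize/stream path from whatever else is
going on in the production ensemble, and gives you a repeatable way to time it.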

Patrick

On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
<[email protected]> wrote:
>> Seems you are down to 4gb now. That still seems way too high for
>> "coordination" operations… ?
>
> A big problem currently is detritus nodes. People use lock recipes for
> various movie IDs and leave garbage parent nodes around by the
> thousands. I've written some GC tasks to clean them up, but it's been a slow
> process to get everyone to use them. I know there is a Jira to help with
> this, but I don't know its status.
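>
> For reference, a minimal sketch of such a cleanup pass; the root path and
> age threshold are made up, and it only touches parents that are empty and
> not freshly created:
>
>   import org.apache.zookeeper.KeeperException;
>   import org.apache.zookeeper.ZooKeeper;
>   import org.apache.zookeeper.data.Stat;
>
>   public class LockParentReaper {
>       private static final String LOCK_ROOT = "/locks";         // hypothetical layout
>       private static final long MIN_AGE_MS = 60L * 60L * 1000L; // 1 hour, arbitrary
>
>       public static void reap(ZooKeeper zk) throws KeeperException, InterruptedException {
>           long now = System.currentTimeMillis();
>           for (String child : zk.getChildren(LOCK_ROOT, false)) {
>               String path = LOCK_ROOT + "/" + child;
>               Stat stat = zk.exists(path, false);
>               if (stat == null || stat.getNumChildren() > 0) {
>                   continue; // already gone, or a lock is currently held under it
>               }
>               if (now - stat.getCtime() < MIN_AGE_MS) {
>                   continue; // too new to be confident it's abandoned
>               }
>               try {
>                   // -1 skips the version check; the real guard is that delete
>                   // fails with NotEmpty if a lock child appeared in the meantime.
>                   zk.delete(path, -1);
>               } catch (KeeperException.NotEmptyException e) {
>                   // node is back in use; leave it alone
>               } catch (KeeperException.NoNodeException e) {
>                   // another cleaner removed it first
>               }
>           }
>       }
>   }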
>
> -JZ
>
> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <[email protected]> wrote:
>
>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>> <[email protected]> wrote:
>>> There were a lot of creations, but I removed those nodes last night. How
>>> long does it take to clear out of the snapshot?
>>
>> The snapshot is a copy of whatever is in the znode tree at the time
>> the snapshot is taken (so the removal will show up in the next snapshot
>> that gets taken). You can see the dates and the epoch number if that gives
>> you any insight (the epoch is the upper 32 bits of the zxid in the filename).
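>>
>> For example, decoding the newest snapshot name from your listing (the hex
>> string after the dot is the zxid; the upper 32 bits are the epoch and the
>> lower 32 bits are the counter):
>>
>>   String name = "snapshot.3a000139a0";
>>   long zxid = Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
>>   System.out.println("epoch = " + (zxid >>> 32));           // 58 (0x3a)
>>   System.out.println("counter = " + (zxid & 0xffffffffL));  // 80288 (0x139a0)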
>>
>> Seems you are down to 4gb now. That still seems way too high for
>> "coordination" operations... ?
>>
>> Patrick
>>
>>>
>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <[email protected]> wrote:
>>>
>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>> unexpectedly overload the server with znode creations?
>>>>
>>>> When a follower comes up, the leader needs to serialize the znodes to
>>>> the snapshot file and stream it to the follower, which saves it locally
>>>> and then deserializes it. (11 GB in 15 minutes averages about 12
>>>> MB/second for this whole process.)
>>>>
>>>> Oftentimes this is exacerbated by the max heap setting and GC interactions.
>>>>
>>>> Patrick
>>>>
>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>> <[email protected]> wrote:
>>>>> BTW - this is 3.3.5
>>>>>
>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman 
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> We've had a few outages of our ZK cluster recently. When trying to bring 
>>>>>> the cluster back up, it's been taking 10-15 minutes for the followers to 
>>>>>> sync with the Leader. Any idea what might cause this? Here's an ls of 
>>>>>> the data dir:
>>>>>>
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:39 
>>>>>> log.3900a4bc75
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:40 
>>>>>> log.3900a634ee
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:21 
>>>>>> log.3a00000001
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:22 
>>>>>> log.3a000139a2
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723 Jul 31 20:42 
>>>>>> snapshot.3900a634ec
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 
>>>>>> snapshot.3900a6b149
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423 Jul 31 21:22 
>>>>>> snapshot.3a000139a0
>>>>>>
>>>>>
>>>
>
