> On 17 Nov 2015, at 21:34, Raúl Gutiérrez Segalés <[email protected]> wrote:
> 
> On 17 November 2015 at 12:13, Akmal Abbasov <[email protected]>
> wrote:
> 
>> Hi Raul,
>> Thank you for your response.
>> I am running ZooKeeper with -Xms512m -Xmx1g; is this enough?
>> 
> 
> It depends on your workload. How many writes/reads per sec are you
> expecting/seeing? Are you seeing long GC pauses? If so, you'll need more
> mem or bigger tick times; otherwise you'll miss the deadlines for the
> pings (both among learners and to clients…)
> 
Where can I find this information, specifically about reads/writes?
This is the output of the stat command:
Server 1
Latency min/avg/max: 0/66/5212
Received: 8722
Sent: 8694
Connections: 19
Outstanding: 0
Zxid: 0xa9600002ef2
Mode: follower
Node count: 479

Server 2 
Latency min/avg/max: 0/70/5252
Received: 8228
Sent: 8203
Connections: 16
Outstanding: 0
Zxid: 0xa9600002e12
Mode: leader
Node count: 479

Server 3
Latency min/avg/max: 0/0/1
Received: 140
Sent: 139
Connections: 2
Outstanding: 0
Zxid: 0xa9600002bf8
Mode: follower
Node count: 479
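(These counters are cumulative since startup, or since the last srst, so to get a rough
requests/sec figure I suppose I'd sample them twice and divide by the interval. A minimal
sketch using the mntr four-letter word, which I assume is available here since it needs
3.4+; hostname and port are illustrative:)

    # sample the cumulative packet counter twice and divide by the interval
    before=$(echo mntr | nc zk1.example.com 2181 | awk '/^zk_packets_received/ {print $2}')
    sleep 60
    after=$(echo mntr | nc zk1.example.com 2181 | awk '/^zk_packets_received/ {print $2}')
    echo "approx requests/sec: $(( (after - before) / 60 ))"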

All the servers have the same configs.
Is -Xms512m -Xmx1g enough to handle my workload?
Moreover, I see that the load is not evenly distributed. Is this something that
should be tuned manually, or is there something like the HBase/HDFS balancer
that will take care of it?
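(As for the long-GC-pause question: I suppose I could turn on GC logging and look for
pauses approaching the tick/session deadlines. A sketch, assuming zkServer.sh picks up
SERVER_JVMFLAGS from conf/java.env; the flags are the usual Java 7-era GC-logging
options and the log path is illustrative:)

    # conf/java.env
    export SERVER_JVMFLAGS="-Xms512m -Xmx1g \
      -Xloggc:/var/log/zookeeper/gc.log \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"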

> 
>> Regarding the network, all of the zk server nodes are hosted in the
>> cloud, in the same dc.
>> But according to the zk troubleshooting guide, the timeout should be
>> increased for cloud environments.
>> 
> 
> Yup, latency can be unpredictable in the cloud…
> 
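Just to make sure I'm tuning the right knobs: I assume these are the tickTime /
initLimit / syncLimit settings in zoo.cfg? A sketch with purely illustrative values
(not recommendations):

    # zoo.cfg
    # tickTime is the heartbeat unit in ms (default 2000); session timeouts are multiples of it
    tickTime=3000
    # initLimit: ticks a follower may take to connect to and sync with the leader
    initLimit=10
    # syncLimit: ticks a follower may lag behind the leader before being dropped
    syncLimit=5
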
> 
>> One more thing: I'm seeing a lot of
>> "fsync-ing the write ahead log in SyncThread:1 took 2962ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting guide"
>> messages in the logs.
>> 
> 
> That definitely looks bad and will block everything else. What type of disc
> are you writing your logs and snapshots to? Are they
> separate volumes?
I'm using separate disks for logs and data, but they're HDDs, not SSDs.
So my assumption is that the slow fsyncs come from the HDDs.
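(For reference, the split you're asking about is the dataDir/dataLogDir pair in
zoo.cfg; roughly the following, with illustrative paths:)

    # zoo.cfg
    # snapshots
    dataDir=/data/zookeeper
    # transaction (write-ahead) log, ideally on its own fast device
    dataLogDir=/datalog/zookeeper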

I've tried to understand what is actually happening; here is a summary of the
logs:
08:22:08,201    Transaction timeout
08:22:08,596 - 08:22:25,441     ZookeeperServer not running
08:22:24,927    New election
Everything starts with the 'Transaction timeout' on the leader, which caused an
'Exception when following the leader' on the learners. Then all ZooKeeper
processes shut down, a new election takes place, and the ZooKeeper processes
start again.

And one more thing: what's the best way to update the configs without downtime?
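My assumption is that, short of the dynamic reconfiguration that only arrives in 3.5,
it means a rolling restart, one server at a time, waiting for each to rejoin the quorum
before moving on; a rough sketch (hostnames and install path are illustrative, ruok/stat
are the standard four-letter words):

    for h in zk1.example.com zk2.example.com zk3.example.com; do
      ssh "$h" '/opt/zookeeper/bin/zkServer.sh restart'
      # wait until the server answers ruok again, then show its role
      until [ "$(echo ruok | nc "$h" 2181)" = "imok" ]; do sleep 2; done
      echo stat | nc "$h" 2181 | grep Mode
    done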
Thank you.

Regards, Akmal

        
