unsubscribe

2015-03-07 Thread pinktie
unsubscribe

 Original Message 
From: Jeff Schroeder <jeffschroe...@computer.org>
Apparently from: user-return-2791-pinktie=safe-mail@mesos.apache.org
To: Mesos Users <user@mesos.apache.org>
Subject: Question on Monitoring a Mesos Cluster
Date: Sat, 7 Mar 2015 12:02:00 -0600
 

 I wrote a Python collectd plugin which pulls both master stats (only if 
 master/elected == 1) and slave stats from the REST API, under 
 /metrics/snapshot and /slave(1)/stats.json respectively, and throws those 
 into Graphite.
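 
 (A minimal way to poke at the same endpoint by hand, sketched in C++ with 
 libcurl rather than the Python/collectd plugin described above; "master" 
 is a placeholder hostname:)
 
 // Sketch: fetch /metrics/snapshot from a Mesos master with libcurl.
 // This is NOT the collectd plugin above (that one is Python); it only
 // illustrates the HTTP call. Build: g++ snapshot.cpp -lcurl
 #include <curl/curl.h>
 #include <iostream>
 #include <string>
 
 static size_t append(char* data, size_t size, size_t nmemb, void* out)
 {
   static_cast<std::string*>(out)->append(data, size * nmemb);
   return size * nmemb;
 }
 
 int main()
 {
   curl_global_init(CURL_GLOBAL_DEFAULT);
   CURL* curl = curl_easy_init();
   if (curl == NULL) return 1;
 
   std::string body;
   // "master" is a placeholder; 5050 is the master's default HTTP port.
   curl_easy_setopt(curl, CURLOPT_URL, "http://master:5050/metrics/snapshot");
   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append);
   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
 
   CURLcode res = curl_easy_perform(curl);
   curl_easy_cleanup(curl);
   curl_global_cleanup();
   if (res != CURLE_OK) return 1;
 
   // The response is flat JSON whose keys include master/cpus_percent,
   // master/mem_percent and master/disk_percent.
   std::cout << body << std::endl;
   return 0;
 }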
 
 After getting everything working, I built a few dashboards, one of which 
 displays these stats from http://master:5050/metrics/snapshot:
 
 master/disk_percent
 master/cpus_percent
 master/mem_percent 
  
 I had assumed that this was something like aggregate cluster utilization, but 
 this seems incorrect in practice. I have a small cluster with ~1T of memory, 
 ~25T of disk, and ~540 CPU cores. I had a dozen or so small tasks running, 
 and launched 500 tasks with 1G of memory and 1 CPU each.
 
 Now I'd expect to see the disk/cpu/mem percentage metrics above go up 
 considerably. I did notice that cpus_percent went to around 0.94.
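 
 (As a sanity check on that 0.94: if these metrics are allocation fractions 
 on a 0-1 scale, i.e. cpus_used / cpus_total, then ~512 allocated CPUs (500 
 new one-CPU tasks plus the dozen small ones) out of 540 cores gives about 
 512 / 540 ≈ 0.95, which lines up; memory would sit near 500G / 1T ≈ 0.5.)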
  
 What is the correct way to measure overall cluster utilization for capacity 
 planning? We can have the NOC watch this and simply add more hardware when 
 utilization starts getting high.
 
 Thanks
  
 -- 
 Jeff Schroeder
 
 Don't drink and derive, alcohol and analysis don't mix.
 http://www.digitalprognosis.com
 
 


mesos c++ zookeeper blocks indefinitely -- any plans to enhance?

2015-03-04 Thread pinktie
hi mesos users and devs,
We've observed that the Mesos 0.22.0-rc1 C++ ZooKeeper code appears to 
allow indefinite waits on responses.
This leads to application hangs blocked inside Mesos ZooKeeper calls.
This can happen even with a properly running ZooKeeper that is presumably 
able to answer every request.

Here's how we hung it, for example.
We issue a Mesos ZK set via

int ZooKeeper::set(const std::string& path,
                   const std::string& data,
                   int version);

then, inside a Watcher, we handle the CHANGED event by issuing a Mesos ZK 
get on the same path via

int ZooKeeper::get(const std::string& path,
                   bool watch,
                   std::string* result,
                   Stat* stat);

We end up with two threads in the process, both in pthread_cond_wait:
#0  0x00334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
at ../../../3rdparty/libprocess/src/gate.hpp:82
#2  0x7f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
at ../../../3rdparty/libprocess/src/process.cpp:2476
#3  0x7f6664ed2ce9 in process::wait (pid=..., duration=...)
at ../../../3rdparty/libprocess/src/process.cpp:2958
#4  0x7f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
at ../../../3rdparty/libprocess/src/latch.cpp:49
#5  0x7f66649452cc in process::Futureint::await (this=0x7fffa0fd9040, 
duration=...)
at ../../3rdparty/libprocess/include/process/future.hpp:1156
#6  0x7f666493a04d in process::Futureint::get (this=0x7fffa0fd9040)
at ../../3rdparty/libprocess/include/process/future.hpp:1167
#7  0x7f6664ab1aac in ZooKeeper::set (this=0x803ce0, path=/craig/mo, data=
...

and
#0  0x00334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x7f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
at ../../../3rdparty/libprocess/src/gate.hpp:82
#2  0x7f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
at ../../../3rdparty/libprocess/src/process.cpp:2476
#3  0x7f6664ed2ce9 in process::wait (pid=..., duration=...)
at ../../../3rdparty/libprocess/src/process.cpp:2958
#4  0x7f6664e90558 in process::Latch::await (this=0x7f6638000d00, 
duration=...)
at ../../../3rdparty/libprocess/src/latch.cpp:49
#5  0x7f66649452cc in process::Futureint::await (this=0x7f66595fb6f0, 
duration=...)
at ../../3rdparty/libprocess/include/process/future.hpp:1156
#6  0x7f666493a04d in process::Futureint::get (this=0x7f66595fb6f0)
at ../../3rdparty/libprocess/include/process/future.hpp:1167
#7  0x7f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path=/craig/mo, 
watch=false,


So, really, we are asking whether the Mesos ZooKeeper C++ API will be 
enhanced to avoid blocking indefinitely when a result does not arrive 
within a time bound.
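
For what it's worth, the backtraces suggest the block happens in 
process::Future<int>::get(), and libprocess already exposes 
Future::await(Duration), which returns false on timeout instead of waiting 
forever. A purely hypothetical caller-side shape (boundedResult is made up; 
it assumes something hands the caller the internal Future<int>):

#include <process/future.hpp>     // process::Future
#include <stout/duration.hpp>     // Duration, Seconds
#include <zookeeper/zookeeper.h>  // ZOPERATIONTIMEOUT

// Hypothetical: resolve a Future<int> within a time bound instead of
// calling get(), which waits forever.
int boundedResult(const process::Future<int>& future, const Duration& timeout)
{
  if (!future.await(timeout)) {   // false if still pending after 'timeout'
    return ZOPERATIONTIMEOUT;     // surface a ZooKeeper-style timeout error
  }
  return future.get();            // ready now, so this returns immediately
}

// e.g. boundedResult(someFuture, Seconds(10))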

cheers
craig