The problem as I see it comes down to this: the current carbon metrics loop is blocking, so we can't collect and push the next stats until the current ones have been sent to carbon. This blocking adds delay to data collection, so with a 1s frequency we are in practice collecting data at an interval longer than 1.0s (at least 1.05s in my case). The carbon server uses one-second resolution, so after a while this delay will manifest as a gap in the graph. If we eliminate all possible blockers from the loop, we will have much more reliable stats. To do so we could have it like this:

* wake up every N seconds (N = carbon freq)
* if we have a free thread, execute carbon_push_stats() in it
* if not, find the thread with the oldest metrics, kill it and execute carbon_push_stats() there

In each thread carbon_push_stats() will get the current timestamp, calculate metrics and send the data to the carbon server.

If carbon_push_stats() takes longer than the carbon frequency in just a single thread, then we will likely have other threads spare and use them for other metrics, so no data is lost (unless there is a network error, or the carbon server has an error).

If carbon_push_stats() takes longer than the carbon frequency in all our threads, then the oldest one gets killed and we will have at least one gap. If the killed thread re-executes carbon_push_stats() for the current timestamp and still has the issue, we will kill another thread, and so on. So if the issue is permanent we will have a big gap.

AFAIK carbon has no issues with receiving data from the past, so even if threads finish in the wrong order we are still safe.

To sum up - yes, 5 threads and 1s freq gives us up to 5 seconds of protection against carbon server issues. The number of threads should be configurable.

2013/3/2 Roberto De Ioris <[email protected]>

>
> > IMHO each thread should take care of whole set of metrics for each
> > cycle, maybe we could just call carbon_push_stats() there. It would
> > just be a matter of calling the thread every N seconds, thread would
> > execute carbon_push_stats() and send data.
> > So thread pool size would not need to be very big, 5-10 would do fine.
> > With 5 threads and 1 second frequency, thread pool could contain
> > metrics for last 5 seconds.
>
>
> So, if i understand correctly the whole problem, having a threadpool of 5
> with 1 second carbon resolution, give us a tolerance for the "carbon
> server stuck" situation of 5 seconds ? read: only if the carbon server is
> blocked for more than 5 seconds, we will get a hole ?
>
> If i am right i think it is a good approach as we introduce a bit of
> determinism in the system.
>
> >
> > 2013/3/2 Roberto De Ioris <[email protected]>
> >
> >>
> >> > 2013/3/2 Łukasz Mierzwa <[email protected]>
> >> >
> >> >> So it isn't hard to make carbon "skip a beat" with such low
> >> >> frequency, and it seems that at least part of reasons are out of
> >> >> uWSGI control (carbon server response time).
> >> >> To make it work better I think that we would need to have a
> >> >> thread that only does one thing - sleeps for $(carbon freq)
> >> >> seconds and then creates a new thread that will collect current
> >> >> metrics and push data to carbon, after it's done or it had any
> >> >> problems, it dies (?).
> >> >>
> >> >
> >> > Or instead of creating new thread for each cycle, use a pool of
> >> > threads for pushing each set of metrics. Use first available, if
> >> > all are busy kill the one with oldest metrics (?). But threads are
> >> > tricky, I'm not sure if it would work.
> >> >
> >> > --
> >> > Łukasz Mierzwa
> >> >
> >>
> >> The threading api in 1.9 makes things like threadpool really easy to
> >> realize, so i am not worried about that. But again, how to choose the
> >> right size of the pool ?
> >>
> >> --
> >> Roberto De Ioris
> >> http://unbit.it
> >>
> >
> > --
> > Łukasz Mierzwa
> >
>
> --
> Roberto De Ioris
> http://unbit.it
>

--
Łukasz Mierzwa
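
The pool policy proposed above (wake up every N seconds, use a free thread if there is one, otherwise kill the thread holding the oldest metrics) can be checked with a small discrete-time simulation. This is only a sketch in Python, not uWSGI code: the names simulate_pool and Result are made up for illustration, real threads are replaced by a table of "timestamp -> time the push would finish", and pushes still in flight when the simulation ends are assumed to complete.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    delivered: list = field(default_factory=list)  # timestamps, in completion order
    dropped: list = field(default_factory=list)    # timestamps whose worker was killed


def simulate_pool(pool_size, freq, durations):
    """Simulate the proposed policy (hypothetical helper, not part of uWSGI).

    durations[i] is how long the push started at tick i would take,
    e.g. because the carbon server is slow to respond.
    """
    busy = {}  # metrics timestamp -> time at which that push would finish
    result = Result()
    for i, duration in enumerate(durations):
        now = i * freq
        # Workers whose push has completed by now become free again.
        for ts in sorted(busy):
            if busy[ts] <= now:
                result.delivered.append(ts)
                del busy[ts]
        # No free worker: kill the one holding the oldest metrics.
        if len(busy) >= pool_size:
            oldest = min(busy)
            del busy[oldest]
            result.dropped.append(oldest)
        # Start the push for the current timestamp.
        busy[now] = now + duration
    # Assumption: pushes still running at the end are allowed to finish.
    result.delivered.extend(sorted(busy))
    return result


if __name__ == "__main__":
    # 5 workers, 1s frequency, every push stalls for 5s: nothing is lost.
    print(simulate_pool(5, 1, [5] * 10).dropped)
    # Every push stalls for 6s: the oldest worker is killed on each tick.
    print(simulate_pool(5, 1, [6] * 10).dropped)
```

Under these assumptions a stall of up to pool_size * freq seconds loses nothing, which matches the "5 threads and 1s freq" estimate, while a permanent stall just above that limit makes every tick kill a worker shortly before it would have finished. A single slow push only delays its own timestamp, which is then delivered out of order.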
_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
