On Wed, Sep 7, 2016 at 1:21 PM, Andy Grimm <[email protected]> wrote:

> On Wed, Sep 7, 2016 at 11:22 PM, Diego Castro <[email protected]> wrote:
>
>> Hello, list.
>> We have been running Origin since last November and I'd like to share
>> some experiences, pains and thoughts.
>>
>> Our Origin cluster has about 25 servers, including masters, nodes and
>> routers. We have roughly 500 applications exposing services and a number of
>> HPAs firing up containers all the time.
>>
>> 1) Resource consumption: During the day I noticed an increase in memory
>> consumption due to multiple reloads; a lot of processes keep running until
>> their connections finish or they are OOM-killed. Another issue with restarts
>> is that, because of the iptables rule that drops TCP SYN packets, we are
>> seeing high latencies. What can we do to reduce restart overhead?
>>
>
> You seem to have several questions intertwined here, and I am by no means
> an expert on this, but on the "lots of processes keep running" topic, you
> may be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1364870
> (though this manifests as more of a CPU consumption issue than a memory
> issue). In short, we've seen cases where haproxy connections are
> "orphaned", so the old processes never exit -- they continuously think they
> have one or two "jobs" left, but they never actually handle them. I think
> this is fixed in the latest 1.5.x release of haproxy, but have not had a
> chance to test yet.
>


In 3.3 there are some more knobs you can set to limit the length of time
that an haproxy process will stay around after a restart. You may wish to
try playing with those, but the underlying bug is still there in 3.3.


>
>
>>
>> 2) Metrics: It would be nice to pull some metrics from the routers,
>> something like general network I/O and per-endpoint traffic. I found a
>> Prometheus exporter, but because of the process restarts the endpoint
>> states are cleared. HAProxy 1.6 has a fix for that
>> (http://blog.haproxy.com/2015/10/14/whats-new-in-haproxy-1-6/). Do we have
>> plans to upgrade to 1.6? What kind of metrics do we have available today?
>>
>>
The lack of metrics is a problem, and there's no great answer to your
question.

There are no plans to go to 1.6 at the moment, but we do need to solve the
stats problem, and we need to solve the reload problem, so we may end up
moving to it. But we are investigating upstream Ingress and trying to get
support for that into OpenShift so we can migrate and deprecate the router.

-ben



>
>>
>> ---
>> Diego Castro / The CloudFather
>> GetupCloud.com - Eliminamos a Gravidade
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>>