Hi All, This is going to be a long post but it explains exactly how to reproduce the issue.
So I was able to narrow down the part of the code that is causing the issue. Let me explain a little: in my app there is an API that returns random content (a key) from a bucket. Initially my implementation was like this:

    randn = random.randrange(0000, 9999)
    results = content_bucket.search("id:*", fl="*,score", sort="random_%s asc" % str(randn), start=0, rows=2)
    if len(results['docs']) == 0:
        raise ContentException('No content found')
    images = []
    for res in results['docs']:
        images.append(res['imageBinary'])

It was not the best in terms of performance, so since that bucket does not change I decided to do the following:

    print 'content bucket'
    content_bucket = riak_client.bucket_type('pickorflip_content').bucket('content')
    print 'cache content keys'
    content_keys = []
    #content_keys = content_bucket.get_keys()
    for keys in content_bucket.stream_keys():
        for key in keys:
            content_keys.append(key)

I'm fully aware that you should NEVER use get_keys() and stream_keys() in production, but since this only runs at startup and we are talking about a couple of thousand keys, I don't see an issue in doing that. By storing the keys in an array I can quickly pick 3 keys at random and get the content I need much faster than with the Riak search. I'm sure that code is loaded only at startup and not on every request, because the print statements I added show up only at startup and not every time I call the API.
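The random pick from the cached key list could look roughly like this. It is a minimal sketch rather than the code from the app: `pick_random_keys` is a hypothetical helper name, and the generated `content_keys` list is only a stand-in for the keys loaded from `stream_keys()` at startup.

```python
import random


def pick_random_keys(keys, count=3, rng=random):
    """Return up to `count` distinct keys chosen at random from `keys`."""
    # random.sample raises ValueError if count > len(keys), so clamp it.
    count = min(count, len(keys))
    return rng.sample(keys, count)


# Stand-in for the list that is filled from stream_keys() at startup.
content_keys = ['img-%04d' % i for i in range(2000)]

chosen = pick_random_keys(content_keys)
# Each chosen key would then be fetched with content_bucket.get(key).
```

Clamping `count` keeps the helper from raising when the bucket holds fewer keys than requested.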
So I created another app in order to reproduce the issue:

/opt/myapp/myapp.py

    import os, sys
    from flask import Flask, request, Response, jsonify, g, url_for, make_response, session
    import riak

    app = Flask(__name__)

    riak_client = riak.RiakClient(host='127.0.0.1', pb_port=10017, protocol='pbc')

    print 'Content bucket'
    content_bucket = riak_client.bucket_type('pickorflip_content').bucket('content')
    print 'cache content keys'
    content_keys = []
    #content_keys = content_bucket.get_keys()
    for keys in content_bucket.stream_keys():
        for key in keys:
            content_keys.append(key)

    @app.route("/")
    def myapp():
        print 'get bucket'
        user_bucket = riak_client.bucket_type('user_type').bucket('users')
        print 'get user info from bucket'
        user_info = user_bucket.get('myuser')
        return '', 200

    if __name__ == '__main__':
        app.run(host="0.0.0.0", debug=False, port=int("5000"))

The myapp.ini for uwsgi:

    [uwsgi]
    vhost = true
    socket = /tmp/myapp.sock
    venv = /opt/myapp/venv
    chdir = /opt/myapp/
    module = myapp
    callable = app
    processes = 4
    close-on-exec=true
    master = true
    carbon = 10.21.1.1:2003
    stats = /tmp/stats.sock
    post-buffering = 1

nginx:

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/tmp/myapp.sock;
    }

This is my locust load testing script:

test.py

    from locust import HttpLocust, TaskSet, task

    class LoadTest(TaskSet):
        @task
        def get_user(self):
            self.client.get("/")

    class WebsiteUser(HttpLocust):
        task_set = LoadTest
        min_wait = 1000
        max_wait = 3000

And you can run it with:

    locust -H http://ip:port -f test.py

While it's running I monitor the stats socket with uwsgitop:

    uwsgitop /tmp/stats.sock

That way I can see what the workers are doing. After less than a minute all the workers get busy and stuck. But if I run uwsgi with only one process it works fine. If I run just Flask with python myapp.py and then load test against port 5000, I do not hit this issue. So it happens only when I use uwsgi with multiple processes.
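One hypothesis that might be worth testing (an assumption, not something the output above confirms): by default uwsgi loads the app once in the master and then forks the workers, so a connection opened by stream_keys() at import time could end up shared by every worker. A minimal sketch of a way to rule that out is to make sure each process builds its own client. `PerProcessClient` and `factory` are hypothetical names, and the RiakClient call in the comment simply repeats the one from myapp.py.

```python
import os


class PerProcessClient(object):
    """Hold one client per process and rebuild it after a fork."""

    def __init__(self, factory):
        # `factory` is any zero-argument callable that returns a new client,
        # e.g. lambda: riak.RiakClient(host='127.0.0.1', pb_port=10017,
        #                              protocol='pbc')
        self._factory = factory
        self._client = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        # A forked worker has a different pid from the process that created
        # the cached client, so it builds a fresh client (and fresh sockets).
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

In the view, `riak_client.bucket_type(...)` would then become `holder.get().bucket_type(...)`, with `holder = PerProcessClient(...)` created at module level. Running uwsgi with `lazy-apps = true`, which loads the app in each worker after the fork, would be another way to test the same hypothesis.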
Any idea what I could troubleshoot further to understand exactly why get_keys is causing the issue with multiple processes in uwsgi? That part of the code is loaded only at uwsgi startup, or maybe I'm missing something here?

Thank you

:tele

On Sat, 20 Jun 2015 02:59:20 -0500
tele <t...@rhizomatica.org> wrote:

> Hi All,
>
> I'm trying to troubleshoot an issue and i'm posting here because its
> caused by connecting to Riak even if i may miss some configuration on
> uwsgi. This is my enviroment:
>
> nginx + uwsgi + flask app
>
> The flask app uses Riak and Redis.
> The connection between nginx and uwsgi is via unix socket.
>
> If i use only one process in uwsgi i can easily run simultaneous
> requests without hitting the issue i'm having. When i add even only
> one more process all the workers gets busy and the app hangs. If i
> remove the riak code part it's working fine, so the issue has to be
> somewhere on the connection pooling or something else.
>
> I'm experiencing the same issues as this user:
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-January/014387.html
>
> If i use protobuf protocol i hit the DecodeErrors messages, sometimes
> i don't get any error the app just hangs. If i use the http protocol
> with riak, i don't get any exception but it just hangs.
>
> It hangs on a simple snippet:
>
> user_bucket = riak_client.bucket_type('user_type').bucket('users')
> user_info = user_bucket.get(user_id)
>
> I'm using Locust to generate traffic
>
> 1 uwsgi worker, locust 10 users hatch 2 seconds = no issues
> 2+ uwsgi worker, locust 10 users hatch 2 seconds = app hangs after few
> minutes
>
> For Riak i have 3 nodes running on the same box, i'm using the latest
> version from git.
>
> The app hangs in any of those connection scenarios:
>
> riak_client = riak.RiakClient(host='127.0.0.1', pb_port=10017,
>     protocol='pbc')
> riak_client = riak.RiakClient(protocol='pbc',
>     nodes=[{'host':'127.0.0.1', 'pb_port':10017},
>            {'host':'127.0.0.1', 'pb_port':10027},
>            {'host':'127.0.0.1', 'pb_port':10037}])
> riak_client = riak.RiakClient(protocol='http', http_port=10018,
>     host='127.0.0.1')
> riak_client = riak.RiakClient(protocol='http',
>     nodes=[{'host':'127.0.0.1', 'http_port':10018},
>            {'host':'127.0.0.1', 'http_port':10028},
>            {'host':'127.0.0.1', 'http_port':10038}])
>
> My uwsgi config is the following:
>
> [uwsgi]
> vhost = true
> socket = /tmp/app.sock
> venv = /opt/app/venv
> chdir = /opt/app/
> module = myapp
> callable = app
> processes = 2
> master = true
> close-on-exec=true
> master = true
> post-buffering = 1
> carbon = 127.0.0.1:2003
> stats = /tmp/stats.sock
>
> If i sniff the network traffic, when it hangs uwsgi basically stops
> sending any request to riak, all the workers becomes busy and the only
> way to restore it it's a restart of uwsgi.
>
> My SW versions are the following:
>
> Riak latest from git.
>
> Python libs:
> riak (2.2.0)
> riak-pb (2.0.0.16)
> protobuf (2.5.0)
>
> UWSGI: 2.0.10
>
> Any idea on how i can troubleshoot this issue? It seems related to
> uwsgi but it's happening only when using the Riak connection.
>
> Thank you
>
> :tele
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com