Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app

2012-02-06 Thread steve
Hi,

Last week I started this thread to try and figure out the bottleneck in our
web app. After trying out different approaches, we decided to profile the app
using repoze.profile[1] as middleware and load it using funkload[2].

This is what we saw for a test run (although my question below is not related
to the specifics of the test run, details can be provided if needed):

http://pastebin.com/pzwA74EN

As you can see, most of the time is spent acquiring locks or in (what appears
to be) cache-validation code.

After looking at ZODB3-3.10.2-py2.6-linux-x86_64.egg/ZODB/Connection.py line 864
(_setstate) (line 13 in that ^ paste), I think most of those acquire()s are
coming from line 900:

self._inv_lock.acquire()
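For anyone wanting to reproduce this kind of measurement outside the app, here is a stdlib-only sketch (not the actual ZODB code; `load_object` is just a stand-in for `Connection._setstate`) showing how cProfile surfaces a per-load lock acquisition in the hot path, the same way it showed up in the paste above:

```python
import cProfile
import io
import pstats
import threading

lock = threading.Lock()

def load_object():
    # Mimics the pattern in Connection._setstate: every object load
    # briefly takes the invalidation lock before reading state.
    with lock:
        pass

def workload(n=50000):
    for _ in range(n):
        load_object()

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('tottime').print_stats(10)
report = out.getvalue()
```

With enough iterations, the lock-related calls dominate the report just as acquire() dominated ours.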

So, my question here is:
a. Is the invalidation-queue-size just a start-up time optimization, or does
it play a part in invalidating the ZEO client cache for every transaction?
(We use the repoze.zodbconn middleware, with which every request to the
webserver is treated as a transaction.)

b. Does drop_cache_rather_verify() play a part on a 'connected' ZEO client --
i.e., would the client always drop the cache rather than verify on each
transaction, or is this just a startup optimization?
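(For reference, the setting in question (a) lives in the zeo section of our server-side ZConfig file; as far as I understand it, this queue only lets a reconnecting client catch up on recent invalidations instead of re-verifying its whole cache:)

```
<zeo>
  address 8886
  # keep invalidation data for the last 1000 transactions so that
  # reconnecting clients can validate their caches quickly
  invalidation-queue-size 1000
</zeo>
```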

cheers,
- steve

[...snip...]

-- 
random spiel: http://lonetwin.net/
what i'm stumbling into: http://lonetwin.stumbleupon.com/
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app

2012-01-26 Thread steve
Hi Alex,

On 01/24/2012 07:30 PM, Alex Clark wrote:
 On 1/24/12 8:50 AM, steve wrote:
 Hi All, 
 ...snip...
 You might try New Relic
 

Thanks for your reply and sorry for the delayed response (been fighting
fires). New Relic does seem interesting. Unfortunately, however, their Python
agent doesn't seem to log the ZODB layer as a db (they do support an
impressive number of other Python modules though[1]).

That said, we shall try tracking the zodb layer by using custom instrumentation
as described in http://newrelic.com/docs/python/adding-python-instrumentation .

thanks for the suggestion,
cheers,
- steve

[1] http://newrelic.com/docs/python/instrumented-python-packages



Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app

2012-01-26 Thread steve
Hi Laurence,

Thanks for your reply and sorry for the delayed response (been fighting fires).

On 01/24/2012 09:18 PM, Laurence Rowe wrote:
 On 24 January 2012 13:50, steve st...@lonetwin.net wrote:
 Hi All,

 ...snip...
 Any pointers/comments would be appreciated.
 
 (Following up only on zodb-dev as I'm not subscribed to the other lists.)
 
 I'm guessing, but I suspect your load tests may only be reading from
 the ZODB so you rarely see any cache misses.
 

Yes, that's correct, we do test (mostly) read operations.

 The most important tuning parameters for ZODB with respect to memory
 usage are the number of threads and the connection_cache_size. The
 connection_cache_size controls the number of persistent objects kept
 live in the interpreter at a time. It's a per-connection setting, and
 as each thread needs its own connection, memory usage increases
 proportionally to connection_cache_size * number of threads. Most
 people use either one or two threads per process with the ZODB. I know
 plone.recipe.zope2instance defaults to two threads per process, though
 I think this is only to avoid locking up in the case of Plone being
 configured to load an RSS feed from itself.
 
 The Python Global Interpreter Lock prevents threads from running
 concurrently, so with ZEO, running so many threads per process is
 likely to be counter-productive. Try with one or two threads and
 perhaps up the connection_cache_size (though loading from the ZEO
 cache is very quick, you must ensure your working set fits in the
 connection cache or else you'll be loading the same objects every
 request). If your CPU usage goes down a lot, it means you're spending
 time waiting for objects to be loaded into the connection, and you
 might want to run a couple more processes than you have CPUs if you
 are running one thread per process.
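To make the memory trade-off above concrete, here is a rough back-of-the-envelope helper (the per-object size is an illustrative guess, not a measured value; measure your own working set):

```python
def estimate_cache_memory(connection_cache_size, threads, processes,
                          avg_object_bytes=2048):
    """Rough upper bound on memory held by ZODB connection caches.

    Each thread needs its own connection, and each connection keeps up
    to connection_cache_size persistent objects live in the interpreter.
    avg_object_bytes is an assumed average, purely for illustration.
    """
    per_connection = connection_cache_size * avg_object_bytes
    return per_connection * threads * processes

# e.g. 7 threads x 4 processes (the mod_wsgi settings from this thread)
# with a 10000-object cache per connection:
total = estimate_cache_memory(10000, threads=7, processes=4)
```

Even with a modest 2KB-per-object guess, 7 threads x 4 processes lands over half a gigabyte per box, which is why fewer threads with a bigger per-connection cache is usually the better deal.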
 

Thanks for that information. That is helpful. Yes, I shall try and experiment
with the connection_cache_size.

 I also suggest you check that your mod_wsgi config is correctly
 specifying WSGIProcessGroup, see:
 http://www.martinaspeli.net/articles/update-repoze-under-mod-wsgi-is-not-slow
 and 
 http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess.
 

Yes, it is.

 To get more information about what is going on, try porting
 https://github.com/collective/collective.stats to work as WSGI
 middleware instead of hooking into Zope2's ZPublisher.
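Pending a real port of collective.stats, a minimal WSGI middleware along these lines (all names here are hypothetical; this only shows the middleware shape, not the ZODB hooks collective.stats adds) could at least record wall-clock time per request:

```python
import time

class TimingMiddleware:
    """Toy WSGI middleware: records wall-clock time per request.

    A real port of collective.stats would additionally record ZODB
    activity (object loads/stores per connection); this sketch only
    demonstrates where such hooks would live.
    """

    def __init__(self, app):
        self.app = app
        self.timings = []  # (path, seconds) per request

    def __call__(self, environ, start_response):
        start = time.time()
        try:
            return self.app(environ, start_response)
        finally:
            self.timings.append((environ.get('PATH_INFO', ''),
                                 time.time() - start))

def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

app = TimingMiddleware(demo_app)
body = app({'PATH_INFO': '/'}, lambda status, headers: None)
```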
 

Hmm, this is interesting too. I'll see if I can port it and fit it into our
app.

Thanks for taking the time.

cheers,
- steve



Re: [ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app

2012-01-26 Thread steve
Hi Aijun,

On 01/25/2012 10:36 AM, Aijun Jian wrote:
 You might try Nginx + uWSGI http://projects.unbit.it/uwsgi/ (+Green)

Yep, thanks for the suggestion and we shall consider it, but first I'd like
to confirm that it is the webserver that's at fault rather than shoot in the
dark, because changing the stack on production is a bit tedious for release
reasons.

cheers,
- steve



[ZODB-Dev] [X-Post] Figure out bottleneck in a repoze.bfg based web app

2012-01-24 Thread steve
Hi All,

I apologize for the cross-post, but with this mail I simply hope to get a few
pointers on how to narrow down the problem I am seeing. I shall post to the
relevant list if I have further questions.

So here is the issue:

Short description:
I've got a repoze.bfg application running on top of ZEO/ZODB across multiple
servers, served using mod_wsgi, and it's showing bad resource usage (both high
memory consumption and high CPU usage). Are there any steps I can take to
localise whether this is an issue with the zeo/zodb/mod_wsgi configuration
and/or usage?

Long description:

* I have a repoze.bfg (version 1.3) based app, which uses zodb (over zeo,
version 3.10.2) as the backend and is served up using apache+mod_wsgi, all
running on minimal debian 6.0 based amazon instances.

* The architecture is 1 zodb server and 4 app instances, each running on its
own EC2 instance (all in the same availability zone). All of the instances
are behind an amazon Elastic Load Balancer.

* At the web-server, we don't customize apache much (ie: we pretty much use the
stock debian apache config). We use mod_wsgi (version 3.3-2) to serve the
application in daemon mode, with the following parameters:

WSGIDaemonProcess webapp user=appname threads=7 processes=4
maximum-requests=1 python-path=/path/to/virtualenv/eggs

* The web app is the only thing served from these instances, and we serve its
static content using apache rather than the web app.

* The zodb config on the db server looks like:

<zeo>
  address 8886
  read-only false
  invalidation-queue-size 1000
  pid-filename $INSTANCE/var/ZEO.pid
  # monitor-address 8887
  # transaction-timeout SECONDS
</zeo>

<blobstorage 1>
  <filestorage>
    path $INSTANCE/var/webapp.db
  </filestorage>
  blob-dir $INSTANCE/var/blobs
</blobstorage>

* The zeo connection string (for repoze.zodbconn-0.11) is:

zodb_uri = zeo://<zodb server ip>:8886/?blob_dir=/path/to/var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

(Note: the drop_cache_rather_verify=true is for faster startups)
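(Aside: since hand-edited query strings like this are easy to mangle, the URI can also be assembled programmatically. A sketch using Python 3's urllib.parse for illustration; the hostname is a placeholder, and the parameter names are the ones from the string above:)

```python
from urllib.parse import urlencode

# The ZEO client options we pass to repoze.zodbconn, as a dict.
params = {
    'blob_dir': '/path/to/var/blobs',
    'shared_blob_dir': 'false',
    'connection_pool_size': '50',
    'cache_size': '1024MB',
    'drop_cache_rather_verify': 'true',
}
# safe='/' keeps the slashes in blob_dir readable instead of %2F-escaping
zodb_uri = 'zeo://zodb-server:8886/?' + urlencode(params, safe='/')
```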

Now with this, on live we have typical load such as:
top - 13:34:54 up 1 day,  8:22,  2 users,  load average: 11.87, 8.75, 6.37
Tasks:  85 total,   2 running,  82 sleeping,   0 stopped,   1 zombie
Cpu(s): 81.1%us,  6.7%sy,  0.0%ni, 11.8%id,  0.0%wa,  0.0%hi,  0.1%si,  0.2%st
Mem:  15736220k total,  7867340k used,  7868880k free,   283332k buffers
Swap:0k total,0k used,0k free,  1840876k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 5079 appname   21   0 1587m 1.2g 6264 S   77  8.1   9:23.86 apache2
 5065 appname   20   0 1545m 1.2g 6272 S   95  7.9   9:31.24 apache2
 5144 appname   20   0 1480m 1.1g 6260 S   86  7.4   5:49.92 apache2
 5127 appname   20   0 1443m 1.1g 6264 S   94  7.2   7:13.10 apache2




As you can see, the load average is very high, and the apache processes
spawned for mod_wsgi (identifiable by the user whose context they run under)
each consume about 1.2G of resident memory.

With a constant load like this, the app response progressively degrades.
We've tried tweaking the number of processes and the cache_size in the ZEO
connection string, but all to no avail. So, now, rather than shoot in the
dark, I would appreciate suggestions on how I might isolate the bottleneck in
the stack.

One thing to note is that this high load and memory usage is only seen on the
production instances. When we test the app using ab or funkload on a similar
setup (2 app instances instead of 4), we do not see this problem.

Any pointers/comments would be appreciated.

cheers,
- steve


Re: [ZODB-Dev] Wrong blob file being returned (similar to https://mail.zope.org/pipermail/zodb-dev/2011-February/014067.html )

2011-07-13 Thread steve
Hello Jim,

On 07/12/2011 07:28 PM, Jim Fulton wrote:
 On Tue, Jul 12, 2011 at 6:33 AM, stevest...@lonetwin.net  wrote:
  Hi,
 [...snip...]
  a. our site is image heavy (36293 blob files) and the servers are behind a
  load balancer, so in a single request to the web-app (a repoze.bfg site) we
  might load collectively 20+ blobs from any of the 4 servers.

  b. zeo connection string on the clients
  zodb_uri =
  zeo://xxx..xxx.xxx:8886/?blob_dir=%(here)s/../var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

  c. $ cat var/blobs/.layout
  zeocache

  Any comments/suggestions on how to isolate and fix this problem would be
  appreciated.

 We have a number of large apps with multiple terabytes of blobs and a
 vaguely similar configuration. We haven't seen this sort of problem.
 One difference is that we set the blob cache size.  I don't suppose
 you're running out of disk space?


No, we aren't running out of disk space, and I hadn't really thought about
setting a limit on the blob cache size (I assumed there'd be a builtin
default). In fact, I didn't know this was configurable, since it isn't
mentioned in the docs for repoze.zodbconn (which is what we use to connect to
the db)[1]. Fortunately, it appears that repoze.zodbconn just passes along any
parameters it doesn't understand to the underlying ClientStorage connector. I
shall try setting a limit (which instinctively seems like a good thing to do
anyway, to avoid the original state and the cache state going out of sync).
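For instance, appending the parameter to our existing URI could be done like this (a sketch; blob_cache_size is the ClientStorage option Jim refers to, the hostname is a placeholder, and the 512MB cap is an arbitrary example value, not a recommendation):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

zodb_uri = ('zeo://zodb-server:8886/?blob_dir=/var/blobs'
            '&shared_blob_dir=false&cache_size=1024MB')

# Parse the query string, add the blob cache cap, and rebuild the URI.
parts = urlparse(zodb_uri)
query = dict(parse_qsl(parts.query))
query['blob_cache_size'] = str(512 * 1024 * 1024)  # arbitrary 512MB cap
new_uri = urlunparse(parts._replace(query=urlencode(query, safe='/')))
```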

 The only suggestion I have is to keep an eye on it and try to
 reproduce the problem.

Yes, I shall attempt to do that on a dev instance of the app.

 I would think that if a request returns an
 incorrect Blob, it would continue to. If someone reports a bad blob,
 get the URL and see if you can reproduce by making the same request to
 each of the app servers, bypassing the load balancer.  If one server
 is being bad, you can remove it from the LB pool to debug it.

That's the problem. I'd assumed the same behavior. Unfortunately, it appears
that incorrect blobs aren't always returned for subsequent requests. I traced
down one such request to a single server, removed it from the LB and checked
the same URL, but the request didn't give me an incorrect image. I wonder,
would this also depend on the connection pool or the thread serving the
request?

Anyway, thanks for your help. I shall test and report back if I find a
reliable way to reproduce this.

cheers,
- steve

[1] http://docs.repoze.org/zodbconn/narr.html#zeo-uri-scheme


Re: [ZODB-Dev] Wrong blob file being returned (similar to https://mail.zope.org/pipermail/zodb-dev/2011-February/014067.html )

2011-07-13 Thread steve
Hi William,

On 07/13/2011 02:26 AM, William Heymann wrote:
 On Tuesday 12 July 2011, steve wrote:
  Hi,

  I have a setup where 4 ZEO clients running on separate machines connect to
  a single DB server which runs on a different system by itself. The ZEO
  clients and the DB server all are at version ZODB3-3.10.2. Now, since the
  last few weeks some of our users have been reporting that they
  occasionally see incorrect images being returned.

 One thing you may want to look at is the load balancer. Apache has a bug
 (the report keeps being opened and closed again) where it swaps data between
 requests under load. Because it happens at the apache level and not the zope
 level, you will never see this problem in any of the zope logs.

Hmmm, interesting. Since we are using Amazon's Elastic LB to load balance
these app servers running on EC2 instances, and we do not have any direct
access to the ELB logs or systems, pinning this problem on the ELB might be
difficult (although AFAIK the ELB setup also uses apache -- I could be
mistaken though).

If it isn't too much effort, could you please point me to the apache bug you
mentioned?

 Just make sure you don't have a similar situation, or you could end up
 wasting a huge amount of time debugging the wrong thing. In my case I spent
 a lot of time debugging zope, and when I finally discovered it was apache
 that was screwing up, I ended up just dumping apache for nginx.


Thanks for the info. I shall keep a lookout for this.

cheers,
- steve



[ZODB-Dev] Wrong blob file being returned (similar to https://mail.zope.org/pipermail/zodb-dev/2011-February/014067.html )

2011-07-12 Thread steve
Hi,

I have a setup where 4 ZEO clients running on separate machines connect to a
single DB server which runs on a different system by itself. The ZEO clients
and the DB server are all at version ZODB3-3.10.2. Now, over the last few
weeks, some of our users have been reporting that they occasionally see
incorrect images being returned.

On googling, I came across the thread below and was wondering whether I am
seeing the same thing:

https://mail.zope.org/pipermail/zodb-dev/2011-February/014067.html

...although the setup and version are different (ie: ZODB-3.8 and RelStorage).

Unfortunately, I do not know enough about ZODB internals to be able to say
for sure. Is there a way I can test whether the problem is indeed the wrong
blob file being returned from the blobcache? FWIW, we haven't figured out a
way to consistently reproduce this error ourselves. Other things that may or
may not be relevant:

a. our site is image heavy (36293 blob files) and the servers are behind a
load balancer, so in a single request to the web-app (a repoze.bfg site) we
might load collectively 20+ blobs from any of the 4 servers.

b. zeo connection string on the clients
zodb_uri =
zeo://xxx..xxx.xxx:8886/?blob_dir=%(here)s/../var/blobs&shared_blob_dir=false&connection_pool_size=50&cache_size=1024MB&drop_cache_rather_verify=true

c. $ cat var/blobs/.layout
zeocache

Any comments/suggestions on how to isolate and fix this problem would be
appreciated.

cheers,
- steve



[ZODB-Dev] Is this a bug ? [ZODB3-3.10.0b8]

2010-10-07 Thread steve
Hi,

We recently updated the zodb package to ZODB3-3.10.0b8 on our servers, which
serve a repoze.bfg application with a ZODB backend. Since then, some methods
which worked before the update have been failing with traces similar to this:

(also at http://pastebin.com/aPKwBQp6 , in case that is easier to read)


File "/home/ideo/oproj/src/oproj/oproj/views/userhandling.py", line 393, in delete_user
  profiles.remove(username)
File "/home/ideo/oproj/eggs/repoze.folder-0.4-py2.5.egg/repoze/folder/__init__.py", line 210, in remove
  objectEventNotify(ObjectWillBeRemovedEvent(other, self, name))
File "/home/ideo/oproj/eggs/zope.component-3.6.0-py2.5.egg/zope/component/event.py", line 33, in objectEventNotify
  adapters = zope.component.subscribers((event.object, event), None)
File "/home/ideo/oproj/eggs/zope.component-3.6.0-py2.5.egg/zope/component/_api.py", line 138, in subscribers
  return sitemanager.subscribers(objects, interface)
File "/home/ideo/oproj/eggs/zope.component-3.6.0-py2.5.egg/zope/component/registry.py", line 315, in subscribers
  return self.adapters.subscribers(objects, provided)
File "/home/ideo/oproj/eggs/zope.interface-3.5.1-py2.5-linux-x86_64.egg/zope/interface/adapter.py", line 535, in subscribers
  subscription(*objects)
File "/home/ideo/oproj/src/karl/karl/models/subscribers.py", line 166, in profile_removed
  _remove_email(parent, name)
File "/home/ideo/oproj/src/karl/karl/models/subscribers.py", line 153, in _remove_email
  filtered = [x for x in mapping.items() if x[1] != name]
File "/home/ideo/oproj/eggs/ZODB3-3.10.0b8-py2.5-linux-x86_64.egg/ZODB/Connection.py", line 859, in setstate
  self._setstate(obj)
File "/home/ideo/oproj/eggs/ZODB3-3.10.0b8-py2.5-linux-x86_64.egg/ZODB/Connection.py", line 913, in _setstate
  self._reader.setGhostState(obj, p)
File "/home/ideo/oproj/eggs/ZODB3-3.10.0b8-py2.5-linux-x86_64.egg/ZODB/serialize.py", line 613, in setGhostState
  obj.__setstate__(state)
SystemError: new style getargs format but argument is not a tuple


I just wanted to confirm whether this is something to do with the app or a
ZODB bug. Any pointers/help would be appreciated.

cheers,
- steve


Re: [ZODB-Dev] Is this a bug ? [ZODB3-3.10.0b8]

2010-10-07 Thread steve
Hi Jim,
On 10/07/2010 10:20 PM, Jim Fulton wrote:
 What versions of Python are you using on the clients and on the server?


Sorry should have mentioned:
Python 2.5.2 (r252:60911, Jan 24 2010, 17:44:40)
[GCC 4.3.2] on linux2

cheers,
- steve

[...snip...]



Re: [ZODB-Dev] Is this a bug ? [ZODB3-3.10.0b8]

2010-10-07 Thread steve
On 10/07/2010 10:29 PM, steve wrote:
 On 10/07/2010 10:26 PM, steve wrote:
  Hi Jim,
  On 10/07/2010 10:20 PM, Jim Fulton wrote:
   What versions of Python are you using on the clients and on the server?


  Sorry should have mentioned:
  Python 2.5.2 (r252:60911, Jan 24 2010, 17:44:40)
  [GCC 4.3.2] on linux2

 ...on both


Ugh, I know this must be annoying, but I forgot to add: clients and servers
are both x86_64 systems (amazon ec2 instances, actually).

cheers,
- steve


Re: [ZODB-Dev] db.undoLog() does not work as documented ..

2010-08-23 Thread steve
Hello Jim,

On 08/23/2010 08:59 PM, Jim Fulton wrote:
 On Sat, Aug 21, 2010 at 5:39 AM, stevest...@lonetwin.net  wrote:
  Hi,

  I was reading up on the zodb docs, specifically about undoing transactions[1]
  where it mentions that ...

  
  undoLog(start, end[, func])() method on the DB instance returns the log of 
 past
  transactions, returning transactions between the times start and end, 
 measured
  in seconds from the epoch.
  

 That documentation is incorrect.

 See the definition of undoLog in ZODB/interfaces.py


Thanks for clarifying that. I've filed a bug:

https://bugs.launchpad.net/zodb/+bug/622828

cheers,
- steve



[ZODB-Dev] db.undoLog() does not work as documented ..

2010-08-21 Thread steve
Hi,

I was reading up on the zodb docs, specifically about undoing transactions[1]
where it mentions that ...


undoLog(start, end[, func])() method on the DB instance returns the log of past 
transactions, returning transactions between the times start and end, measured 
in seconds from the epoch.


but for some reason this does not seem to work for me with various
combinations/variations of providing the seconds-since-epoch parameters...

>>> import time
>>> import transaction
>>> from ZODB import FileStorage, DB
>>> storage = FileStorage.FileStorage('var/karl.db')
>>> db = DB(storage)
>>> db.undoLog(time.mktime((2010, 8, 20, 15, 12, 00, 00, 00, 00)), time.time())
[]

or even:

>>> db.undoLog(1282242600.0, 1282380668.0)
[]

however, when provided with arbitrary ints, it seems to work fine, ie:

>>> db.undoLog(0, sys.maxint)

...works (ie: gives me the transaction log from earliest to latest).

So, am I doing something wrong, or is the documentation wrong?

cheers,
- steve

[1] http://www.zodb.org/documentation/guide/transactions.html#undoing-changes