Re: [Zope-dev] Very severe memory leak

2003-08-28 Thread Shane Hathaway
Leonardo Rochael Almeida wrote:
Let's assume we're not hitting a 2.6 bug. we could be hitting a 2.6
feature :-) I've read Casey's article about optimizing ZCatalog and I
know there has been a lot of ZCatalog optimization work. Has there been
any changes in 2.6 (in ZCatalog or thread cache or anywhere else) that
was a tradeoff of memory for speed?
Not that I know of.  To the contrary, I think--we sometimes can't do an 
optimization because it would consume extra memory.

You mean that one ZCatalog object is in the cache, and (indirectly)
attached to it are thousands of DateTimes? Then we have a huge problem.
Why?  Although the catalog manages those DateTime objects, the DateTimes 
are in many ZODB records.

I was counting on the objects in cache parameter to put an upper bound
in Zope memory consumption, even if this upper bound was a little fuzzy.
In the past, the cache size parameter was very fuzzy.  It was not an 
upper bound.  Now, it is not only an upper bound, it also tends to act 
as a lower bound.  (Note that the cache is still allowed to grow 
indefinitely within the scope of a request, however.)

Remove all metadata fields you can. 
 If you have index dates, use DateTimeIndexes rather than the standard 
indexes.


Well, this is a classic ZCatalog+ZClass site. Almost all ZClasses are
CatalogAware and they're all cataloged in the monster ZCatalog named
Catalog in the folder that represents the root of the site (as served
by Apache) because that's how CatalogAware ZClasses work out of the box
(I know the catalog name can be changed). It has a huge amount of
Indexes and Metadata. The client grew the site organically, creating
ZClasses and hanging them in the same ZCatalog as he was needing them.
It's gonna take a lot of time to untangle it all and we're working on
it, but for now we need to deal with the emergency that is the fact tha
migrating to 2.6.1 put us in the very hot spot of not having a way to
put an upper bound to Zope memory consumption.
I would characterize it the other way around; there was no way to 
establish a hard boundary until now.  I suggest you reduce the ZODB 
cache size to 2000 (or less) and see if the problem goes away.

I find it helpful to simply visit the information ZCatalog is storing 
for just one cataloged object.  In the catalog contents view, click one 
object.  Look at each field and consider how ZCatalog might misuse that 
field for other objects.
What kind of misuse you have in mind?
Large strings, DateTimes, and mutable objects are all bad for the catalog.

Anyone know off the top of their head how do I get to the object caches?
app._p_jar._cache will get you started.  It is dictionary-like.

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-27 Thread Leonardo Rochael Almeida
On Mon, 2003-08-25 at 23:35, Shane Hathaway wrote:
 On 08/25/2003 05:12 PM, Leonardo Rochael Almeida wrote:
 
  Let me restate something important, because I forget about it myself
  sometimes when I'm thinking about this problem: *upgrading to 2.6.1 made
  the situation definetly worse*. The site became lot faster, but the
  memory situation became intolerable, so I believe there is something
  introduced between 2.5.1 and 2.6.1 that is causing our problems. Is
  there a fix in 2.6.2 for some leak introduced *in* the 2.6 series?
 
 I think there are, actually, but AFAIK they were obscure and rare.

That would make the likelyhood of our problems being caused by them very
small. Is there any way to know if we're hitting them?

Let's assume we're not hitting a 2.6 bug. we could be hitting a 2.6
feature :-) I've read Casey's article about optimizing ZCatalog and I
know there has been a lot of ZCatalog optimization work. Has there been
any changes in 2.6 (in ZCatalog or thread cache or anywhere else) that
was a tradeoff of memory for speed?

If there was such a change, it would explain why we had a slow site that
did fit into memory while now we have a fast one that doesn't.

  Another thing, we have already estabilished that clearing the caches
  resets the DateTime refcount, so the DateTime must be anchored to the
  caches somehow. So, if the DateTime refcounts are constantly in the
  50k-100k range, how come the cache object count for all threads is below
  5k?
 
 If the DateTimes are in catalog metadata or indexes, this would make sense.
 
  If they're indirectly attached, it means that a lot of DateTimes are
  anchored in just a few objects, so no matter how much I reduce the
  target cache size, it might never be enough to throw out the
  DateTimes...
 
 They are probably in the ZCatalog.

You mean that one ZCatalog object is in the cache, and (indirectly)
attached to it are thousands of DateTimes? Then we have a huge problem.
I was counting on the objects in cache parameter to put an upper bound
in Zope memory consumption, even if this upper bound was a little fuzzy.


  Remove all metadata fields you can. 
   If you have index dates, use DateTimeIndexes rather than the standard 
 indexes.

Well, this is a classic ZCatalog+ZClass site. Almost all ZClasses are
CatalogAware and they're all cataloged in the monster ZCatalog named
Catalog in the folder that represents the root of the site (as served
by Apache) because that's how CatalogAware ZClasses work out of the box
(I know the catalog name can be changed). It has a huge amount of
Indexes and Metadata. The client grew the site organically, creating
ZClasses and hanging them in the same ZCatalog as he was needing them.

It's gonna take a lot of time to untangle it all and we're working on
it, but for now we need to deal with the emergency that is the fact tha
migrating to 2.6.1 put us in the very hot spot of not having a way to
put an upper bound to Zope memory consumption.

 I find it helpful to simply visit the information ZCatalog is storing 
 for just one cataloged object.  In the catalog contents view, click one 
 object.  Look at each field and consider how ZCatalog might misuse that 
 field for other objects.

What kind of misuse you have in mind?

  hmm, any chance of this remote console showing up somewhere? :-)
 
 I've attached what I threw together. 

Thanks!

  I might use it to navigate the thread object cache.
 
 It's unfortunate you have to go so deep, but that sounds like a good plan.

Anyone know off the top of their head how do I get to the object caches?

Cheers, Leo

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-26 Thread Shane Hathaway
On 08/25/2003 05:12 PM, Leonardo Rochael Almeida wrote:
But maybe this means that the leak is not related to the DateTime
refcounts. It's just that the fast and continual increase in DateTime
objects is really bugging me. BTW, what is the usual DateTime refcount
in heavy ZCatalog sites you guys run?
We try to keep it near 0. :-)  DateTimes are full of nonsense.  Zope's 
DateTime troubles drove the creation of Python 2.3's new datetime 
module.  From what I can tell, the new datetimes are excellent.

Let me restate something important, because I forget about it myself
sometimes when I'm thinking about this problem: *upgrading to 2.6.1 made
the situation definetly worse*. The site became lot faster, but the
memory situation became intolerable, so I believe there is something
introduced between 2.5.1 and 2.6.1 that is causing our problems. Is
there a fix in 2.6.2 for some leak introduced *in* the 2.6 series?
I think there are, actually, but AFAIK they were obscure and rare.

Another thing, we have already estabilished that clearing the caches
resets the DateTime refcount, so the DateTime must be anchored to the
caches somehow. So, if the DateTime refcounts are constantly in the
50k-100k range, how come the cache object count for all threads is below
5k?
If the DateTimes are in catalog metadata or indexes, this would make sense.

If they're indirectly attached, it means that a lot of DateTimes are
anchored in just a few objects, so no matter how much I reduce the
target cache size, it might never be enough to throw out the
DateTimes...
They are probably in the ZCatalog.  Remove all metadata fields you can. 
 If you have index dates, use DateTimeIndexes rather than the standard 
indexes.

I find it helpful to simply visit the information ZCatalog is storing 
for just one cataloged object.  In the catalog contents view, click one 
object.  Look at each field and consider how ZCatalog might misuse that 
field for other objects.

hmm, any chance of this remote console showing up somewhere? :-)
I've attached what I threw together.  It Works For Me. ;-)  I put it in 
Zope's Extensions directory then made an external method to call the 
listenlocal function.  It sets up a TCP listener bound only to 
localhost, so you have to telnet to port 8765 locally.  No readline 
support.  No method is provided for shutting down the listener (except 
restarting Zope).  To exit the session, push ctrl-D then enter.

I might use it to navigate the thread object cache.
It's unfortunate you have to go so deep, but that sounds like a good plan.

Shane

import socket
from code import InteractiveConsole
from types import IntType, StringType
from thread import start_new_thread, get_ident
import sys
import time


class ThreadedObjectProxy:
Proxy to different objects based which thread invoked it.

def __init__(self, default):
self._default = default
self._alts = {}

def setAlternative(self, o):
self._alts[get_ident()] = o

def delAlternative(self):
try: del self._alts[get_ident()]
except KeyError: pass

def __getattr__(self, name):
ob = self._alts.get(get_ident(), self._default)
return getattr(ob, name)


class RemoteConsole(InteractiveConsole):

def __init__(self, sock, file, filename=None, locals=None):
if filename is None:
filename = str(file)
self.sock = sock
self.file = file
InteractiveConsole.__init__(self, locals=locals, filename=filename)

def raw_input(self, prompt=''):
if prompt:
self.file.write(prompt)
s = self.file.readline().rstrip()
if s == '\x04':  # Ctrl-D
raise EOFError
return s

def interactAndClose(self):
sys.stdout.setAlternative(self.file)
sys.stderr.setAlternative(self.file)
sys.stdin.setAlternative(self.file)
try:
try:
self.interact()
except EOFError:
pass
finally:
sys.stdout.delAlternative()
sys.stderr.delAlternative()
sys.stdin.delAlternative()
self.file.close()
self.sock.close()


def setupStreams():
if not hasattr(sys.stdout, 'setAlternative'):
sys.stdout = ThreadedObjectProxy(sys.stdout)
if not hasattr(sys.stderr, 'setAlternative'):
sys.stderr = ThreadedObjectProxy(sys.stderr)
if not hasattr(sys.stdin, 'setAlternative'):
sys.stdin = ThreadedObjectProxy(sys.stdin)


def accept_connections(s):
while 1:
cs, addr = s.accept()
f = cs.makefile('w+', 0)
i = RemoteConsole(cs, f)
start_new_thread(i.interactAndClose, ())


def listen(addr, locals=None):
setupStreams()
if isinstance(addr, StringType):
t = socket.AF_UNIX
elif isinstance(addr, IntType):
t = socket.AF_INET
addr = ('', addr)
else:
t = socket.AF_INET
s = socket.socket(t, socket.SOCK_STREAM)

Re: [Zope-dev] Very severe memory leak

2003-08-26 Thread Stefan H. Holek
I have got about 7 DateTime refs on that one CMF site at the moment and 
can arbitrarily increase this number by serving out pages ;-). They don't 
stick though, but go back to a stable 4. No cause for alarm here.

Did you rebuild the catalog (-indexes) after the upgrade? New Zope versions 
usually come with important BTree fixes ;-).

Wild guess, but do you by any chance run exUserFolder?

hth,
Stefan
--On Montag, 25. August 2003 18:12 -0300 Leonardo Rochael Almeida 
[EMAIL PROTECTED] wrote:

But maybe this means that the leak is not related to the DateTime
refcounts. It's just that the fast and continual increase in DateTime
objects is really bugging me. BTW, what is the usual DateTime refcount
in heavy ZCatalog sites you guys run?


--
The time has come to start talking about whether the emperor is as well
dressed as we are supposed to think he is.   /Pete McBreen/
___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-26 Thread Shane Hathaway
Paul Winkler wrote:
On Mon, Aug 25, 2003 at 10:35:58PM -0400, Shane Hathaway wrote:

They are probably in the ZCatalog.  Remove all metadata fields you can. 
If you have index dates, use DateTimeIndexes rather than the standard 
indexes.


Hmm. ZCatalog provides a DateIndex, is this what you mean?
Googling for DateTimeIndex  zope turns up exactly two results, in 
German which I can't read. (the google-supplied translations are
pretty much gibberish.)
Yes, I meant DateIndex.

Hmmm again: I notice that out-of-the-box, CMF Default's portal_catalog
uses the following:
DateFieldIndex
created FieldIndex
effective   FieldIndex
expires FieldIndex
modifiedFieldIndex
Why are these FieldIndex rather than DateTime index?
No one has corrected them, probably out of fear that something might break.

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-26 Thread Leonardo Rochael Almeida
On Tue, 2003-08-26 at 12:05, Stefan H. Holek wrote:

 [...]
 Did you rebuild the catalog (-indexes) after the upgrade? New Zope versions 
 usually come with important BTree fixes ;-).

If by rebuilding you mean clicking on the Update button, then yes, as
part of changing the DateTime FieldIndexes to DateIndexes. If you meant
something else, then no.

 Wild guess, but do you by any chance run exUserFolder?

As a matter of fact, I do. Is there a known leak problem with
exUserFolder?

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Anthony Baxter

 Leonardo Rochael Almeida wrote
 Lately this was not enough, however, as Zope started taking more and
 more memory, to the point that it frequently required more than one
 restart during the day.

Note that running with ZEO will at least mean that your server restarts
happen a lot faster.


-- 
Anthony Baxter [EMAIL PROTECTED]   
It's never too late to have a happy childhood.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Leonardo Rochael Almeida
On Sat, 2003-08-23 at 22:18, Shane Hathaway wrote:
 On 08/22/2003 05:38 PM, Leonardo Rochael Almeida wrote:
  In time, DateTime refcounts eventually dwarves the second place by an
  order of magnitude. I think this is related to the fact that DateTime
  instances are stored as metadata, even though the date indexes have been
  converted to DateTime indexes. The question is, why aren't those
  instances being released? What is holding on to them?
 
 When you flush the cache, those DateTimes should disappear.  If they 
 don't, the leak is keeping them.

They are disappearing. Too bad they return immediately after, as they're
comming from a very heavy catalog query result.

 LeakFinder is an early attempt to share some of the techniques I use for 
 finding leaks.  Unfortunately, those techniques aren't very useful until 
 you've already searched for weeks, which means LeakFinder isn't very 
 good for emergencies.

Well, I have been searching for close to a week now :-)

BTW, removing the third parameter in the PickleCache call makes the
RefCounts tab work again. I'll see if I can get some tracebacks out of
object creations now.

 Here are the things you should look at first:
 
 1) The ZODB cache size.  The meaning of this number changed dramatically 
 in 2.6.  Before 2.6 it was a very vague number.  In 2.6 it's a target 
 number of objects that Zope actually tries to maintain.  Before 2.6 it 
 might have made sense to set the ZODB cache size to some arbitrarily 
 high number like 100,000; in 2.6 you want to start at about 2000 and 
 adjust from there.  There are tools in 2.6 for helping you adjust the 
 number.

We're keeping it at 15,000. Most threads keep around 12k and 13k. One of
them sometimes exagerates at 100k or so. Hmm, funny... when I look at
them now, they're all under 5k...

 2) The number of ImplicitAcquisitionWrappers present in the system.  I 
 have found it to be a reliable indicator of whether you have a leak or 
 not.  Expect this number to stay under 400 or so.  If it grows 
 gradually, there's a leak.  Watch the refcounts screen.

It keeps under 200. At least when I'm looking :-)

 3) Is Python compiled with cyclic garbage collection enabled?  2.4 and 
 above absolutely require cyclic garbage collection.

Yes, I mentioned I checked it in my message. I also checked that the gc
didn't find any uncollectables, so it's definetly *not* a cycle leak.
It's a please-release-my-reference-and-lemme-die-in-peace leak.

 4) Don't use 2.6.1.  Use 2.6.2, which has fixes for known leaks.  It is 
 actually already tagged in CVS as Zope-2_6_2, and it's what zope.org 
 is now running.  Various unrelated things prevented a formal release 
 this week.

You guys are sure about this? The client is very much against running
things from CVS because they just don't want to go thru another upgrade
processes when the real 2.6.2 shows up.

 If all else fails, grep all Python modules for sys._getframe() and 
 sys.exc_info().  These are the primary causes of memory leaks in 
 Python 2.1 and below.  If you're brave, you can just run Zope under 
 Python 2.2, which fixes those particular leaks AFAIK.

Do I need to be particularly brave to run 2.6.2 (as tagged in the CVS)
under Python2.2? is it still an unsupported combination? I know it's
2.7's job to be 2.2 compliant, but I've seen reports of more and more
people running Zope 2.6.x under Python 2.2.

 Finally, there's always hope. :-)  The latest thing I've been doing is 
 running Zope in a debug build of Python.  A debug build makes a magical 
 sys._getobjects() available,

Where do you discover these things? :-)

 allowing you to inspect all live objects 
 through a remote console.

Interesting, which one do you use? does Boa have a remote console? or is
it just plain monitor port?

 Since debug builds aren't much slower than 
 standard builds, you can even run a debug build in production for short 
 periods of time.  I've been building a small library of functions for 
 working in this mode, and if you need them, I'll pass them along.  I'd 
 have to warn you that they are anything but intuitive in their purpose 
 and use, though.

I'll let you know if it gets to that :-)

For now we'll be stress testing to try and locate the URLs that cause
the most leakage.

Cheers, Leo

PS: to the people who offered other suggestions: Yes, we're already
running ZEO to restart faster. No, buying more memory now is not an
option. Yes, we're already automatically restarting the process when the
available memory in the machine gets to a critical state, but not thru
Autolance.

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Shane Hathaway
Leonardo Rochael Almeida wrote:
On Sat, 2003-08-23 at 22:18, Shane Hathaway wrote:
When you flush the cache, those DateTimes should disappear.  If they 
don't, the leak is keeping them.
They are disappearing. Too bad they return immediately after, as they're
comming from a very heavy catalog query result.
Hmm, that tends to indicate you really need that much memory.

1) The ZODB cache size.  The meaning of this number changed dramatically 
in 2.6.  Before 2.6 it was a very vague number.  In 2.6 it's a target 
number of objects that Zope actually tries to maintain.  Before 2.6 it 
might have made sense to set the ZODB cache size to some arbitrarily 
high number like 100,000; in 2.6 you want to start at about 2000 and 
adjust from there.  There are tools in 2.6 for helping you adjust the 
number.
We're keeping it at 15,000. Most threads keep around 12k and 13k. One of
them sometimes exagerates at 100k or so. Hmm, funny... when I look at
them now, they're all under 5k...
Maybe the individual ZODB objects are excessively large.  A few months 
ago we found a bug in the new zope.org software that made it store file 
uploads as a single chunk.  A lot of those files were over 1 MB in size. 
 This exhausted 1 GB of RAM pretty quickly.  Splitting the files into 
chunks, like Zope usually does automatically, solved the problem.

2) The number of ImplicitAcquisitionWrappers present in the system.  I 
have found it to be a reliable indicator of whether you have a leak or 
not.  Expect this number to stay under 400 or so.  If it grows 
gradually, there's a leak.  Watch the refcounts screen.


It keeps under 200. At least when I'm looking :-)
That's another strong indication that it's not actually leaking.  Or 
perhaps it's a class of leak I have not yet encountered. ;-)

4) Don't use 2.6.1.  Use 2.6.2, which has fixes for known leaks.  It is 
actually already tagged in CVS as Zope-2_6_2, and it's what zope.org 
is now running.  Various unrelated things prevented a formal release 
this week.


You guys are sure about this? The client is very much against running
things from CVS because they just don't want to go thru another upgrade
processes when the real 2.6.2 shows up.
Well, 2.6.2b6 has been released (but not announced), and I've been 
promised there will be no more changes before 2.6.2.

If all else fails, grep all Python modules for sys._getframe() and 
sys.exc_info().  These are the primary causes of memory leaks in 
Python 2.1 and below.  If you're brave, you can just run Zope under 
Python 2.2, which fixes those particular leaks AFAIK.


Do I need to be particularly brave to run 2.6.2 (as tagged in the CVS)
under Python2.2? is it still an unsupported combination? I know it's
2.7's job to be 2.2 compliant, but I've seen reports of more and more
people running Zope 2.6.x under Python 2.2.
2.6 is also intended to be compatible with Python 2.2.  It's just that 
here at ZC we haven't gone through the rigorous analysis we've been 
planning.  Hmm, come to think of it, the community could actually do 
that analysis.  It wouldn't be terribly difficult.  Any volunteers?

Finally, there's always hope. :-)  The latest thing I've been doing is 
running Zope in a debug build of Python.  A debug build makes a magical 
sys._getobjects() available,
Where do you discover these things? :-)
I learned this one from Jeremy Hylton.  He posted some info about it on 
his weblog after he mentioned it.

http://www.python.org/~jeremy/weblog/030410.html

allowing you to inspect all live objects 
through a remote console.
Interesting, which one do you use? does Boa have a remote console? or is
it just plain monitor port?
I rolled my own remote console.  The monitor port is too buggy and Boa 
is overkill for this.

Since debug builds aren't much slower than 
standard builds, you can even run a debug build in production for short 
periods of time.  I've been building a small library of functions for 
working in this mode, and if you need them, I'll pass them along.  I'd 
have to warn you that they are anything but intuitive in their purpose 
and use, though.


I'll let you know if it gets to that :-)

For now we'll be stress testing to try and locate the URLs that cause
the most leakage.
That's an excellent approach.

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Paul Winkler
On Mon, Aug 25, 2003 at 02:57:50PM -0400, Shane Hathaway wrote:
 2.6 is also intended to be compatible with Python 2.2.  It's just that 
 here at ZC we haven't gone through the rigorous analysis we've been 
 planning.  Hmm, come to think of it, the community could actually do 
 that analysis.  It wouldn't be terribly difficult.  Any volunteers?

Maybe. I'd really like to run with python 2.2. What is involved in the
rigorous analysis? Is there a plan I could follow?

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's THE BABY!
(random hero from isometric.spaceninja.com)

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Shane Hathaway
Paul Winkler wrote:
On Mon, Aug 25, 2003 at 02:57:50PM -0400, Shane Hathaway wrote:

2.6 is also intended to be compatible with Python 2.2.  It's just that 
here at ZC we haven't gone through the rigorous analysis we've been 
planning.  Hmm, come to think of it, the community could actually do 
that analysis.  It wouldn't be terribly difficult.  Any volunteers?


Maybe. I'd really like to run with python 2.2. What is involved in the
rigorous analysis? Is there a plan I could follow?
We need to follow the same process we used to move to Python 2.1.

http://dev.zope.org/DevHome/Wikis/DevSite/Projects/SupportPython21/FrontPage

This time it should be simpler, though.  We first need to produce a 
migration assessment, similar to this one:

http://dev.zope.org/Wikis/DevSite/Projects/SupportPython21/MigrationAssessment

Basically, for each significant change made to Python, we need to 
evaluate its impact on Zope.  Simply putting up a wiki page listing all 
changes between Python 2.1 and 2.2 would be a good start.  If you want 
to, you could even include changes up to 2.3, so we don't have to do 
this work again to support Python 2.3.

Hopefully, the evaluation will lead to no changes to Zope, but to have 
confidence in the evaluation, we need to be thorough and impartial.

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


RE: [Zope-dev] Very severe memory leak

2003-08-25 Thread Tim Peters
[Paul Winkler]
 Maybe. I'd really like to run with python 2.2. What is involved in
 the rigorous analysis? Is there a plan I could follow?

[Shane Hathaway]
 We need to follow the same process we used to move to Python 2.1.


http://dev.zope.org/DevHome/Wikis/DevSite/Projects/SupportPython21/FrontPage

 This time it should be simpler, though.  We first need to produce a
 migration assessment, similar to this one:


http://dev.zope.org/Wikis/DevSite/Projects/SupportPython21/MigrationAssessme
nt

 Basically, for each significant change made to Python, we need to
 evaluate its impact on Zope.  Simply putting up a wiki page listing
 all changes between Python 2.1 and 2.2 would be a good start.  If you
 want to, you could even include changes up to 2.3, so we don't have
 to do this work again to support Python 2.3.

I'd actually recommend skipping 2.2 and going straight to 2.3 -- 2.3 is
basically 2.2 + 16 months of bugfixes and speedups, a very solid release.
It will create more problems for Zope because 2.3 starts complaining about
more deprecated practices that 2.2.3 still lets slide, but they should be
minor problems at worst.  There are deeper benefits too.  For example, the
experimental ZODB3 3.3 alpha release requires Python 2.3; because of a
new-style class bug that still exists in Python 2.2.3, it's not at all
trivial to get ZODB 3.3 to work with Python 2.2.3.  I'd dearly love to skip
that pointlessly time-consuming exercise.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Shane Hathaway
Tim Peters wrote:
I'd actually recommend skipping 2.2 and going straight to 2.3 -- 2.3 is
basically 2.2 + 16 months of bugfixes and speedups, a very solid release.
It will create more problems for Zope because 2.3 starts complaining about
more deprecated practices that 2.2.3 still lets slide, but they should be
minor problems at worst.  There are deeper benefits too.  For example, the
experimental ZODB3 3.3 alpha release requires Python 2.3; because of a
new-style class bug that still exists in Python 2.2.3, it's not at all
trivial to get ZODB 3.3 to work with Python 2.2.3.  I'd dearly love to skip
that pointlessly time-consuming exercise.
+1

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-25 Thread Leonardo Rochael Almeida
On Mon, 2003-08-25 at 15:57, Shane Hathaway wrote:
 Leonardo Rochael Almeida wrote:
  On Sat, 2003-08-23 at 22:18, Shane Hathaway wrote:
 When you flush the cache, those DateTimes should disappear.  If they 
 don't, the leak is keeping them.
  
  They are disappearing. Too bad they return immediately after, as they're
  comming from a very heavy catalog query result.
 
 Hmm, that tends to indicate you really need that much memory.

If I just ask the client to put more memory in the machine, he'll
rightfully grill me because the site was working less-worse before the
upgrade. The fact remains that under 2.5.1 the leak would usually allow
the site to run for a number of hours before we needed to restart Zope. 

But maybe this means that the leak is not related to the DateTime
refcounts. It's just that the fast and continual increase in DateTime
objects is really bugging me. BTW, what is the usual DateTime refcount
in heavy ZCatalog sites you guys run?

Let me restate something important, because I forget about it myself
sometimes when I'm thinking about this problem: *upgrading to 2.6.1 made
the situation definetly worse*. The site became lot faster, but the
memory situation became intolerable, so I believe there is something
introduced between 2.5.1 and 2.6.1 that is causing our problems. Is
there a fix in 2.6.2 for some leak introduced *in* the 2.6 series?

Another thing, we have already estabilished that clearing the caches
resets the DateTime refcount, so the DateTime must be anchored to the
caches somehow. So, if the DateTime refcounts are constantly in the
50k-100k range, how come the cache object count for all threads is below
5k?

If they're indirectly attached, it means that a lot of DateTimes are
anchored in just a few objects, so no matter how much I reduce the
target cache size, it might never be enough to throw out the
DateTimes...

 [...]
 Maybe the individual ZODB objects are excessively large.  A few months 
 ago we found a bug in the new zope.org software that made it store file 
 uploads as a single chunk.  A lot of those files were over 1 MB in size. 
   This exhausted 1 GB of RAM pretty quickly.  Splitting the files into 
 chunks, like Zope usually does automatically, solved the problem.

Nice to know, but we don't store large files or other objects.

  it just plain monitor port?
 
 I rolled my own remote console.  The monitor port is too buggy and Boa 
 is overkill for this.

hmm, any chance of this remote console showing up somewhere? :-)

I might use it to navigate the thread object cache.

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-24 Thread Jens Vagelpohl
Well, to at least reduce the immediate pressure, why don't you throw 
more RAM into that server? Memory is cheap.

jens

On Friday, Aug 22, 2003, at 17:38 US/Eastern, Leonardo Rochael Almeida 
wrote:

Hi,

For a long time now, one of our clients, running Zope 2.5.1, has been
experiencing memory leaks. For a time this has been relieved by
restarting Zope every day at 4am.
Lately this was not enough, however, as Zope started taking more and
more memory, to the point that it frequently required more than one
restart during the day.
The machine this is running on is a Xeon 500 MHz with 512 Mb of memory.
Zope is a very memory intensive application, but 512Mb used to be 
enough
for a medium sized portal.

This Zope site makes very heavy use of both ZClasses and ZCatalog, and
it appears to me that this worsening of perfomance and memory
consumption was being caused by the increase in content, which caused 
an
increase in the size of the ZCatalog and in the number of ZClass
instances. A total of 6796 ZClass instances representing News Items are
cataloged. There are other ZCatalogs and ZClass instances representing
calendar events and other stuff (this is a very old Zope portal coded 
by
hand in ZClasses, no CMF, no Plone, not even Pagina1, our ZClass based
CMS)

Fiddling with the cache parameters in the control panel showed that
while we could keep the memory consumption to a point where the daily
restart would be enough (say, with 10k objects per thread), we would 
get
constant thrashing of cache objects, specially DateTime objects due to
the ZCatalog queries, and the machine performance would be close to
intolerable, whereas if the cache parameters where set to allow a fast
performance (with 50k objects per thread) the machine would run out of
memory in 3 to 4 hours. Needless to say this was with heavy use of
RAMCacheManagers, not counting the accelerator proxy in front of it.
Without the RAMCaches, the machine would go down in under 5 minutes of
work hour load. Even with the caches on, the load would never go down
from 2.0 during work hours.

Last tuesday we decided no longer to wait for 2.6.2 and migrated the
site to 2.6.1. We dealt with the ObjectManager-based-ZClass issue,
reformed the ZCatalogs to replace the DateTime FieldIndexes with
DateTimeIndexes and then had a working testing environment, which we
stress tested lightly without detecting any problems and quickly moved
to production. This was late at night
The next morning we were surprised to notice the machine very quickly
ran out of memory. The memory leak was *far more severe* than before.
Zope needed a restart every 15m or so before it would send the machine
into heavy swaping.
In a very non-intuitive hunch I suggested we shut down all RAMCaches
and, amazingly enough, this made the situation a bit more manageable.
We're now restarting every 45 minutes. To our relief, disabling the
RAMCaches had only a barely noticeable effect on performance. The site
kept churning out pages really fast, a testatment to the optimization
job done in the 2.6 series. The load on the machine is rarely above 
0.8,
except when it goes into swap :-)

The number of DateTime refcounts in the Control_Panel, although much
smaller than in Zope 2.5.1 is very high and, mostly importantly,
constantly increasing, as far as I can tell. After 12 minutes of 
uptime,
the top refcounts are:

DateTime.DateTime.DateTime: 96168
BTrees._IOBTree.IOBucket: 43085
BTrees._IIBTree.IIBTree: 40400
BTrees._IIBTree.IIBucket: 23696
OFS.DTMLDocument.DTMLDocument: 23190
BTrees.OIBTree.OIBucket: 14582
BTrees._IIBTree.IISet: 12479
BTrees._IIBTree.IITreeSet: 10823
BTrees.OOBTree.OOBucket: 7088
OFS.Image.Image: 6860
OFS.DTMLMethod.DTMLMethod: 5894
DocumentTemplate.DT_Util.Eval: 3250
OFS.Image.File: 2796
BTrees._IOBTree.IOBTree: 2761
ZClasses.Method.MWp: 1592
In time, DateTime refcounts eventually dwarves the second place by an
order of magnitude. I think this is related to the fact that DateTime
instances are stored as metadata, even though the date indexes have 
been
converted to DateTime indexes. The question is, why aren't those
instances being released? What is holding on to them?

I tried installing the LeakFinder product but discovered it didn't work
before stumbling in a message in the archives that told me exactly that
:-) The RefCounts view in the LeakFinder object fails with the 
following
traceback:

Traceback (innermost last):
  [...]
  Module DocumentTemplate.DT_Util, line 201, in eval
   - __traceback_info__: REQUEST
  Module string, line 0, in ?
  Module Products.LeakFinder.LeakFinder, line 240, in manage_getSample
  Module Products.LeakFinder.LeakFinder, line 163, in 
getControlledRefcounts
  Module Products.LeakFinder.LeakFinder, line 188, in resetCache
TypeError: function takes at most 2 arguments (3 given)

The code in question is:

def resetCache(c, gc):
cache = c._cache
if gc:
cache.minimize(3)  # The minimum that actually 
performs gc.
 

Re: [Zope-dev] Very severe memory leak

2003-08-24 Thread Shane Hathaway
On 08/22/2003 05:38 PM, Leonardo Rochael Almeida wrote:
In time, DateTime refcounts eventually dwarves the second place by an
order of magnitude. I think this is related to the fact that DateTime
instances are stored as metadata, even though the date indexes have been
converted to DateTime indexes. The question is, why aren't those
instances being released? What is holding on to them?
When you flush the cache, those DateTimes should disappear.  If they 
don't, the leak is keeping them.

I tried installing the LeakFinder product but discovered it didn't work
before stumbling in a message in the archives that told me exactly that
:-) The RefCounts view in the LeakFinder object fails with the following
traceback:
LeakFinder is an early attempt to share some of the techniques I use for 
finding leaks.  Unfortunately, those techniques aren't very useful until 
you've already searched for weeks, which means LeakFinder isn't very 
good for emergencies.

Here are the things you should look at first:

1) The ZODB cache size.  The meaning of this number changed dramatically 
in 2.6.  Before 2.6 it was a very vague number.  In 2.6 it's a target 
number of objects that Zope actually tries to maintain.  Before 2.6 it 
might have made sense to set the ZODB cache size to some arbitrarily 
high number like 100,000; in 2.6 you want to start at about 2000 and 
adjust from there.  There are tools in 2.6 for helping you adjust the 
number.

2) The number of ImplicitAcquisitionWrappers present in the system.  I 
have found it to be a reliable indicator of whether you have a leak or 
not.  Expect this number to stay under 400 or so.  If it grows 
gradually, there's a leak.  Watch the refcounts screen.

3) Is Python compiled with cyclic garbage collection enabled?  2.4 and 
above absolutely require cyclic garbage collection.

4) Don't use 2.6.1.  Use 2.6.2, which has fixes for known leaks.  It is 
actually already tagged in CVS as Zope-2_6_2, and it's what zope.org 
is now running.  Various unrelated things prevented a formal release 
this week.

If all else fails, grep all Python modules for sys._getframe() and 
sys.exc_info().  These are the primary causes of memory leaks in 
Python 2.1 and below.  If you're brave, you can just run Zope under 
Python 2.2, which fixes those particular leaks AFAIK.

Finally, there's always hope. :-)  The latest thing I've been doing is 
running Zope in a debug build of Python.  A debug build makes a magical 
sys._getobjects() available, allowing you to inspect all live objects 
through a remote console.  Since debug builds aren't much slower than 
standard builds, you can even run a debug build in production for short 
periods of time.  I've been building a small library of functions for 
working in this mode, and if you need them, I'll pass them along.  I'd 
have to warn you that they are anything but intuitive in their purpose 
and use, though.

Shane

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-24 Thread Dieter Maurer
Leonardo Rochael Almeida wrote at 2003-8-22 18:38 -0300:
  ...
  I tried installing the LeakFinder product but discovered it didn't work
  before stumbling in a message in the archives that told me exactly that
  :-) The RefCounts view in the LeakFinder object fails with the following
  traceback:
  
  Traceback (innermost last):
[...]
Module DocumentTemplate.DT_Util, line 201, in eval
 - __traceback_info__: REQUEST
Module string, line 0, in ?
Module Products.LeakFinder.LeakFinder, line 240, in manage_getSample
Module Products.LeakFinder.LeakFinder, line 163, in getControlledRefcounts
Module Products.LeakFinder.LeakFinder, line 188, in resetCache
  TypeError: function takes at most 2 arguments (3 given)
  ...
  c._cache = PickleCache(c, cache.cache_size, cache.cache_age)

PickleCache now has dropped the cache_age parameter.
Try: PickleCache(c,cache.cache_size)


Dieter

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-24 Thread kosh
On Friday 22 August 2003 03:38 pm, Leonardo Rochael Almeida wrote:
 Hi,

 In trying to narrow down the causes of the leak, we detected what looks
 like a minor one. If you insert a very simple dtml method in Zope root
 and bang on it at a rate of, say, 100 requests per second, Zope will
 increase it's memory footprint very slowly, say, a couple hundred Ks
 every 10 minutes (both RSS and virtual size). I figure this has
 something to do with memory fragmentation or some other low level stuff.
 But this is definetly not our main concern, which is to find out how is
 it that Zope is leaking DateTimes so heavily in our site, and if it is
 really all those DateTimes that are hogging our memory. We need help,
 and we need it desperately. If anyone wants any other information we'd
 be happy to provide.

Okay here are a few things to try based on this information. 

1) Take the zclasses and burn them and do it as a python product instead. From 
items I have seen/done in the past this makes a fairly large difference for 
speed and a little in memory. However with it being faster speed you can have 
smaller caches.

2) Convert dtml objects to python scripts where feasible. Python scripts often 
run 10-40 times faster then dtml does and they seem to have a much better 
memory footprint. With them running a lot faster you can also cut down on 
your caching. You can write write scripts that are fairly complex that can 
generate ~ 200 req/s but I have not gotten even close to that with dtml.

3)If you still have problems use psyco on your python product that you use to 
replace the zclasses and bind it to the methods that make the most speed 
difference. Caching is good but having it run fast enough to not need caching 
is often better.

I have a fair number of sites now that range in size from 100M - 1GB that get 
a fair bit of usage with tens of thousands of objects and they have a memory 
usage of about 20-45M of ram under zope 2.6.1. They used to have a memory 
usage in the 80-400M range but that was before I fixed a number of things.

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Very severe memory leak

2003-08-24 Thread Chris McDonough
Leonardo,

If you're using Linux, you might be able to take some of the pressure
off by using AutoLance to automatically restart the server when memory
grows above a certain threshhold.

http://zope.org/Members/mcdonc/Products/AutoLance/view


On Sat, 2003-08-23 at 15:17, Jens Vagelpohl wrote:
 Well, to at least reduce the immediate pressure, why don't you throw 
 more RAM into that server? Memory is cheap.
 
 jens
 



___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )