On 9/15/06, Richard Lynch <[EMAIL PROTECTED]> wrote:
> On Fri, September 15, 2006 10:42 am, Matthew H. North wrote:
> > We're developing a web application that involves traversal of a
> > hierarchical database structure (MySQL, PEAR::DB, and
> > PEAR::DB::DataObject).  Currently that traversal is done recursively,
> > and involves visiting thousands of nodes in the tree.  However, the
> > tree is relatively flat, and the recursion never gets more than 4 or 5
> > calls deep.  A severely truncated but illustrative version of the code
> > of interest is:
>
> So you are just visiting the nodes, and not doing anything with them?

We're appending certain fields to build up a total result.  However, as 
mentioned, the amount of data collected is nowhere near 1MB, and in any 
event, all references to the result variable are unset (AFAIK) by the time I 
arrive at the final 5.5MB figure.

>
> It's entirely possible that PEAR::DB and/or DataObject are trying to
> cache something to "help" you...
>
> You should be able to quickly hack a *BAD* page of code with minimal
> error checking to do whatever queries PEAR::DB is doing for you.

Yeah -- I was hoping to expand my understanding of PHP internals so I could 
avoid doing this, but based on more of your comments below I think I'm out of 
luck.

I did notice that DataObject was keeping a running cache of result sets (for 
some reason, even ones I was done with), so I added a destructor to my 
DataObject subclasses that cleans those up.  That helps keep the running peak 
memory down, but that cache isn't part of the 5.5MB that I can't get rid of.

>
> > trigger_error(memory_get_usage());
> > $result = traverse_hierarchy();
>
> At crucial points within the hierarchy, perhaps at nodes/leaves you
> expect to be at specific milestones (halfway, 25%, 75%, ...) start
> adding code that does crude things like:
>
> if ($node->name == 'This one node we think is halfway through')
> trigger_error(memory_get_usage());
>
> Log the numbers into a db with the node names and then later graph it
> to see if the memory is getting chewed up in a straight line or if it
> jumps at some point.
>
> If there's a big jump somewhere, you know where to look.
>
> If it's a straight line, then you can start doing the same thing line
> by line to find where the RAM is going.

Great idea -- I just completed a degree in computational physics, which 
included courses that involved dumping loads of data and graphing them using 
tools like gnuplot and OpenDx... you'd think I would have thought of this one 
myself (rolling my eyes).

I _had_ thought of dumping the state at various, noted points throughout the 
process, but was hoping to avoid doing this kind of lengthy analysis.
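For anyone wanting to try Richard's milestone idea, here is a minimal sketch. It writes each reading to a CSV file instead of a DB table so gnuplot can read it directly. The helper names `log_memory_milestone()` and `traverse_with_logging()`, the 25/50/75/100% milestones and the log path are all hypothetical, and the traversal is a stand-in loop rather than the real PEAR::DB_DataObject code.

```php
<?php
// Hypothetical helper (not part of PHP or PEAR): append "label,bytes"
// to a CSV file so the readings can be graphed later.
function log_memory_milestone(string $logFile, string $label): int
{
    $bytes = memory_get_usage();
    file_put_contents($logFile, $label . ',' . $bytes . "\n", FILE_APPEND);
    return $bytes;
}

// Stand-in for the real traversal: visit $count nodes, keep a small
// string from each, and log memory at 25/50/75/100% progress.
function traverse_with_logging(string $logFile, int $count): array
{
    $milestones = [
        intdiv($count, 4)     => '25%',
        intdiv($count, 2)     => '50%',
        intdiv(3 * $count, 4) => '75%',
        $count                => '100%',
    ];
    $result = [];
    for ($i = 1; $i <= $count; $i++) {
        $result[] = "node-$i"; // placeholder for the fields being appended
        if (isset($milestones[$i])) {
            log_memory_milestone($logFile, $milestones[$i]);
        }
    }
    return $result;
}

$logFile = sys_get_temp_dir() . '/memory_milestones.csv';
@unlink($logFile); // start each run with an empty log
log_memory_milestone($logFile, 'start');
$result = traverse_with_logging($logFile, 1000);
```

To graph the result, something like `set datafile separator ','` followed by `plot 'memory_milestones.csv' using 0:2 with linespoints` in gnuplot should show whether usage climbs in a straight line or jumps at a particular milestone.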

>
> Did you close down the DB connection and kill the PEAR objects?...
>
> PHP's garbage collection has had... issues... in the past.

All of my classes that extend DataObject inherit a common destructor that 
calls the DataObject::free() method, which, supposedly, frees result 
resources.  I'm not sure what you mean by 'kill the PEAR objects', but all 
references go out of scope or are unset.

>
> > The question is this: Given the following assumptions:
> >
> > 1) PHP's memory manager reclaims memory when all references to that
> > memory are gone.
>
> Well, it tries to anyway...
>
> It's not always that simple, particularly with variable variables and
> other dynamic features.
>
> > 2) A reference is 'gone' when it goes out of scope or is 'unset'.
>
> Scope seems like it should be simple, but it's not.
>
> Use unset to be certain.
>
> > 3) The only references that remain in the global context are
> > references to globals (all non-global variables have gone out of scope
> > and that memory reclaimed)
>
> See #1.
>
> PHP "scope" is not as clean-cut as C.
>
> A simple "for" loop in PHP leaves the iterator variable, last I checked.
>
> Inside a function, that should go out of scope.  Outside a function,
> it stays around.  foreach, I think, correctly un-scopes the vars.

This collection of statements is very illuminating.  Part of my goal in 
posting this question was to find out more about how PHP internals work, 
especially with respect to GC.  It sounds like there really isn't any hope of 
getting a solid set of rules that I can follow, and I have to allow for a 
little slop.

Good -- that just means I can stop agonizing over this issue and 'deal with 
it'.
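Richard's loop-variable point is easy to check. As far as I can tell, in PHP 5 and later foreach does not un-scope its variables either: at global scope, both the for counter and the foreach key/value variables stay defined after the loop. A short script showing this (the variable names are arbitrary):

```php
<?php
$items = ['a', 'b', 'c'];

// A for loop at global scope: $i is still defined afterwards.
for ($i = 0; $i < count($items); $i++) {
    // ... work with $items[$i] ...
}
$forCounterAfterLoop = $i; // 3, the value that ended the loop

// foreach behaves the same way: $key and $value keep their last values.
foreach ($items as $key => $value) {
    // ... work with $key and $value ...
}
$foreachValueAfterLoop = $value; // 'c'

// If a loop variable may hold something large, release it explicitly.
unset($i, $key, $value);

// The by-reference form is the sharper edge: $ref stays bound to the
// last element, so a later assignment to $ref would overwrite $nums[2].
$nums = [1, 2, 3];
foreach ($nums as &$ref) {
    $ref *= 10;
}
unset($ref); // break the reference; $nums is now [10, 20, 30]
```

Inside a function the leftover variables disappear when the function returns, which matches Richard's description; the explicit unset() matters mainly for code running at global scope.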

>
> > 5) By doing unset($GLOBALS[$varname]) and unset($$varname), where
> > $varname is each key of the $GLOBALS array, I am effectively
> > eliminating all remaining references, and all allocated memory should
> > be reclaimed by the memory manager (except perhaps for memory
> > associated with function and class definitions).
>
> No.
>
> Dangling pointers and references not correctly cleaned up from a
> function are left out in limbo.

This I find VERY odd.  So if I don't unset all references in a function before 
it exits, I lose that memory?
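As far as I know, the answer is no in the ordinary case: when nothing else refers to a function's locals (no circular references, no static or global variables, no objects stored elsewhere), PHP's reference counting frees them as soon as the function returns. A rough way to check this on a given PHP build is to compare memory_get_usage() before and after a call. `build_and_discard()` and the sizes used are hypothetical, chosen only to make the allocation easy to see.

```php
<?php
// Hypothetical test function: builds a large local array and lets only
// an integer escape. Nothing else refers to $rows, so it should be
// freed as soon as the function returns.
function build_and_discard(int $n): int
{
    $rows = [];
    for ($i = 0; $i < $n; $i++) {
        $rows[] = str_repeat('x', 100) . $i; // a distinct string per row
    }
    return count($rows);
}

$before = memory_get_usage();
$count  = build_and_discard(50000);
$after  = memory_get_usage();
$peak   = memory_get_peak_usage();

printf("rows: %d, peak above baseline: %d bytes, net change: %d bytes\n",
    $count, $peak - $before, $after - $before);
```

If the net change stays near zero while the peak is several megabytes, the locals were released on return; a net change that keeps growing with each call would point to something still holding a reference.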

>
> > 6) Resources (think database resources) are automatically freed by
> > garbage collection when there are no more references to them
>
> Probably, eventually, if PHP's GC kicks in when you think it does.
>
> To be certain, close the DB references when you are done with them.
>
> > 7) No additional code is being evaluated within traverse_hierarchy
> > 8) I'm correct that there aren't any circular references in my code
> > nor in any PEAR module code
>
> Circular references are not a problem, really.
>
> It's the ones that get chopped off from any connection to anything you
> can get ahold of and start releasing that matter.
>
> And if you get a whole big chain of them, with no root to tie onto to
> start releasing...

Yes, and this is exactly what I'm worried about.  I didn't develop the code 
with this in mind, so I'm worried I may have unwittingly done exactly this.  
For example, the traversal code keeps a doubly-linked list of nodes 
representing the current branch of the tree.  If I were losing the head of 
that list, I'd be causing this problem, perhaps thousands of times over.

However, as I mentioned, I've reviewed my code and DataObject and have found 
no 'lost' circular references, so I don't think that's the problem (and thus 
assumption #8).
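For the doubly-linked branch list, a cheap safeguard is to unlink both directions whenever a node leaves the branch. In PHP 5.2 and earlier, reference counting alone never frees a parent <-> child cycle before the request ends; PHP 5.3 added a cycle collector (`gc_collect_cycles()`), but breaking the links yourself still frees the memory immediately. A minimal sketch, where `BranchNode`, `branch_push()` and `branch_pop()` are hypothetical names, not taken from the traversal code or DataObject (it uses PHP 7.1+ syntax):

```php
<?php
// Hypothetical node in the "current branch" list: prev points to the
// parent level, next to the child level below it.
class BranchNode
{
    public $name;
    public $prev = null;
    public $next = null;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Append a new node below $tail and link it in both directions.
function branch_push(?BranchNode $tail, string $name): BranchNode
{
    $node = new BranchNode($name);
    if ($tail !== null) {
        $tail->next = $node;
        $node->prev = $tail;
    }
    return $node;
}

// Remove the tail node, clearing both links so no parent <-> child cycle
// is left behind; plain reference counting can then free it at once.
function branch_pop(BranchNode $tail): ?BranchNode
{
    $parent = $tail->prev;
    $tail->prev = null;
    if ($parent !== null) {
        $parent->next = null;
    }
    return $parent;
}

// Descend four levels, then unwind the whole branch.
$tail = null;
foreach (['root', 'a', 'b', 'c'] as $name) {
    $tail = branch_push($tail, $name);
}
while ($tail !== null) {
    $tail = branch_pop($tail);
}
$collectedAfterUnwind = gc_collect_cycles(); // nothing left for the collector

// For contrast: two nodes linked both ways and simply dropped. Neither
// refcount reaches zero, so only the cycle collector (PHP 5.3+) frees them.
$leaked = branch_push(branch_push(null, 'x'), 'y');
unset($leaked);
$collectedAfterLeak = gc_collect_cycles(); // non-zero: the x <-> y pair
```

The same unlinking belongs wherever the real code discards a node; on a pre-5.3 PHP it is the only way those nodes get freed before the request ends.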

>
> > Are there any other ways that user code can result in this apparent
> > memory leak situation?  If so, what are they?
>
> PHP Extensions can have a memory leak.  Ain't much PHP can do about
> that, really.

Good point.

>
> > Or, are any of my first 6 assumptions incorrect?
>
> They're a little too optimistic. :-)

Perfect -- this is exactly what I was trying to get at.  I wasn't sure how 
much confidence to place in my assumptions about PHP internals, and therefore 
whether I should _really_ be banging my head against this so hard.

Thanks for your detailed response -- you answered my questions and then some.

Is there a resource out there that explains PHP internals in detail?

- Matt

-- 
Matthew H. North
mailto:[EMAIL PROTECTED]

-- 
PHP General Mailing List (http://www.php.net/)