Hello,

I have a POE-based HTTP spider, which is a messy mash-up of the POE
sessions API and some help from PoCo::Client::UserAgent.

As a resource is downloaded, I use HTML::LinkExtractor to discover links
in the resource to follow, and I put those links in a DB.

As a session finishes, it takes a fresh link from the DB and creates a
new session to download it.

To try to keep the resource usage sane, the app sets a limit on the
maximum number of sessions allowed, and I choose values between 3 and 64.

The app behaves as I expect for tightly constrained crawls, but for
larger crawls it seems that the _stop handlers for my sessions don't
get called until the entire crawl is complete.

Reading the docs, I see that child sessions keep their redundant
parents alive. Is there a way to create a new session already
"disinherited", since its parent doesn't care about it? Or do I have to
recode with one session dedicated to creating new child sessions as
older children die? How would it know when a child dies?
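For context, here's roughly what I have in mind. detach_myself() is
something I spotted in the POE::Kernel docs, though I haven't confirmed
this is the intended use; the state names and comments are just
placeholders for my actual downloader:

```perl
use POE;

# Hypothetical downloader session: detach from the spawning parent in
# _start so the parent can be garbage-collected (and its _stop handler
# can run) without waiting for this child to finish.
POE::Session->create(
    inline_states => {
        _start => sub {
            my $kernel = $_[KERNEL];
            # Reparent this session to the kernel itself, per the
            # POE::Kernel documentation on detach_myself().
            $kernel->detach_myself();
            $kernel->yield('fetch');
        },
        fetch => sub {
            # ... download the URL, extract links, store them in the DB ...
        },
        _stop => sub {
            # Hopefully this now fires as soon as the session is done,
            # not at the end of the whole crawl.
        },
    },
);

POE::Kernel->run();
```

If that's the wrong tool, the docs also mention detach_child() and the
_parent/_child events, which sound relevant to noticing when a child
dies.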

I'm cruising around in the POE Cookbook right now, but if anyone can
point me at the "right" recipe straight away, that would be really
helpful!
Steven
