On Sat, Jun 18, 2005 at 07:08:15PM +0400, Laura wrote:
> 
> I need to parse sites. So... as we know it`s better to 
> connect different servers in one time than the same in the 
> queue.
> 
> I have 3 packages:
> package 1) gets each site main url from a file and send id 
> to package 2
> package 2) get main url then send it to package 3 to 
> parse, then get new urls (in postback) from this site and 
> send it again to parse until all the urls will be parsed
> package 3) get url and parse  it, postback to package 2
> 
> So, start...
> parsing site 1
> parsing site 2
> parsing site 3
> 
> thats ok 3 diferent servers, but then...
> 
> parsing site`s 1 url 2
> parsing site`s 1 url 3
> parsing site`s 1 url 4
> parsing site`s 1 url 5
> parsing site`s 1 url 6

Each site you parse dumps all its URLs into POE's event queue at once.
Since POE's queue is FIFO, the first URL in is the first one out... so
site 1's URLs 2..6 sit next to each other in the queue and are fetched
back to back.
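
To make the clustering concrete, here is a small stand-alone simulation.
It does not use POE at all; a plain Perl array stands in for the event
queue, and the site names and URL counts are made up for illustration.

```perl
use strict;
use warnings;

# A plain array stands in for the FIFO event queue (an assumption for
# illustration; POE's real queue is not touched here).
my @queue = map { "site $_ url 1" } 1 .. 3;
my @order;

while (defined(my $event = shift @queue)) {
    push @order, $event;

    # Parsing a site's main page "discovers" URLs 2..4 and enqueues
    # all of them at once, as described above.
    if ($event =~ /^(site \d+) url 1$/) {
        my $site = $1;
        push @queue, map { "$site url $_" } 2 .. 4;
    }
}

print "$_\n" for @order;
```

The three main pages come out first, then every URL of site 1 in a row,
then every URL of site 2, and so on.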

There are a couple of ways to work around this:

1. Don't put all the URLs into POE's queue at once.  Instead keep them
somewhere else, and pull a new one from that list as old ones are
finished.

2. Use delay_add() rather than yield() or post() to put the events in
the queue with a small random delay.  This isn't optimally friendly to
the servers, since two URLs from the same site can still land close
together, but it does allow URLs from multiple sites to mix.

  $kernel->delay_add( fetch_url => rand(), $url );
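
Option 1 could be sketched roughly like this.  It is plain Perl with no
POE calls; add_urls() and next_url() are hypothetical helper names, and
the example.com URLs are placeholders.  The idea is to keep one pending
list per site and hand out URLs round-robin.  In a POE session you
would call next_url() from whatever handler runs when a fetch finishes,
and yield() or post() just that one URL.

```perl
use strict;
use warnings;

my %pending;     # site => [ URLs waiting to be fetched ]
my @rotation;    # sites that still have pending URLs, in turn order

# Hypothetical helper: remember new URLs instead of queuing events.
sub add_urls {
    my ($site, @urls) = @_;
    return unless @urls;
    push @rotation, $site unless $pending{$site};
    push @{ $pending{$site} }, @urls;
}

# Hypothetical helper: take one URL from the next site in the rotation.
# Returns undef (in scalar context) when nothing is left.
sub next_url {
    my $site = shift @rotation;
    return unless defined $site;

    my $url = shift @{ $pending{$site} };
    if (@{ $pending{$site} }) {
        push @rotation, $site;      # site goes to the back of the line
    }
    else {
        delete $pending{$site};     # site is finished
    }
    return $url;
}

add_urls('site1', map { "http://site1.example.com/$_" } qw(a b c));
add_urls('site2', map { "http://site2.example.com/$_" } qw(a b));
add_urls('site3', 'http://site3.example.com/a');

my @fetched;
while (defined(my $url = next_url())) {
    push @fetched, $url;
}
print "$_\n" for @fetched;
```

Consecutive URLs come from different sites for as long as more than one
site still has work pending.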

-- 
Rocco Caputo - http://poe.perl.org/
