I think at this point, as I've been working on it the past few hours, I
can safely say that it is NOT going into an endless loop. It's just
dying at a certain point. Interestingly, it always seems to die at the
same point. When indexing a particular site, I noticed that it was dying
on a certain url. I also noticed that it was trying to index some .jpg
and .gif files. I put in a filter so it wouldn't do anything with those
binary files, and then reran the app. Now, with a lot fewer files to go
through and hence a lot less work to do, it still died on the same url
as before. I then cleared the database and put in that specific URL and
started there. It indexed that fine and did what it was supposed to do.
No matter what though, if I start at the root of the site, or anywhere
else, it dies right there. I've also noticed that when it gets to the
point where it's dying the hard drive does a massive write.
Interestingly, this happens on the boot drive, and not the RAID array
where the database, PHP, and the webserver live. I've got a fairly
sizable swapfile on both drives, and 1.5 gig of memory. I can't imagine
it's a memory problem, but you never know.
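
A minimal sketch of the kind of binary-file filter described above, written in Python for brevity rather than the app's PHP. The function name and the skip list are assumptions for illustration; the message only mentions .jpg and .gif, and the real filter may work differently.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical skip list: the message only names .jpg and .gif.
BINARY_EXTENSIONS = {".jpg", ".gif"}

def is_binary_url(url):
    """Return True if the URL path ends in a skipped extension."""
    # Look only at the path so a query string such as ?img=a.jpg is ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in BINARY_EXTENSIONS
```

A spider would call is_binary_url() on each discovered link before queueing it, so skipped files never reach the indexing step.
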
I believe I have the right conditions specified, but I do plan to go and
review all of that both in the app and in the server environment. As for
the status of the code, I'm not sure yet. I need to make something from
this, and I haven't quite figured out yet how people make money from
open source without charging a fortune for support. I'd rather charge
less up front and support it for free, but we'll see what happens.
Nick
Matthew Moldvan wrote:
Even if the system is working correctly the first couple times, it may go
into an endless loop if you do not specify the right conditions, for any
programming application ...
I am very curious about this project ... is it open source? If so, I'd be
interested in taking a look at how you implemented it.
Thanks,
Matthew Moldvan.
System Administrator,
Trilogy International, Inc.
http://www.trilogyintl.com/
-----Original Message-----
From: Nicholas Fitzgerald [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 12, 2003 7:58 AM
To: [EMAIL PROTECTED]
Subject: Re: [PHP-DB] Re: Real Killer App!
Well, I'm not locking them out exactly, but for good reason. When a url
is first submitted it goes into the database with a checksum value of 0
and a date of 0000-00-00. If the checksum is 0 the spider will process
that url and update the record with the proper info. If the checksum is
not 0, then it checks the date. If the date is past the date for
reindexing then it goes ahead and updates the record. It also checks
against the checksum to see if the url has changed, in which case it
updates.
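
The decision rules above can be sketched roughly as follows, in Python rather than the app's PHP. This is one reading of the description, not the actual code: the function names are hypothetical, and None stands in for the zero date because Python's date type cannot represent it.

```python
from datetime import date

def should_fetch(stored_checksum, reindex_on, today):
    """Decide whether the spider should fetch this URL now."""
    # A checksum of 0 marks a newly submitted URL that was never indexed.
    if stored_checksum == 0:
        return True
    # Assumption: a missing reindex date is treated like the zero date.
    if reindex_on is None:
        return True
    # Otherwise fetch only once the reindex date has passed.
    return today > reindex_on

def content_changed(stored_checksum, fresh_checksum):
    """After fetching, a differing checksum means the page changed."""
    return stored_checksum != fresh_checksum
```

For example, should_fetch(0, None, date(2003, 3, 12)) is True for a fresh submission, while a record with a non-zero checksum and a reindex date still in the future is skipped.
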
It does look like it's going into an endless loop, but the strange thing
is that it goes through the loop successfully a couple of times first.
That's what's got me confused.
Nick
Nelson Goforth wrote:
Do you lock out the URLs that have already been indexed? I'm
wondering if your system is going into an endless loop?
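
One common way to lock out URLs that have already been handled within a single run is a visited set. The sketch below is in Python rather than the app's PHP, and is only an illustration of the question: crawl() and get_links() are hypothetical names, and get_links() stands in for whatever fetches a page and extracts its links.

```python
from collections import deque

def crawl(start_url, get_links):
    """Visit every reachable URL exactly once, even if pages link in cycles."""
    seen = {start_url}           # URLs already queued or processed
    queue = deque([start_url])   # URLs waiting to be processed
    visited_order = []
    while queue:
        url = queue.popleft()
        visited_order.append(url)
        for link in get_links(url):
            # Skip anything seen before, so a cycle cannot loop forever.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited_order
```

With three pages that all link back to each other, the loop still ends after three visits because each URL enters the queue only once.
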