Re: [PHP-DB] Re: Real Killer App!

2003-03-12 Thread Nicholas Fitzgerald
Well, I'm not locking them out exactly, but for good reason. When a URL 
is first submitted it goes into the database with a checksum value of 0 
and a date of 0000-00-00. If the checksum is 0 the spider will process 
that URL and update the record with the proper info. If the checksum is 
not 0, then it checks the date. If the date is past the date for 
reindexing, it goes ahead and updates the record; it also checks 
against the checksum to see if the URL has changed, in which case it 
updates.
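
In rough PHP terms, the decision for each URL looks something like this 
(just a sketch to show the flow; index_url() and checksum_of() are 
stand-ins for what the spider actually does, and the column names are 
placeholders rather than the real schema):

<?php
// $row is one record pulled from the URL table.
if ($row['checksum'] == 0) {
    // brand new URL, never spidered: fetch it and fill in the record
    index_url($row['url']);
} elseif ($row['reindex_date'] <= date('Y-m-d')) {
    // due for reindexing: fetch it again and compare checksums
    $new_checksum = checksum_of($row['url']);
    if ($new_checksum != $row['checksum']) {
        index_url($row['url']);   // page has changed, so update the record
    }
}
// otherwise the record is still fresh and gets skipped
?>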

It does look like it's going into an endless loop, but the strange thing 
is that it goes through the loop successfully a couple of times first. 
That's what's got me confused.

Nick

Nelson Goforth wrote:

Do you lock out the URLs that have already been indexed?  I'm 
wondering if your system is going into an endless loop?








RE: [PHP-DB] Re: Real Killer App!

2003-03-12 Thread Matthew Moldvan
Even if the system works correctly the first couple of times, it can still
go into an endless loop if the right termination conditions are not
specified; that's true of any program ...
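
The usual safeguard is a visited list plus a hard cap on pages per run,
something along these lines (purely illustrative, obviously not your code;
fetch_and_index() and extract_links() are just stand-ins):

<?php
// A crawl loop with two explicit stop conditions: a visited list so no
// URL is ever queued twice, and a hard cap on pages per run.
$queue     = array('http://www.example.com/');
$visited   = array();
$max_pages = 1000;

while (count($queue) > 0 && count($visited) < $max_pages) {
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;                     // already handled this one
    }
    $visited[$url] = true;

    $links = extract_links(fetch_and_index($url));
    foreach ($links as $link) {
        if (!isset($visited[$link])) {
            $queue[] = $link;
        }
    }
}
?>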

I am very curious about this project ... is it open source?  If so, I'd be
interested in taking a look at how you implemented it.

Thanks,
Matthew Moldvan.

System Administrator,
Trilogy International, Inc.
http://www.trilogyintl.com/




Re: [PHP-DB] Re: Real Killer App!

2003-03-12 Thread Nicholas Fitzgerald
I think at this point, as I've been working on it the past few hours, I 
can safely say that it is NOT going into an endless loop. It's just 
dying at a certain point. Interestingly, it always seems to die at the 
same point. When indexing a particular site, I noticed that it was dying 
on a certain URL. I also noticed that it was trying to index some .jpg 
and .gif files. I put in a filter so it wouldn't do anything with those 
binary files, and then reran the app. Now, with a lot fewer files to go 
through and hence a lot less work to do, it still died on the same URL 
as before. I then cleared the database, put in that specific URL, and 
started there. It indexed that fine and did what it was supposed to do. 
No matter what, though, if I start at the root of the site or anywhere 
else, it dies right there. I've also noticed that when it gets to the 
point where it's dying, the hard drive does a massive write. 
Interestingly, this happens on the boot drive, not the RAID array 
where the database, PHP, and the webserver live. I've got a fairly 
sizable swapfile on both drives, and 1.5 GB of memory. I can't imagine 
it's a memory problem, but you never know.
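
To be clear about what I mean by the filter, it's roughly along these 
lines (a simplified sketch, not the exact code). The idea is just to run 
each link through this check before it gets queued:

<?php
// Skip URLs that point at images or other binaries before the spider
// ever tries to fetch and parse them.
function is_binary_url($url)
{
    $skip = array('jpg', 'jpeg', 'gif', 'png', 'zip', 'exe', 'pdf');

    $path  = preg_replace('/[?#].*$/', '', $url);   // drop query/fragment
    $parts = explode('.', $path);
    $ext   = strtolower(end($parts));

    return in_array($ext, $skip);
}
?>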

I believe I have the right conditions specified, but I do plan to go and 
review all of that both in the app and in the server environment. As for 
the status of the code, I'm not sure yet. I need to make something from 
this, and I haven't quite figured out yet how people make money from 
open source without charging a fortune for support. I'd rather charge 
less up front and support it for free, but we'll see what happens.

Nick
