Has anyone built any spidering or indexing tools using Witango?

The typical spider would:

Use <@URL> with a starting address.
Parse the page to pull out the desired tags: <title>, <hn>, <meta>, <a>, etc.
Tokenize the content on spaces and other delimiters to build an array.
Count occurrences of each word.
Store the results in the db, in page, tag and word tables:
  Page table has URL, ID.
  Word table has tuples of Page.ID, word, count.
  Tag table has Page.ID, tag (or TagID), tag content.
Loop through the <a> tags to a given depth, checking the page table to 
see if we've been there already.
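
To make the steps above concrete, here is a rough sketch in Python rather than Witango, only to pin down the logic. Everything named here is an assumption for illustration: the PageParser class, the count_words and crawl functions, the SQLite schema, and the in-memory SITE dictionary that stands in for real HTTP fetches (the job <@URL> would do). It counts words only inside the extracted tags, which is one possible reading of the outline.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose text content we keep (the <hn> family spelled out).
WANTED = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical schema mirroring the page, word and tag tables.
SCHEMA = """
CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE word (page_id INTEGER, word TEXT, count INTEGER);
CREATE TABLE tag  (page_id INTEGER, tag TEXT, content TEXT);
"""

class PageParser(HTMLParser):
    """Collect wanted tag text, <meta> content, and <a> targets."""
    def __init__(self):
        super().__init__()
        self.tags = []     # (tag, content) pairs
        self.links = []    # href values from <a> tags
        self._open = None  # wanted tag currently being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("content"):
            self.tags.append(("meta", attrs["content"]))
        elif tag in WANTED:
            self._open, self._buf = tag, []

    def handle_data(self, data):
        if self._open:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.tags.append((tag, "".join(self._buf).strip()))
            self._open = None

def count_words(text):
    """Tokenize on anything that is not a letter, digit or apostrophe."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def crawl(db, fetch, url, depth):
    """Index url, then follow its links until depth runs out."""
    if db.execute("SELECT 1 FROM page WHERE url = ?", (url,)).fetchone():
        return  # been there already
    html = fetch(url)
    if html is None:
        return  # fetch failed; nothing to index
    page_id = db.execute("INSERT INTO page (url) VALUES (?)", (url,)).lastrowid
    parser = PageParser()
    parser.feed(html)
    db.executemany("INSERT INTO tag VALUES (?, ?, ?)",
                   [(page_id, t, c) for t, c in parser.tags])
    words = count_words(" ".join(c for _, c in parser.tags))
    db.executemany("INSERT INTO word VALUES (?, ?, ?)",
                   [(page_id, w, n) for w, n in words.items()])
    if depth > 0:
        for href in parser.links:
            crawl(db, fetch, urljoin(url, href), depth - 1)

# Made-up two-page site so the sketch runs without a network.
SITE = {
    "http://example.test/":
        '<title>Home page</title><h1>Home</h1><a href="/about">About</a>',
    "http://example.test/about":
        '<title>About us</title><a href="/">Home</a><a href="/missing">x</a>',
}

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
crawl(db, SITE.get, "http://example.test/", depth=2)
for row in db.execute("SELECT page_id, word, count FROM word ORDER BY page_id, word"):
    print(row)
```

Passing the fetch function in as a parameter keeps the crawl logic separate from the HTTP layer, so the same loop could sit on top of whatever fetch mechanism is actually used.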


With so many useful indexing tools out there, why build another?  I don't 
want to, which is why I would like to stand on the shoulders of one of 
the giants on this list.  But there are two areas where I think this 
could be useful:

1.  Link checking for dynamic sites (to make sure my applications work -- 
no misspelled parameters for example).  I asked about this last week, but 
didn't get any responses.
2.  It would be nice to have all my dynamic content handled by a single 
execution engine, including search results that return dynamically 
generated pages.   
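
For the link-checking case in item 1, something along these lines might be enough as a starting point. Again this is Python, not Witango, and the names http_status and broken_links are made up for the sketch. It assumes the spider has already collected (source page, target URL) pairs, and it treats any HTTP status of 400 or above, or no response at all, as a broken link.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a 4xx/5xx status
    except OSError:
        return None      # DNS failure, refused connection, timeout, ...

def broken_links(links, status_of=http_status):
    """Return (source, target, status) for every link that fails.

    links is an iterable of (source_page, target_url) pairs.
    status_of is injectable so the check can be tried offline.
    """
    failures = []
    cache = {}  # check each target only once
    for source, target in links:
        if target not in cache:
            cache[target] = status_of(target)
        status = cache[target]
        if status is None or status >= 400:
            failures.append((source, target, status))
    return failures
```

A status check alone would miss an application that answers a misspelled parameter with a normal 200 page containing an error message, so a real checker would probably also need to scan the returned body for whatever error text the application emits.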

So anyone have anything they want to contribute?

Bill Conlon

To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306

office: 650.327.2175
fax:    650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web:    http://www.tothept.com

