Has anyone built any spidering or indexing tools using Witango? The typical spider would:
- Use <@URL> with a starting address.
- Parse the result to pull out the desired tags: <title>, <hn>, <meta>, <a>, etc.
- Tokenize on spaces and other delimiter characters to build an array of words.
- Count occurrences of each word.
- Store the results in the db, in page, tag, and word tables:
    Page table has URL, ID
    Word table has tuples of Page.ID, word, count
    Tag table has Page.ID, tag (or TagID), tag content
- Loop through the <a> tags to a given depth, checking the page table to see if we've been there already.

With so many useful indexing tools out there, why build another? I don't want to, which is why I would like to stand on the shoulders of one of the giants on this list. But there are two areas where I think this could be useful:

1. Link checking for dynamic sites (to make sure my applications work -- no misspelled parameters, for example). I asked about this last week but didn't get any responses.

2. It would be nice to have all my dynamic content handled by a single execution engine, including search results that return dynamically generated pages.

So, does anyone have anything they want to contribute?

Bill Conlon

To the Point
345 California Avenue, Suite 2
Palo Alto, CA 94306

office: 650.327.2175
fax: 650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web: http://www.tothept.com
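To make the spider steps described above concrete, here is a minimal sketch in Python rather than Witango, using only the standard library. The three tables follow the layout described in the post; everything else (the names TagCollector, fetch, and crawl, the regex tokenizer, and the use of SQLite) is an assumption for illustration, not an existing tool.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

WANTED = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

class TagCollector(HTMLParser):
    """Hypothetical helper: collects wanted tags, <meta> content, and <a href> links."""
    def __init__(self):
        super().__init__()
        self.tags = []    # (tag, content) pairs
        self.links = []   # raw href values
        self.text = []    # all text nodes (script/style text included in this sketch)
        self._open = []   # wanted tags currently open, as [tag, collected text]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("content"):
            self.tags.append(("meta", attrs["content"]))
        elif tag in WANTED:
            self._open.append([tag, ""])

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, content = self._open.pop()
            self.tags.append((name, content.strip()))

    def handle_data(self, data):
        self.text.append(data)
        for entry in self._open:
            entry[1] += data

def fetch(url):
    """Default fetcher; stands in for the <@URL> call."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def crawl(start_url, max_depth, db, fetch=fetch):
    db.executescript("""
        CREATE TABLE IF NOT EXISTS page (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS word (page_id INTEGER, word TEXT, count INTEGER);
        CREATE TABLE IF NOT EXISTS tag  (page_id INTEGER, tag TEXT, content TEXT);
    """)
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        url = urldefrag(url).url  # drop any #fragment
        if db.execute("SELECT 1 FROM page WHERE url = ?", (url,)).fetchone():
            continue  # already in the page table
        try:
            html = fetch(url)
        except Exception:
            continue  # a link checker would record the broken link here
        page_id = db.execute("INSERT INTO page (url) VALUES (?)", (url,)).lastrowid
        parser = TagCollector()
        parser.feed(html)
        # Tokenize on anything that is not a letter, digit, or apostrophe.
        words = Counter(re.findall(r"[a-z0-9']+", " ".join(parser.text).lower()))
        db.executemany("INSERT INTO word VALUES (?, ?, ?)",
                       [(page_id, w, c) for w, c in words.items()])
        db.executemany("INSERT INTO tag VALUES (?, ?, ?)",
                       [(page_id, t, c) for t, c in parser.tags])
        if depth < max_depth:
            queue.extend((urljoin(url, href), depth + 1) for href in parser.links)
    db.commit()
```

Passing fetch as a parameter lets the crawl be exercised against canned pages, and the except branch is where the link-checking use case would log failures instead of skipping them.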
