Re: index library

Brian Yennie Sun, 18 Apr 2004 22:24:07 -0700

If you're looking for inspiring documentation, I would recommend checking out Apple's old AIAT SDK. It's the basis for MacOS Find-By-Content, but the really cool thing is that the documentation spells out very nicely how vector and inverted indices work.

Or... try googling "inverted vector index".

In the past I've hooked several engines up to Rev (including AIAT), but they all required externals and/or separate apps running. There's a Java-based spinoff of AIAT called "Lucene" from the Apache project, which is interesting, but you'd most likely have to write a Java app and talk back and forth with it.

If you could implement the basic inverted vector index algorithms and figure out an efficient way to store the indices on disk, it could become a pretty decent engine in Transcript, even if it might not be suitable for indexing your hard drive or spidering the web...
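To make the idea concrete, here is a minimal sketch of an inverted index, written in Python rather than Transcript purely for illustration. The names build_index and search are hypothetical, the tokenizer is just a whitespace split, and everything lives in memory, so the on-disk storage question is left open.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: count}: a simple in-memory inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], {}))
    for word in words[1:]:
        hits &= set(index.get(word, {}))
    return sorted(hits)

# Tiny made-up corpus, for illustration only.
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick red fox",
}
index = build_index(docs)
```

A query then becomes a dictionary lookup per word plus a set intersection, e.g. search(index, "quick fox"), instead of a scan over every document.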

For more fun reading, there's stemming (which is pretty crude and easy), thesauri (which you have to be very careful with or you just increase noise), stopword removal (i.e. cutting out the "and" and "the" words), and relevancy ranking. All of this is covered in the aforementioned AIAT SDK.
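As a rough sketch of how three of those pieces fit together (again in Python, and not taken from the AIAT SDK): the stopword list, the suffix-stripping rule, and the tf * idf weighting below are all illustrative assumptions, and crude_stem, terms, and rank are hypothetical names.

```python
import math
from collections import Counter

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # tiny illustrative list

def crude_stem(word):
    """Strip a few common English suffixes; far cruder than a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Lowercase, split on whitespace, drop stopwords, stem what is left."""
    return [crude_stem(w) for w in text.lower().split() if w not in STOPWORDS]

def rank(docs, query):
    """Order matching doc ids by summed tf * idf over query terms, best first."""
    counts = {doc_id: Counter(terms(text)) for doc_id, text in docs.items()}
    n = len(docs)
    scores = {}
    for term in terms(query):
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue  # term appears nowhere, so it cannot affect the ranking
        idf = math.log(n / df) + 1.0  # rarer terms weigh more
        for doc_id, c in counts.items():
            if term in c:
                scores[doc_id] = scores.get(doc_id, 0.0) + c[term] * idf
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With this, "indexing", "indexed", and "indexes" all collapse to "index", and a document that mentions a query term more often ranks above one that mentions it once. A thesaurus step would slot into terms() the same way, which is also exactly where it can start adding noise.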

Pretty interesting stuff, keep me posted if you take a crack at it. I can't really co-conspire at the moment, but I'd be happy to chime in where I can help.

HTH,
Brian

hypertexting of words in a large text corpus. I can find several such libraries on the web, but in languages that don't port well to Transcript (i.e., needing pointers and multidimensional arrays. sigh). I would gladly work with anybody wanting to do one.

