On 27 Jan 2009, at 17:23, Neal Richter wrote:


Is it really necessary to use Solr for it? Things go much faster with the
Lucene low-level API, and much faster still if you load the classification
corpus into RAM.

Good points.  At the moment I'd rather have a daemon with a service
API, as well as the filtering/tokenization capabilities Solr has
built in.  I'll probably attempt to get the corpus's index in memory
via a large memory allocation.
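For the "corpus index in memory" approach, one common route in the Lucene of that era (2.x) is to copy the on-disk index into a RAMDirectory and search that. A rough sketch, assuming a Lucene 2.x-style API and a placeholder index path:

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// Sketch: copy the classification corpus's on-disk index wholesale into
// the JVM heap, then search it there. "/path/to/corpus-index" is a
// placeholder, not a path from the thread.
public class RamCorpus {
    public static void main(String[] args) throws IOException {
        RAMDirectory ramDir =
            new RAMDirectory(FSDirectory.getDirectory("/path/to/corpus-index"));
        IndexSearcher searcher = new IndexSearcher(ramDir);
        // ... run classification queries against `searcher` here ...
        searcher.close();
    }
}
```

This trades startup time and heap for query latency; the JVM needs enough -Xmx headroom to hold the whole index.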

If it doesn't scale, then I'll either go to the Lucene API or implement a
custom inverted index via memcached.
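A custom inverted index of the kind mentioned above can be sketched in a few lines of plain Java: a map from term to a sorted posting list of document ids. This is only an illustrative sketch; a memcached-backed version would store each posting list under its term as the cache key, and that networking layer (plus any real tokenization) is omitted here.

```java
import java.util.*;

// Minimal in-memory inverted index: term -> sorted list of doc ids.
// Tokenization is naive whitespace splitting, purely for illustration.
public class InvertedIndex {
    private final Map<String, List<Integer>> postings =
        new HashMap<String, List<Integer>>();

    // Add a document's text; assumes docs are added in increasing id order.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            List<Integer> list = postings.get(term);
            if (list == null) {
                list = new ArrayList<Integer>();
                postings.put(term, list);
            }
            // One posting per (term, doc), kept sorted by insertion order.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    // Return the posting list for a term (empty if the term is unknown).
    public List<Integer> lookup(String term) {
        List<Integer> list = postings.get(term.toLowerCase());
        return list == null ? Collections.<Integer>emptyList() : list;
    }
}
```

In a memcached deployment the `postings` map would be replaced by get/set calls keyed on the term, trading heap residency for shared access across daemons.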

One other note: at the moment it's not going to be a deeply
hierarchical taxonomy, much less a full indexing of an RDF/OWL
schema; there are some gotchas for that.

If your corpus is small enough, you may want to take a look at
lucene/contrib/instantiated. It was made for just this sort of thing.


    karl
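The contrib/instantiated suggestion can be sketched as follows: load an existing index into InstantiatedIndex's all-objects-in-RAM representation and search it through an InstantiatedIndexReader. This assumes the Lucene 2.x-era contrib API; the index path is a placeholder.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.instantiated.InstantiatedIndex;
import org.apache.lucene.store.instantiated.InstantiatedIndexReader;

// Sketch: materialize a small corpus index into contrib/instantiated's
// in-heap object graph. "/path/to/corpus-index" is a placeholder.
public class InstantiatedCorpus {
    public static void main(String[] args) throws IOException {
        IndexReader onDisk = IndexReader.open("/path/to/corpus-index");
        InstantiatedIndex ii = new InstantiatedIndex(onDisk);
        onDisk.close();  // the InstantiatedIndex is now self-contained
        IndexSearcher searcher =
            new IndexSearcher(new InstantiatedIndexReader(ii));
        // ... queries read postings directly from heap objects, with no
        //     decoding of the on-disk index format on the query path ...
        searcher.close();
    }
}
```

Unlike a RAMDirectory, which still decodes the standard index format on every read, InstantiatedIndex keeps terms and postings as plain Java objects, which is why it only pays off for small corpora.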

