On 30/04/11 21:38, MZMcBride wrote:
> Where's the best documentation for the search setup? And are there any pages

If by setup you mean the setup WMF is using, then [1]. If you mean 
how we use Lucene (with some historical context), then [2] and [3] 
are a good starting point. Beyond that, it's a matter of reading the 
comments in the code.

> with a roadmap for future development?

The roadmap is pretty much to fix the bugs reported in Bugzilla for the 
lucene-search extension. There are quite a few of them, but most are 
technical in nature.

Any further improvement in the *quality* of search results would 
require employing someone who specialises in natural language 
processing/data mining/search to improve on the existing algorithms. The 
algorithms we currently use are pretty much the state of the art in the 
open-source world, and I would consider any further improvement to be 
proper scientific research.

> I'm particularly curious if the Java component can't be killed.

I doubt it. We don't simply use Lucene out of the box in a way that 
would let us switch to another port. In fact, the backend search 
extension (lucene-search) is quite large, at some 50k lines of code. 
It implements a couple of algorithms I put together to work with the 
way information is structured on Wikipedia, in the languages I speak.

r.

[1] http://wikitech.wikimedia.org/view/Search
[2] http://www.mediawiki.org/wiki/User:Rainman
[3] http://www.mediawiki.org/wiki/User:Rainman/search_internals


_______________________________________________
Wikitech-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
