We have an open source Web/XML search engine called "SDX" (http://sdx.culture.fr) that aims to make XML structures and contents searchable, including full text, using a "classic" search engine, Lucene, on top of Cocoon 2. It works pretty well.
The documentation is in French, but the source code comments and API documentation are in English. It works pretty well in multilingual environments too. See the Savannah project (http://savannah.gnu.org/projects/sdx/) for more information. The best option is to get the latest CVS, which includes thesaurus-based searching among other new functionality. Version 2.1RC will be released next week or so.

Martin Sévigny