Another option would be a Tika handler that converted the MathML to a Solr document.
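Roughly, that route could look like the sketch below: a SAX ContentHandler (the same callback interface a Tika handler plugs into) that collects the text of each <math> element and drops it into a SolrJ SolrInputDocument. The class name and the "formula" field are made up for illustration; a real handler would also carry over the rest of the document text and metadata.

import java.io.InputStream;

import javax.xml.parsers.SAXParserFactory;

import org.apache.solr.common.SolrInputDocument;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Collects the text of MathML <math> elements while the document is
 * parsed and adds each formula to a "formula" field of a
 * SolrInputDocument. Class and field names are illustrative only.
 */
public class MathMLSolrHandler extends DefaultHandler {

  private final SolrInputDocument doc = new SolrInputDocument();
  private final StringBuilder formula = new StringBuilder();
  private boolean inMath = false;

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    if ("math".equals(localName)) {
      inMath = true;
      formula.setLength(0);
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (inMath) {
      formula.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("math".equals(localName)) {
      inMath = false;
      doc.addField("formula", formula.toString().trim());
    }
  }

  public SolrInputDocument getDocument() {
    return doc;
  }

  // usage: parse a stream and get a document ready to send to Solr
  public static SolrInputDocument process(InputStream in) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    MathMLSolrHandler handler = new MathMLSolrHandler();
    factory.newSAXParser().parse(in, handler);
    return handler.getDocument();
  }
}

From there the document can be sent with SolrJ, or the same idea could probably be pushed server-side into an update request processor.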
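For the analyzer route Ryan suggests below, the core piece is a TokenFilter plus a small factory registered in schema.xml, along the lines of the LengthFilterFactory he links to. Here is a rough sketch of the filter half, written against a recent Lucene API (the exact packages have moved around between versions); the canonicalize() step is only a placeholder for whatever normalization the math unit actually does.

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Rewrites each incoming token into a canonical formula string.
 * The canonicalization below is only a placeholder; a real MathML
 * filter would normalize the formula structure instead.
 */
public final class MathMLCanonicalizeFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public MathMLCanonicalizeFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    String canonical = canonicalize(termAtt.toString());
    termAtt.setEmpty().append(canonical);
    return true;
  }

  private String canonicalize(String formula) {
    // placeholder: collapse whitespace; real logic would rewrite the MathML
    return formula.replaceAll("\\s+", " ").trim();
  }
}

The factory side would then mirror the length filter example: a small class with a create(TokenStream) method, referenced from a fieldType in schema.xml.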
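And for the workflow Martin describes at the bottom (many string representations of each formula, each with its own boost, indexed untokenized), here is a bare-bones indexing sketch against the Lucene 3.x-era API current at the time; the formula variants and boosts are assumed to come out of his math-processing unit, and note that per-field index-time boosts were dropped in later Lucene versions.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * Indexes several string representations of one formula, each with its
 * own boost, as untokenized fields on the same document.
 */
public class FormulaIndexer {

  private final IndexWriter writer;

  public FormulaIndexer(IndexWriter writer) {
    this.writer = writer;
  }

  public void index(String docText, String[] formulaVariants, float[] boosts) throws Exception {
    Document doc = new Document();
    doc.add(new Field("content", docText, Field.Store.YES, Field.Index.ANALYZED));
    for (int i = 0; i < formulaVariants.length; i++) {
      // not tokenized any further; one field instance per generated variant
      Field f = new Field("formula", formulaVariants[i], Field.Store.NO, Field.Index.NOT_ANALYZED);
      f.setBoost(boosts[i]);
      doc.add(f);
    }
    writer.addDocument(doc);
  }
}

Whether those boosts belong at index time or query time is a separate design question, but this matches the flow he sketches.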
On Apr 15, 2010, at 2:38 AM, Ryan McKinley wrote:

> (perhaps more appropriate on solr-user@)
>
> It sounds like you want to make a MathML filter? Check out the
> analyzer packages...
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> simple example:
> https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/LengthFilterFactory.java
>
> ryan
>
>
> 2010/4/14 <m...@gjgt.sk>:
>> Hello everybody,
>>
>> I'm new to all this, so I hope this isn't too much of a newbie question
>> and that it isn't inappropriate here.
>>
>> I'm currently working on an indexing/searching application based on
>> Apache Lucene core that can process mathematical formulae in MathML
>> format (an extension of XML) and store them in the index for searching.
>> No trouble there, since I'm building everything on top of Lucene.
>>
>> But I started to think it would be nice to write this mathematical
>> extension so that it could be incorporated into Solr as easily as
>> possible in the future. The thing is, I looked into Solr's sources and,
>> to be honest, I'm confused and don't know which way to go about this.
>>
>> The basic workflow of the whole math processing would be: check the
>> input document for any math -> if found, the mathematical unit processes
>> it and produces many string-represented formulae with different boosts
>> -> put these into the index without further tokenization.
>>
>> That's about it.
>> Any ideas? Any help will be appreciated.
>>
>> Thank you
>>
>> Martin
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search