Another option would be a Tika handler that converted the MathML to a Solr 
document.
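
A rough sketch of that idea, using only the JDK's SAX classes. Tika parsers
report their output as SAX events to a ContentHandler, so a handler along
these lines could sit in that position, assuming the MathML elements reach it
with their namespace intact. The class name MathMLFieldExtractor and its
extract() helper are made up for illustration and are not part of Tika or
Solr. The step that turns each formula into boosted, untokenized Solr fields
is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text content of every MathML <math>
// element it sees. Not a Tika or Solr class.
class MathMLFieldExtractor extends DefaultHandler {
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    private final List<String> formulae = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int depth = 0; // 0 = outside <math>, otherwise nesting level inside it

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++; // any element nested inside a formula
        } else if (MATHML_NS.equals(uri) && "math".equals(localName)) {
            depth = 1; // entering a new formula
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0 && --depth == 0) {
            // Leaving the outermost <math>: store the formula as one string.
            formulae.add(current.toString().trim().replaceAll("\\s+", " "));
        }
    }

    List<String> getFormulae() {
        return formulae;
    }

    // Convenience wrapper for trying the handler on an XML string.
    static List<String> extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see the MathML namespace URI
            MathMLFieldExtractor handler = new MathMLFieldExtractor();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getFormulae();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }
}
```

Each string returned by getFormulae() would then be a candidate value for an
untokenized field on the Solr document.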

On Apr 15, 2010, at 2:38 AM, Ryan McKinley wrote:

> (perhaps more appropriate on solr-user@)
> 
> It sounds like you want to make a MathML filter?  Check out the
> analyzer packages...
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> 
> simple example:
> https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/LengthFilterFactory.java
> 
> ryan
> 
> 
> 2010/4/14  <m...@gjgt.sk>:
>> Hello everybody,
>> 
>> I'm new to all this so I hope this isn't too noob a question and that it
>> isn't very inappropriate here.
>> 
>> I'm currently working on an indexing/searching application based on Apache
>> Lucene core that can process mathematical formulae in MathML format
>> (which is an XML application) and store them in the index for searching. No
>> trouble here, since I'm building everything on top of Lucene.
>> 
>> But I started to think it would be nice to write this mathematical
>> extension so that it could be incorporated into Solr as easily as possible
>> in the future. The thing is, I looked into Solr's sources and, to be honest,
>> I'm confused and don't know which way to do this.
>> 
>> The basic workflow of the whole math processing would be:
>> check the input document for any math -> if found, the mathematical unit
>> processes it and produces many string-represented formulae with different
>> boosts -> put these into the index without any further tokenization.
>> 
>> That's about it.
>> Any ideas? Any help will be appreciated.
>> 
>> Thank you
>> 
>> Martin
>> 
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
