Hello everybody,

As part of a project, I am working on a FOSS tool that builds language models for speech recognition from data obtained from the web. I plan to keep the tool compact by encapsulating as much as I can in a single Java application, so that the user does not have to install or configure a lot of additional software.
I have managed to set up Nutch and can crawl a website from inside a Java application. The next thing I need to do is search the crawled data for certain keywords. I have read that the ability to build Lucene indexes has been removed from Nutch and that we now need to use Solr instead. The way Solr works (servlets, HTTP) is not really appropriate for a tool that only needs search functionality that is invisible to the user.

What would you recommend in this case? Is there really no way to build Lucene indexes? I could not find anything other than recommendations to use Solr. Should I try an older version of Nutch?

Thanks in advance,
Emre

