Hello everybody,

As part of a project, I am working on a FOSS tool that builds language
models for speech recognition from data obtained from the web. I plan to
make this tool compact by encapsulating as much as I can in a single Java
application, so that the user does not have to install or configure a lot
of additional software.

I have managed to set up Nutch and am able to crawl a website inside a Java
application. The next thing I need to do is to search for certain keywords
in the obtained data. I have read that the ability to build Lucene indexes
has been removed from Nutch and that we now need to use Solr instead.
However, Solr's architecture (servlets, HTTP) is not really appropriate for
a tool that only needs search functionality invisible to the user.

What would you recommend I do in this case? Is there really no way to
build Lucene indexes with current Nutch? I could not find anything other
than recommendations to use Solr instead. Should I try an older version of
Nutch?

Thanks in advance,

Emre
