Tesfaye,

> I am able to configure Nutch and use it on my PC.
> I am working a thesis on a local search engine.
> I hope in the way I understood Nutch, it is automatically indexing the
> documents it has crawled.
> I want to do some preprocessing on the documents crawled before they get
> indexed. Can you help me
> on how to go about?

You can probably write plugins to achieve this. Please take a look at
http://wiki.apache.org/nutch/PluginCentral.

For example, if you want to parse special tags in the documents you've
crawled and index on them, you can do so using something like what is
documented here:
http://wiki.apache.org/nutch/HowToMakeCustomSearch

Can you see if this gives you an idea of how to move forward?

Thanks
Hemanth
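
As a rough illustration of the kind of preprocessing step such a plugin could wrap, here is a minimal sketch in plain Java. It only shows the text-cleaning logic (strip markup-like tags, collapse whitespace, lowercase). The class name TextPreprocessor and the method preprocess are hypothetical and are not part of Nutch; wiring the logic into a Nutch extension point is not shown and depends on your Nutch version, so please follow the wiki pages above for that part.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class: cleans raw document text
// before it is handed to whatever does the indexing.
public class TextPreprocessor {
    // Matches anything that looks like a tag, e.g. <p> or </b>.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    // Matches runs of whitespace (spaces, tabs, newlines).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String preprocess(String raw) {
        if (raw == null) {
            return "";
        }
        // Replace tags with a space so adjacent words do not get glued together.
        String noTags = TAGS.matcher(raw).replaceAll(" ");
        // Collapse whitespace runs to a single space and trim the ends.
        String collapsed = WHITESPACE.matcher(noTags).replaceAll(" ").trim();
        // Lowercase with a fixed locale so results do not depend on the machine.
        return collapsed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints "hello nutch world"
        System.out.println(preprocess("<p>Hello   <b>Nutch</b> World</p>"));
    }
}
```

In a real plugin you would call something like preprocess(...) on the parsed text inside your own extension class, before the document's fields are added to the index.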

