Although a couple of people mentioned that you can do this
since Nutch is open source, I'd like to play devil's advocate
and say that #3 (replacing the algorithms with your own) is
difficult.
Small tweaks, such as boosting words in the title or URL,
are pretty easy. Changing the main crawling and/or searching
algorithm, however, requires lots of changes to core code.
Once you have modified the core, it will be difficult to
merge future Nutch changes into your code.
You can definitely do it though. You should just know
what you're getting into.
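
To give a feel for the kind of "little tweak" I mean by boosting words in the title or URL, here is a toy sketch in Python. It is not Nutch's actual scoring code; the function name, the field names, and the boost values are all made up for illustration.

```python
import re

# Toy illustration only: this is NOT Nutch's scoring code.
# Field names and boost values below are hypothetical.
BOOSTS = {"title": 3.0, "url": 2.0, "content": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_boosted_score(query_terms, doc, boosts=BOOSTS):
    """Count query-term matches per field, weighted by that field's boost."""
    score = 0.0
    for field, text in doc.items():
        tokens = tokenize(text)
        weight = boosts.get(field, 1.0)  # unknown fields get a neutral boost
        for term in query_terms:
            score += weight * tokens.count(term.lower())
    return score


doc_a = {
    "title": "Nutch crawler tutorial",
    "url": "http://example.org/nutch/tutorial",
    "content": "how to set up a crawl",
}
doc_b = {
    "title": "Search engines overview",
    "url": "http://example.org/search",
    "content": "nutch is a crawler; nutch is open source",
}

score_a = field_boosted_score(["nutch"], doc_a)
score_b = field_boosted_score(["nutch"], doc_b)
print(score_a, score_b)
```

With these made-up weights, a page that mentions the query word once in the title and once in the URL outranks a page that mentions it twice in the body. Changing numbers like these is the easy part; replacing the ranking or crawling logic itself is where the core changes come in.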
Howie
Dear nutchers
This is the first time I have asked a question on the Nutch users list.
I am a researcher working on web retrieval, and I would like to know
whether I can use Nutch for the following:
1- Can I make Nutch start from a set of seed URLs obtained through the
Google API?
2- Can I see the algorithms that do the crawling and that match queries
to search results?
3- Can I modify these algorithms and replace them with my own
algorithms?