Would this be useful to anyone?  I am new to Nutch, but from what I've
seen so far it would make my life easier.

I was thinking about having a small web interface that would run the
crawler.  This would be more for intranet crawling rather than
whole-web crawling.  It would basically have a database of the URLs you
wanted to crawl, along with properties associated with each site
(regex, depth, delays, etc.).  It would then monitor the log file, keep
track of the number of sites crawled, # errors, # successes and so on,
and display it all in a nice layout/status page.
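
Roughly the kind of thing I have in mind for the database side -- just a
sketch in Python/SQLite, and the table/column names are placeholders I
made up, nothing Nutch itself defines:

import sqlite3

# Placeholder schema: one row per site to crawl, plus a table for the
# counts the status page would show.
conn = sqlite3.connect("crawl_sites.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS sites (
        id        INTEGER PRIMARY KEY,
        start_url TEXT NOT NULL,       -- seed URL for the site
        regex     TEXT,                -- urlfilter patterns, one per line
        depth     INTEGER DEFAULT 3,   -- crawl depth to hand to Nutch
        delay     REAL DEFAULT 5.0     -- fetch delay in seconds
    )
""")
conn.execute("""
    CREATE TABLE IF NOT EXISTS crawl_stats (
        site_id  INTEGER REFERENCES sites(id),
        fetched  INTEGER DEFAULT 0,    -- # successes pulled from the log
        errors   INTEGER DEFAULT 0,    -- # errors pulled from the log
        started  TEXT,
        finished TEXT
    )
""")
conn.commit()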

I find that I'm adding and deleting regexes in the urlfilter files all
the time based on the sites I'm crawling.  It would be nice to have all
of that organized in a database so that if, say, I ever want to recrawl
a site, I just hit two buttons, all the regexes load up along with the
right starting URL, and it just starts to crawl.
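
Behind those two buttons it would do something like this -- again just a
sketch; NUTCH_HOME, the table names, and the paths are assumptions about
my own setup, and the bin/nutch crawl flags depend on which Nutch
version you run:

import sqlite3
import subprocess
from pathlib import Path

NUTCH_HOME = Path("/opt/nutch")   # wherever Nutch is installed (assumption)
SITE_ID = 1                       # the site picked in the web interface

conn = sqlite3.connect("crawl_sites.db")
start_url, regex, depth = conn.execute(
    "SELECT start_url, regex, depth FROM sites WHERE id = ?", (SITE_ID,)
).fetchone()

# Write the site's patterns into conf/regex-urlfilter.txt so I don't have
# to hand-edit the file, keeping the usual skip-everything-else rule last.
urlfilter = NUTCH_HOME / "conf" / "regex-urlfilter.txt"
urlfilter.write_text(regex + "\n-.\n")

# Seed list with the right starting URL for this site.
seeds = Path("urls")
seeds.mkdir(exist_ok=True)
(seeds / "seed.txt").write_text(start_url + "\n")

# Kick off the crawl (the old one-shot crawl command; adjust the flags
# to match your Nutch version).
subprocess.run(
    [str(NUTCH_HOME / "bin" / "nutch"), "crawl", "urls",
     "-dir", "crawl", "-depth", str(depth)],
    check=True,
)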

As I am new, maybe this wouldn't be useful and there are already
processes that take care of all this?  Or maybe I'm just not using
Nutch correctly?  If this would help anyone, I am thinking about
writing it, and I'll make it available to whoever wants it.