Hi,

I've been experimenting a bit with Nutch internally, and it looks great.

Two of the things that people have asked me about are:

The first is getting some kind of 'last modified' field for each page being crawled, so that we could ask Nutch to search only documents/pages modified in the last year. Is this currently possible? If not, how hard would it be to implement, presuming the web server returns a last-modified date or we can get the information out of the properties section of a Word document?

The second is to have a web page for injecting a new URL into the search.
I understand that writing the page would be trivial, but I'm not sure whether you can inject a new URL while a fetch or something similar is running. (Does that make sense?)


Also, the clustering option is really nice; IMHO it should be on by default.


Regards Ian



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
