Re: Difference between Deiselpoint and Nutch?

David M. Cole Fri, 18 Sep 2009 10:16:48 -0700

At 12:46 PM -0400 9/18/09, Paul Tomblin wrote:

Nutch is, I think, doing the right thing by not
crawling it, but I can't convince her of this because she's convinced that
DP is commercial and Nutch is "only" Open Source, so obviously DP is right.

Just the opposite ... the commercial product is doing it *wrong* (not respecting robots.txt) while the open source product is doing it *right* (respecting the file).

The client is ornery and is doing something patently against the wishes (expressed in the robots.txt file) of the owner(s) of the content (unless she has permission, in which case get the owner[s] of the content to include your Nutch agent name in their robots.txt file[s]).
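
To illustrate what "include your Nutch agent name" could look like, here is a rough sketch using Python's standard urllib.robotparser. The agent name "MyNutchBot", the example.com URL, and the robots.txt contents are all made up for illustration; the real name would be whatever your Nutch configuration sends as its agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is shut out, except one agent
# that the content owner has explicitly let in. An empty "Disallow:"
# means "nothing is disallowed" for that agent.
ROBOTS_TXT = """\
User-agent: MyNutchBot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "MyNutchBot" and "SomeOtherBot" are placeholder agent names.
nutch_allowed = can_crawl("MyNutchBot", "http://example.com/docs/page.html")
other_allowed = can_crawl("SomeOtherBot", "http://example.com/docs/page.html")
print(nutch_allowed, other_allowed)
```

With a file like this in place, a crawler that respects robots.txt has explicit permission from the owner, which is the point: the permission lives in the owner's file, not in the crawler's settings.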

I know how few and far between paying clients are these days, but personally -- under the circumstances you've described -- I think I'd walk away from this project.


\dmc

PS: The robots.txt file shouldn't have any mention of a sitemap, except possibly to include the URL.
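
For what it's worth, the only sitemap mention that belongs in robots.txt is a "Sitemap:" line carrying the URL. A small sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() is available) and a made-up example.com file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt whose only sitemap mention is the URL line.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /private/

Sitemap: http://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_WITH_SITEMAP.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemap_urls = parser.site_maps()
print(sitemap_urls)
```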


--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
   David M. Cole                                            d...@colegroup.com
   Editor & Publisher, NewsInc. <http://newsinc.net>        V: (650) 557-2993
   Consultant: The Cole Group <http://colegroup.com/>       F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
