+1

-----Original Message-----
From: Franz Werfel [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 07, 2006 10:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: project vitality? / less documentation is more!


Hello,

"single site crawling" wouldn't address the confusion that results from
the fact that the 'crawl' command is actually the concatenation of
several commands; and it would not be true either, since you can do
"several sites crawling" with 'crawl'.

But I have to agree that it helps "getting up and running quickly";
however, my point is that, after this first phase, it is _more_ difficult
to go to the next phase than if one had skipped that first step altogether...

Maybe at the end of the tutorial for "Intranet crawling" the following
sentence could be added: "If you want to crawl the same site _again_,
use the whole-web tutorial below, and NOT the crawl command."

Also, the sentence "Whole-web crawling is designed to handle very large
crawls which may take weeks to complete, running on multiple machines"
is misleading, since one has to use whole-web crawling to fine-tune or
recrawl even the smallest of websites.

The distinction is not only about the scale of the project, but also
about the level of control one wants (IMHO). The documentation should at
least give hints in that direction.

Thanks, Frank.



On 3/7/06, Vanderdray, Jacob <[EMAIL PROTECTED]> wrote:
> -1
>
>        I found the instructions for doing an "Intranet crawl" 
> extremely helpful for getting up and running quickly.  I went back 
> later and figured out more about what it was actually doing.  Perhaps 
> the name could just be changed to "Single Site Crawling with the Nutch
> Shell Script" and some explanatory text could be added.
>
>        I'll try to take the time today to put a version of the 
> tutorial on the wiki that does that.  Then if folks agree, I'll put 
> together a patch that changes the site links for the tutorial to point
> at the wiki.
>
> Thanks,
> Jake.
>
> -----Original Message-----
> From: Franz Werfel [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 07, 2006 3:01 AM
> To: nutch-user@lucene.apache.org
> Subject: Re: project vitality? / less documentation is more!
>
> Hello,
>
> Just my 2 cents: the "Intranet crawl" functionality is VERY
> confusing.
>
> If it was just taken out of the tutorial, and out of the set of 
> commands, that would actually help A LOT: I understood many many 
> things about Nutch once I tried so-called whole-web crawling, where 
> one has to use every command one at a time. And that would also 
> eliminate all the questions about "how to recrawl", etc.
>
> Or maybe a change of name would be enough: "Intranet crawl" could be 
> called "fast-setup crawl", and "whole-web crawling", "serious crawling
> for Intranet or whole-web projects".
>
> What do you think?
>
> Thanks, Frank.
>
