Off the top of my head, one property springs to mind, which you may or may
not have configured in nutch-site.xml:

http.content.limit

However, I suspect this is not the source of the problem.
I would advise you to have a look at your Hadoop log file for any obvious
warnings. Also, how do you know that only about the first 50 lines are
being indexed? Have you looked at a dump of the
crawldb to see what content the database is aware of?
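As a sketch, the crawldb and a fetched segment can be dumped with the readdb and readseg tools. The crawl/crawldb and crawl/segments paths below are assumptions based on the default output layout of bin/nutch crawl; substitute your own output directory and segment name:

```shell
# Dump the crawldb to a plain-text listing of known URLs and their status
bin/nutch readdb crawl/crawldb -dump crawldb-dump

# Dump a fetched segment to inspect the actual content that was stored
# (replace <segment-dir> with a timestamped directory under crawl/segments)
bin/nutch readseg -dump crawl/segments/<segment-dir> segment-dump
```

If the dumped segment content is itself cut off after ~50 lines, the truncation happened at fetch/parse time; if the full text is there, the problem is on the indexing/search side.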

Without answers to some of the above, it is hard to decouple errors in
Nutch itself from the legacy architecture of pre-1.3 Nutch.


On Thu, Jun 16, 2011 at 3:03 PM, Jefferson <[email protected]> wrote:

> Hi
> I'm testing Nutch. I followed the Nutch tutorial,
> but I ran into a problem. I ran the command bin/nutch crawl on
> 6 sites in plain text, each containing only about 400 lines of text; so far
> so normal. When I do a search with Nutch, it only seems to index about the
> first 50 lines and ignores the rest of the text. If I search, for example,
> for "church" and
> this word is beyond the first 50 lines of text, it returns 0 results.
> Does anyone have a solution for this?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-with-Nutch-Search-tp3072077p3072077.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
*Lewis*
