That might be it. The page says Content-Length: 340671
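
For reference, a minimal sketch of the override that would go in
conf/nutch-site.xml. It assumes http.content.limit behaves as its
description in nutch-default.xml states (content longer than a non-negative
limit is truncated, and a negative value disables truncation). The value
500000 is only an illustrative choice, picked to sit above the 340671 bytes
reported for this page:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the 65536-byte default from nutch-default.xml.
       500000 is an assumed value, chosen only to exceed the
       Content-Length of 340671 seen for this page. -->
  <property>
    <name>http.content.limit</name>
    <value>500000</value>
  </property>
</configuration>
```

If the file already has a <configuration> element, only the <property>
block needs to be added inside it. The page was fetched under the old
limit, so it would presumably have to be crawled and indexed again before
the missing terms become searchable.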

> Hi Jefferson,
> 
> I cannot access either your nutch-site or nutch-default, but I see this in
> your log: INFO http.Http - http.content.limit = 65536
> 
> It is a fairly large page, so this may be the cause. I'm sorry, I don't
> have access to my Linux workstation, so I can't test it myself. Can you
> please confirm whether this has been accounted for in your nutch-site?
> Anything over the default limit of 65536 bytes is truncated, so you may
> not be able to search for text that appears beyond that point.
> 
> Further to this, the hadoop.log does not seem to show any erratic
> behaviour.
> 
> On Fri, Jun 24, 2011 at 7:40 AM, Jefferson <[email protected]> wrote:
> > My problem is with the search.
> > I crawled the page http://en.wikipedia.org/wiki/Albert_Einstein
> > When I open http://localhost:8080/nutch-1.1/
> > and type <Adolf Hitler>, it returns a result, ok.
> > When I type <phenomena>, it returns 0 results, not ok.
> > 
> > Attached are my config files and logs.
> > thanks
> > 
> > http://lucene.472066.n3.nabble.com/file/n3104461/nutch-site.xml
> > nutch-site.xml
> > http://lucene.472066.n3.nabble.com/file/n3104461/nutch-default.xml
> > nutch-default.xml
> > http://lucene.472066.n3.nabble.com/file/n3104461/hadoop.log hadoop.log
> > http://lucene.472066.n3.nabble.com/file/n3104461/crawl.log crawl.log
> > 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Problem-in-search-tp3104461p3104461.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
