We are using Nutch version nutch-2008-07-22_04-01-29.
We have a crawldb with over 1 million urls.

We have noticed that some of the urls in our search
results do not have titles.  After comparing urls
with titles against urls without titles, we found
that the urls without titles have empty parsetext.

Why would some urls have empty parsetext?
Is there some place I can look to determine why
parsetext is missing?

Is the only way to reparse those urls with empty
parsetext to remove the crawl_parse directory for
the corresponding segment and run the nutch parse
command?

Is there something I should do to guarantee all
urls get a parsetext, and hopefully, a title?

Thanks in advance for any assistance, ideas, or
pointers to other resources.

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
