Hi,
Can you post some of the URLs for which the parse text is missing?
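
If it helps to collect them, here is a rough sketch (not a Nutch tool) that scans a text dump of a segment, such as the one written by `bin/nutch readseg -dump`, and lists the URLs with no parse text. The record layout it expects (`Recno::`, `URL::` and `ParseText::` header lines) is an assumption about the dump format, and the function name is made up for illustration, so adjust both to what your dump actually contains.

```python
import re
import sys


def urls_with_empty_parsetext(dump_text):
    """Return URLs whose ParseText section is missing or blank.

    Assumed (unverified) dump layout: each record starts with a
    "Recno:: N" line, followed by "URL:: <url>" and optionally a
    "ParseText::" header line with the extracted text below it.
    """
    empty = []
    # Split the dump into records at every "Recno::" line.
    for record in re.split(r"(?m)^Recno::.*$", dump_text):
        url_match = re.search(r"(?m)^URL::[ \t]*(\S+)", record)
        if not url_match:
            continue  # preamble or malformed record
        # Capture the text after "ParseText::" up to the next "Name::"
        # header line or the end of the record.
        text_match = re.search(
            r"(?ms)^ParseText::[ \t]*\n(.*?)(?=^\w+::|\Z)", record
        )
        body = text_match.group(1) if text_match else ""
        if not body.strip():
            empty.append(url_match.group(1))
    return empty


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python empty_parsetext.py /path/to/dump
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for url in urls_with_empty_parsetext(fh.read()):
            print(url)
```

A record with no `ParseText::` section at all is treated the same as one whose section is blank, since either way the page would show up without text.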
On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:
> We are using nutch version nutch-2008-07-22_04-01-29.
> We have a crawldb with over 1 million urls.
>
> We have noticed some of the urls in search results
> do not have titles. After some research comparing
> urls with titles and urls without titles, the urls
> without titles have empty parsetext.
>
> Why would some urls have empty parsetext?
> Is there some place I can look to determine why
> parsetext is missing?
>
> Is the only way to reparse those urls with empty
> parsetext to remove the crawl_parse directory for
> the corresponding segment and run the nutch parse
> command?
>
> Is there something I should do to guarantee all
> urls get a parsetext, and hopefully, a title?
>
> Thanks in advance for any assistance or pointers
> to other resources or ideas.
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>