> Can you post some of the URLs for which parse text is missing?

I am unable to post the actual urls.  This is a private
project for which exact urls cannot be shared.

JohnM




> On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:
> 
> > We are using nutch version nutch-2008-07-22_04-01-29.
> > We have a crawldb with over 1 million urls.
> >
> > We have noticed some of the urls in search results
> > do not have titles.  After some research comparing
> > urls with titles and urls without titles, the urls
> > without titles have empty parsetext.
> >
> > Why would some urls have empty parsetext?
> > Is there some place I can look to determine why
> > parsetext is missing?
> >
> > Is the only way to reparse those urls with empty
> > parsetext to remove the crawl_parse directory for
> > the corresponding segment and run the nutch parse
> > command?
> >
> > Is there something I should do to guarantee all
> > urls get a parsetext, and hopefully, a title?
> >
> > Thanks in advance for any assistance or pointers
> > to other resources or ideas.
> >
> > JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
