> Can you post some of the urls for which parse text is missing?

I am unable to post the actual urls. This is a private project
for which exact urls cannot be shared.

JohnM

> On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:
>
> > We are using nutch version nutch-2008-07-22_04-01-29.
> > We have a crawldb with over 1 million urls.
> >
> > We have noticed some of the urls in search results
> > do not have titles. After some research comparing
> > urls with titles and urls without titles, the urls
> > without titles have empty parsetext.
> >
> > Why would some urls have empty parsetext?
> > Is there some place I can look to determine why
> > parsetext is missing?
> >
> > Is the only way to reparse those urls with empty
> > parsetext to remove the crawl_parse directory for
> > the corresponding segment and run the nutch parse
> > command?
> >
> > Is there something I should do to guarantee all
> > urls get a parsetext, and hopefully, a title?
> >
> > Thanks in advance for any assistance or pointers
> > to other resources or ideas.
> >
> > JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia internet services
