Marko Bauhardt wrote:
> 
> On 26.05.2006 at 01:57, Stefan Neufeind wrote:
>>> Modified. If not, date=FetchTime.
>>
>> Hi Marko,
>>
> 
> Hi Stefan,
> 
>> that hint really helped. Can you maybe also help me out with sort=title?
>> See also:
>> http://issues.apache.org/jira/browse/NUTCH-287
>>
>> The problem is that it works for some searches, but not always. Could it
>> be that some plugins don't write a title, or write it as null/empty,
>> and that this leads to problems? What could I do?
> 
> If an HTML page begins with "<?xml", then the TextParser is used instead of
> the HTML parser (I am not sure about this). If the TextParser parses such a
> page, no title is extracted. In that case the title is empty and the
> summary consists of XML code.
> 
> Please check the pages that have no title and see whether they begin
> with "<?xml".
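To check untitled pages for the pattern Marko describes, a small standalone program can report which saved files start with an XML declaration. This is a minimal sketch and not part of Nutch: the class name `XmlPrefixCheck`, the method `startsWithXmlDeclaration`, and reading pages from local files are all hypothetical. Tolerating a UTF-8 byte order mark and leading whitespace is an assumption about what "begins with" should mean.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not a Nutch class): reports whether page content
 * starts with "<?xml", which, per the unconfirmed guess in this thread,
 * may route the page to the TextParser instead of the HTML parser.
 */
public class XmlPrefixCheck {

    /** True if the content, after an optional UTF-8 BOM and leading whitespace, begins with "<?xml". */
    public static boolean startsWithXmlDeclaration(byte[] content) {
        // Only the start of the page matters, so decode at most the first 512 bytes.
        int len = Math.min(content.length, 512);
        String head = new String(content, 0, len, StandardCharsets.UTF_8);
        // Assumption: ignore a UTF-8 byte order mark if present.
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);
        }
        // Assumption: ignore leading whitespace (requires Java 11+ for stripLeading).
        return head.stripLeading().startsWith("<?xml");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java XmlPrefixCheck page1.html page2.html ...
        for (String arg : args) {
            byte[] content = Files.readAllBytes(Paths.get(arg));
            String verdict = startsWithXmlDeclaration(content)
                    ? "starts with <?xml"
                    : "no XML declaration at start";
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

For example, after saving the raw content of a few untitled pages, running `java XmlPrefixCheck page1.html page2.html` prints one verdict per file.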

I can understand that those documents are "problematic" when sorting,
e.g. they would all end up at the front or at the end of the sorted list.
But why does this actually lead to no output/an exception/...?

Maybe, when no title is present, at least _something_ could be used
instead, e.g. the URL?
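One plausible way a missing title turns into an exception, rather than merely odd ordering, is a comparator that calls `compareTo` on a null title; this has not been confirmed as the cause of NUTCH-287. The sketch below shows the suggested fallback: sort on the title when it is non-blank, otherwise on the URL. `Hit`, `sortKey` and `BY_TITLE_OR_URL` are hypothetical names, not Nutch classes, and case-insensitive ordering is an arbitrary choice. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TitleSortSketch {

    /** Hypothetical stand-in for a search hit; not Nutch's own hit classes. */
    public record Hit(String title, String url) {}

    /** Uses the title if non-null and non-blank, else the URL, else "" so comparison never sees null. */
    static String sortKey(Hit hit) {
        String title = hit.title();
        if (title != null && !title.isBlank()) {
            return title;
        }
        return hit.url() == null ? "" : hit.url();
    }

    /** Orders hits case-insensitively by title, falling back to the URL for untitled hits. */
    public static final Comparator<Hit> BY_TITLE_OR_URL =
            Comparator.comparing(TitleSortSketch::sortKey, String.CASE_INSENSITIVE_ORDER);

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(List.of(
                new Hit("Zebra facts", "http://example.com/z"),
                new Hit(null, "http://example.com/b"),   // missing title
                new Hit("", "http://example.com/a"),     // empty title
                new Hit("apple pie", "http://example.com/p")));
        hits.sort(BY_TITLE_OR_URL);
        // Print the key each hit was sorted on.
        hits.forEach(h -> System.out.println(sortKey(h)));
    }
}
```

With this ordering, untitled pages are interleaved by URL among the titled ones instead of failing or clustering at one end; `Comparator.nullsFirst`/`nullsLast` would be the alternative if grouping them together is preferred.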


Regards,
 Stefan
