Marko Bauhardt wrote:
>
> On 26.05.2006 at 01:57, Stefan Neufeind wrote:
>>> Modified. If not, date=FetchTime.
>>
>> Hi Marko,
>>
>
> Hi Stefan,
>
>> that hint really helped. Can you maybe also help me out with sort=title?
>> See also:
>> http://issues.apache.org/jira/browse/NUTCH-287
>>
>> The problem is that it works on some searches - but not always. Could it
>> be that maybe some plugins don't write a title or write title as
>> null/empty and that leads to problems? What could I do:
>
> If an HTML page begins with "<?xml", then the TextParser is used and not
> the HTML parser (I am not sure). If the TextParser is used to parse this
> page, then no title will be extracted. So in this case the title is empty
> and the summary is XML code.
>
> Please check your pages that have no title and see whether "<?xml"
> appears at the beginning of the page.
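To make that check quick to run over a batch of fetched pages, here is a minimal sketch in plain Java. It is not Nutch code: the class and method names (XmlPrologCheck, startsWithXmlProlog) are made up for illustration, and skipping a byte order mark and leading whitespace is my own assumption, since I do not know how strictly the parser selection looks at the start of the content.

```java
// Hypothetical helper, not part of Nutch: reports whether page content
// starts with an XML declaration ("<?xml").
final class XmlPrologCheck {
    private XmlPrologCheck() {}

    static boolean startsWithXmlProlog(String content) {
        if (content == null) {
            return false;
        }
        int i = 0;
        // Skip a byte order mark that was decoded as U+FEFF, if present.
        if (!content.isEmpty() && content.charAt(0) == '\uFEFF') {
            i = 1;
        }
        // Assumption: tolerate leading whitespace before the declaration.
        while (i < content.length() && Character.isWhitespace(content.charAt(i))) {
            i++;
        }
        return content.startsWith("<?xml", i);
    }
}
```

Running this over the raw content of the pages that show an empty title should tell whether the "<?xml" explanation fits them.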
I could understand that those documents are "problematic" in sorting -
e.g. they would all be in front or at the end of the sorted list. But why
does this actually lead to no output/an exception/...?

Maybe in case no title is present at least _something_ could be used -
e.g. the URL instead or so?

Regards,
 Stefan
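To illustrate the kind of fallback I mean, here is a rough sketch in plain Java. It is only an illustration of the idea, not a patch: Hit, sortKey and sortByTitle are hypothetical names and not Nutch classes, and I do not know where the failure behind NUTCH-287 actually occurs, so a real fix might have to happen when the title field is written rather than at sort time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Nutch code: sort by title, falling back to the
// URL when a document has no usable title.
final class TitleSortFallback {
    private TitleSortFallback() {}

    /** Stand-in for a search hit; only the two fields needed here. */
    static final class Hit {
        final String title;
        final String url;

        Hit(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** Returns the title, or the URL when the title is null or blank. */
    static String sortKey(Hit hit) {
        if (hit.title != null && !hit.title.trim().isEmpty()) {
            return hit.title;
        }
        // Never return null, so the comparator cannot fail on a missing key.
        return hit.url == null ? "" : hit.url;
    }

    /** Returns a new list sorted case-insensitively by the fallback key. */
    static List<Hit> sortByTitle(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing(TitleSortFallback::sortKey,
                String.CASE_INSENSITIVE_ORDER));
        return sorted;
    }
}
```

With something like this, documents without a title would be sorted in among the others by their URL instead of breaking the whole result list.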
