I don't know, but you can try upgrading to 0.7.2.
See the Nutch change log:
http://svn.apache.org/viewcvs.cgi/lucene/nutch/branches/branch-0.7/CHANGES.txt?rev=390158
Dima Mazmanov wrote:
Hi, Håvard.
Thank you again for your help.
Hmm, there is one more thing I'm curious about.
The search results for several sites display content like the following:
Cool-Warez
[html] - 19.1 k - 11/3/2006
... Avatars გართობა კონტაქტი
როგორ მოვხსნათ www.sendspace.com Многие из Вас ... вопрос: "Как качать
с http://www
http://www.cool.caucasus.net/index_moxsna_2.htm (Cached) (More from
www.cool.caucasus.net)
As you can see, there are a lot of spaces between the words. Is this a bug
or something else?
Maybe it's because of the different borders in the web page, and Nutch
inserts spaces on its own?
Is there any way to avoid this problem?