I don't know, but you can try upgrading to 0.7.2.

See the Nutch change log:
http://svn.apache.org/viewcvs.cgi/lucene/nutch/branches/branch-0.7/CHANGES.txt?rev=390158

Dima Mazmanov wrote:
Hi, Håvard.
Thank you again for your help.
Hmm, there is one more thing I'm curious about.
The search results for several sites display content like the following:

Cool-Warez
[html] - 19.1 k - 11/3/2006
... Avatars გართობა კონტაქტი როგორ მოვხსნათ www.sendspace.com Многие из Вас ... вопрос: "Как качать с http://www http://www.cool.caucasus.net/index_moxsna_2.htm (Cached) (More from www.cool.caucasus.net)

As you can see, there are a lot of spaces between words. Is this a bug? Maybe it's because of different borders in the web page, and Nutch inserts the spaces on its own?
Is there any way to avoid this problem?

