Thanks for the help, Doug.

This is what I am getting after adding that little line.

[EMAIL PROTECTED] nutch-nightly]# bin/nutch dedup segments dedup.tmp
040922 062233 Clearing old deletions in segments/20040829092114/index
040922 062234 Clearing old deletions in segments/20040829122947/index
040922 062234 Clearing old deletions in segments/20040829124357/index
040922 062234 Clearing old deletions in segments/20040829130541/index
040922 062234 Clearing old deletions in segments/20040829212107/index
040922 062234 Clearing old deletions in segments/20040829225928/index
040922 062234 Clearing old deletions in segments/20040830042947/index
040922 062234 Clearing old deletions in segments/20040830043001/index
040922 062234 Clearing old deletions in segments/20040830065943/index
040922 062234 Clearing old deletions in segments/20040830111830/index
040922 062237 Reading url hashes...
040922 062237 loading file:/root/nutch-nightly/conf/nutch-default.xml
040922 062238 loading file:/root/nutch-nightly/conf/nutch-site.xml
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 12243, Size: 12
  at java.util.ArrayList.RangeCheck(ArrayList.java:507)
  at java.util.ArrayList.get(ArrayList.java:324)
  at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
  at net.nutch.indexer.DeleteDuplicates.computeHashes(DeleteDuplicates.java:182)
  at net.nutch.indexer.DeleteDuplicates.deleteUrlDuplicates(DeleteDuplicates.java:149)
  at net.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:264)
[EMAIL PROTECTED] nutch-nightly]#

Something very strange is going on.  I also posted this in the dev group, but
I think it is all related.  On this same box, when I try to update the db, I
get this error:

[binary record data dumped to the terminal omitted; the readable parts are
entries for http://www.angelfire.com/ga/... URLs]
... read 26994 bytes, should read 779577707
  at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:192)
  at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:205)
  at net.nutch.io.MapFile$Reader.next(MapFile.java:300)
  at net.nutch.db.WebDBWriter$PagesByURLProcessor.mergeEdits(WebDBWriter.java:623)
  at net.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:543)
  at net.nutch.db.WebDBWriter.close(WebDBWriter.java:1534)
  at net.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:297)
  at net.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:342)
You have new mail in /var/spool/mail/root
[EMAIL PROTECTED] nutch-nightly]#

Do I need to start the database over, or is some segment corrupting
everything?

Thanks,

Jason

----- Original Message ----- 
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, September 21, 2004 10:43 AM
Subject: Re: [Nutch-general] Is the Database toast?


> It actually looks more like a segment's index is trashed.
>
> Try using the following patch to identify the troubled segment, then
> re-index it.
>
> Doug
>
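
The advice quoted above (find the troubled segment, then re-index it) could be
sketched roughly as follows. This is not the patch Doug refers to, which is not
included in this message; it is only a minimal illustration. The class name
BrokenSegmentFinder and the SegmentCheck interface are hypothetical, and the
check itself is left to the caller (for example, opening the segment's Lucene
index and reading every stored document). The sketch assumes the
segments/NAME/index layout seen in the log above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BrokenSegmentFinder {

    /** Hypothetical per-segment check: throws if the index cannot be read. */
    public interface SegmentCheck {
        void check(File indexDir) throws Exception;
    }

    /**
     * Runs the check on every segments/NAME/index directory and returns
     * the names of the segments whose check threw an exception.
     */
    public static List<String> findBroken(File segmentsDir, SegmentCheck check) {
        List<String> broken = new ArrayList<>();
        File[] segments = segmentsDir.listFiles(File::isDirectory);
        if (segments == null) {
            return broken; // not a directory, or not readable
        }
        Arrays.sort(segments); // visit segments in a stable, name-based order
        for (File segment : segments) {
            File indexDir = new File(segment, "index");
            try {
                check.check(indexDir);
            } catch (Exception e) {
                // Catch per segment so one bad index does not hide the others.
                System.err.println("Problem in " + indexDir + ": " + e);
                broken.add(segment.getName());
            }
        }
        return broken;
    }
}
```

Catching the exception inside the loop is the point of the sketch: the dedup
run above stops at the first failure without saying which segment it was
reading, whereas this reports each failing segment by name.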


