Thanks for the help, Doug. This is what I get after adding that one line.

[EMAIL PROTECTED] nutch-nightly]# bin/nutch dedup segments dedup.tmp
040922 062233 Clearing old deletions in segments/20040829092114/index
040922 062234 Clearing old deletions in segments/20040829122947/index
040922 062234 Clearing old deletions in segments/20040829124357/index
040922 062234 Clearing old deletions in segments/20040829130541/index
040922 062234 Clearing old deletions in segments/20040829212107/index
040922 062234 Clearing old deletions in segments/20040829225928/index
040922 062234 Clearing old deletions in segments/20040830042947/index
040922 062234 Clearing old deletions in segments/20040830043001/index
040922 062234 Clearing old deletions in segments/20040830065943/index
040922 062234 Clearing old deletions in segments/20040830111830/index
040922 062237 Reading url hashes...
040922 062237 loading file:/root/nutch-nightly/conf/nutch-default.xml
040922 062238 loading file:/root/nutch-nightly/conf/nutch-site.xml
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 12243, Size: 12
        at java.util.ArrayList.RangeCheck(ArrayList.java:507)
        at java.util.ArrayList.get(ArrayList.java:324)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
        at net.nutch.indexer.DeleteDuplicates.computeHashes(DeleteDuplicates.java:182)
        at net.nutch.indexer.DeleteDuplicates.deleteUrlDuplicates(DeleteDuplicates.java:149)
        at net.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:264)
[EMAIL PROTECTED] nutch-nightly]#

Something very strange is going on. I posted this in the dev group, but I think it has to do with everything.

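For anyone hitting the same trace: one rough way to narrow down which segment's index is unreadable is to probe each segments/*/index directory on its own and note which ones throw. The sketch below is only an illustration of that idea, not the patch mentioned in this thread. The class SegmentScan and the IndexProbe hook are made-up names, and the probe in main is a placeholder that only checks the directory is non-empty. A real probe would presumably open the index with Lucene and call document(i) on every undeleted document, since that is the call that fails in the trace above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SegmentScan {

    /** Hypothetical hook: read one index directory and throw if it is unreadable. */
    interface IndexProbe {
        void readAll(Path indexDir) throws Exception;
    }

    /** Returns the names of segments whose index made the probe throw, sorted by name. */
    static List<String> findBadSegments(Path segmentsDir, IndexProbe probe) throws IOException {
        // Collect every segment directory that has an "index" subdirectory.
        List<Path> segments = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p.resolve("index"))) {
                    segments.add(p);
                }
            }
        }
        Collections.sort(segments);

        // Probe each index separately so one bad segment does not abort the scan.
        List<String> bad = new ArrayList<>();
        for (Path seg : segments) {
            try {
                probe.readAll(seg.resolve("index"));
            } catch (Exception e) {
                System.err.println("Unreadable index in " + seg + ": " + e);
                bad.add(seg.getFileName().toString());
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        Path segmentsDir = Paths.get(args.length > 0 ? args[0] : "segments");
        // Placeholder probe (assumption): only checks that the index directory has files.
        // Swap in a probe that actually reads every document with Lucene.
        IndexProbe probe = indexDir -> {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
                if (!files.iterator().hasNext()) {
                    throw new IOException("empty index directory");
                }
            }
        };
        System.out.println("Bad segments: " + findBadSegments(segmentsDir, probe));
    }
}
```

Run it from the Nutch directory with the segments directory as the only argument. Any segment it lists would be a candidate for re-indexing.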
On this same box, I am trying to update the db and get this error:

[several screens of binary record data dumped to the terminal, mostly http://www.angelfire.com/ga/... URLs]
read 26994 bytes, should read 779577707
        at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:192)
        at net.nutch.io.SequenceFile$Reader.next(SequenceFile.java:205)
        at net.nutch.io.MapFile$Reader.next(MapFile.java:300)
        at net.nutch.db.WebDBWriter$PagesByURLProcessor.mergeEdits(WebDBWriter.java:623)
        at net.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:543)
        at net.nutch.db.WebDBWriter.close(WebDBWriter.java:1534)
        at net.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:297)
        at net.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:342)
You have new mail in /var/spool/mail/root
[EMAIL PROTECTED] nutch-nightly]#

Do I start the database over, or is this some segment that is corrupting everything?

Thanks,
Jason

----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, September 21, 2004 10:43 AM
Subject: Re: [Nutch-general] Is the Database toast?

> It actually looks more like a segment's index is trashed.
>
> Try using the following patch to identify the troubled segment, then
> re-index it.
>
> Doug

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
