That's what I had read in another post as well, but I still can't
understand how it could be corrupted! It's not even a large index, just a
couple of URLs, and I followed every step in the tutorials on the wiki
page.

Here's the list under /indexes:

drwxr-xr-x  2 root root 4096 Jan 31 16:21 part-00000
drwxr-xr-x  2 root root 4096 Jan 31 16:21 part-00001

This is what's under part-00000:

-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f0
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f1
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f2
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f3
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f4
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f5
-rw-r--r--  1 root root  399 Jan 31 16:21 _2.fdt
-rw-r--r--  1 root root   16 Jan 31 16:21 _2.fdx
-rw-r--r--  1 root root   74 Jan 31 16:21 _2.fnm
-rw-r--r--  1 root root  945 Jan 31 16:21 _2.frq
-rw-r--r--  1 root root 1790 Jan 31 16:21 _2.prx
-rw-r--r--  1 root root  105 Jan 31 16:21 _2.tii
-rw-r--r--  1 root root 6850 Jan 31 16:21 _2.tis
-rw-r--r--  1 root root    4 Jan 31 16:21 deletable
-rw-r--r--  1 root root    0 Jan 31 16:21 index.done
-rw-r--r--  1 root root   27 Jan 31 16:21 segments

This is what's under part-00001:

-rw-r--r--  1 root root  0 Jan 31 16:21 index.done
-rw-r--r--  1 root root 20 Jan 31 16:21 segments
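If it helps narrow things down: assuming these parts were written by a Lucene release from the 1.4 to 2.0 era (an assumption; the separate `deletable` file fits that period), the `segments` file has a simple big-endian layout: an int32 format marker (-1), an int64 version, an int32 name counter, an int32 segment count, and then, per segment, a VInt-prefixed name and an int32 document count. Under that assumption the sizes above are telling: 4 + 8 + 4 + 4 = 20 bytes is exactly a header listing zero segments (part-00001), while 27 bytes is that header plus one entry, 3 bytes for the name `_2` and 4 for its document count (part-00000). The sketch below decodes such a file; the function names are hypothetical and not part of Nutch or Lucene.

```python
import struct
import sys


def read_vint(buf, pos):
    """Decode a Lucene-style VInt: 7 bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7


def parse_segments(buf):
    """Return (version, [(segment_name, doc_count), ...]).

    ASSUMPTION: pre-2.1 Lucene layout, big-endian: int32 format (-1),
    int64 version, int32 name counter, int32 segment count, then per
    segment a VInt-length ASCII name and an int32 document count.
    """
    fmt, version, _counter, seg_count = struct.unpack_from(">iqii", buf, 0)
    if fmt != -1:
        raise ValueError("format %d does not match the assumed layout" % fmt)
    pos = 20  # size of the fixed header above
    segments = []
    for _ in range(seg_count):
        length, pos = read_vint(buf, pos)
        name = buf[pos:pos + length].decode("ascii")
        pos += length
        (doc_count,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        segments.append((name, doc_count))
    return version, segments


if __name__ == "__main__":
    # Pass one or more paths to 'segments' files on local disk.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            version, segments = parse_segments(f.read())
        total = sum(count for _, count in segments)
        print("%s: %d segment(s), %d doc(s) %r" % (path, len(segments), total, segments))
```

An empty part is not necessarily corrupt, but if the layout assumption holds, part-00001 contains no documents at all, which makes it one plausible candidate for tripping up `MultiReader` during dedup.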
 
By the way, I should also mention that I am running dedup on DFS.
I haven't tried running it on the local filesystem yet; does that matter?

Thanks for your help.



Hetal Shah wrote:
> Hey guys
>  
> Been breaking my head over this error for a while now, but don't seem 
> to be getting anywhere! I have tried creating / recreating the index 
> several times, and also made sure that all settings were as "per the 
> book". I read somewhere on one of the other posts that this error 
> could be due to a corrupted index, but somehow, I don't think that's 
> the case. I only have a few urls in the index with depth 1, so it's not
> even a large crawl!
>  
> There are two directories in my crawled/indexes directory, viz. 
> part-00000 and part-00001.
>   

Could you do an 'ls -l' to show the content and sizes of these parts?
>  
> Task TASKID="tip_0009_m_000001" TASK_TYPE="MAP" TASK_STATUS="FAILED"
> FINISH_TIME="1170237489795" ERROR="java.lang.ArrayIndexOutOfBoundsException: -1
>  at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
>   

This usually indicates that one or more indexes under crawled/indexes is
invalid - nonexistent, incomplete or corrupt.
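Along the lines of "nonexistent, incomplete or corrupt", a quick structural pass over every part can be scripted. This is only a heuristic based on the layout in the listing above, not on any documented Nutch contract: it flags a part that is missing, has no `index.done` marker, has no `segments` file, or has no segment data files (names starting with `_`). It reads the local filesystem only, so an index on DFS would have to be copied to local disk first. The function names are hypothetical.

```python
import os
import sys


def check_part(part_dir):
    """Return a list of problems found in one index part directory.

    HEURISTIC (assumed from the listing, not a Nutch guarantee): a finished
    part has an 'index.done' marker, a Lucene 'segments' file, and at least
    one segment data file whose name starts with '_'.
    """
    if not os.path.isdir(part_dir):
        return ["missing directory"]
    names = set(os.listdir(part_dir))
    problems = []
    if "index.done" not in names:
        problems.append("no index.done marker (indexing may not have finished)")
    if "segments" not in names:
        problems.append("no segments file")
    if not any(n.startswith("_") for n in names):
        problems.append("no segment data files (empty index?)")
    return problems


def check_indexes(indexes_dir):
    """Map each part-* subdirectory of indexes_dir to its list of problems."""
    return {
        name: check_part(os.path.join(indexes_dir, name))
        for name in sorted(os.listdir(indexes_dir))
        if name.startswith("part-")
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script> /local/copy/of/crawled/indexes
    for part, problems in check_indexes(sys.argv[1]).items():
        print(part, "OK" if not problems else "; ".join(problems))
```

For the listing above it would flag part-00001 as having no segment data files. Treat a hit as a lead to investigate rather than a diagnosis.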

--
Best regards,
Andrzej Bialecki     <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
