I am getting warnings in hadoop.log saying that segments.gen and segments_2 are not directories. As the listing below shows, they are in fact files, not directories. I'm not sure which stage of the process produces these warnings, since I only just stumbled on them, but it concerns me that something is being skipped. Any ideas before I start digging further?

2009-11-30 08:28:56,344 WARN mapred.FileInputFormat - Can't open index at hdfs://nn1:9000/user/nutch/crawl/index1/segments.gen:0+2147483647, skipping. (hdfs://nn1:9000/user/nutch/crawl/index1/segments.gen not a directory)
2009-11-30 08:29:00,509 WARN mapred.FileInputFormat - Can't open index at hdfs://nn1:9000/user/nutch/crawl/index2/segments.gen:0+2147483647, skipping. (hdfs://nn1:9000/user/nutch/crawl/index2/segments.gen not a directory)
2009-11-30 08:29:04,314 WARN mapred.FileInputFormat - Can't open index at hdfs://nn1:9000/user/nutch/crawl/index2/segments_2:0+2147483647, skipping. (hdfs://nn1:9000/user/nutch/crawl/index2/segments_2 not a directory)

[nu...@nn1 logs]$ cd ~/crawl/search/
[nu...@nn1 search]$ bin/hadoop dfs -ls crawl/index1
Found 10 items
-rw-r--r-- 1 nutch supergroup 454257 2009-11-30 08:28 /user/nutch/crawl/index1/_0.fdt
-rw-r--r-- 1 nutch supergroup 20300 2009-11-30 08:28 /user/nutch/crawl/index1/_0.fdx
-rw-r--r-- 1 nutch supergroup 81 2009-11-30 08:28 /user/nutch/crawl/index1/_0.fnm
-rw-r--r-- 1 nutch supergroup 2641385 2009-11-30 08:28 /user/nutch/crawl/index1/_0.frq
-rw-r--r-- 1 nutch supergroup 15226 2009-11-30 08:28 /user/nutch/crawl/index1/_0.nrm
-rw-r--r-- 1 nutch supergroup 5122161 2009-11-30 08:28 /user/nutch/crawl/index1/_0.prx
-rw-r--r-- 1 nutch supergroup 30777 2009-11-30 08:28 /user/nutch/crawl/index1/_0.tii
-rw-r--r-- 1 nutch supergroup 2199031 2009-11-30 08:28 /user/nutch/crawl/index1/_0.tis
-rw-r--r-- 1 nutch supergroup 20 2009-11-30 08:28 /user/nutch/crawl/index1/segments.gen
-rw-r--r-- 1 nutch supergroup 58 2009-11-30 08:28 /user/nutch/crawl/index1/segments_2

Jesse

int GetRandomNumber()
{
    return 4; // Chosen by fair roll of dice
              // Guaranteed to be random
} // xkcd.com