Re: mapred bug -- bad part calculation?

2005-11-09 Thread Paul Baclace
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-08 Thread Doug Cutting
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote: Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues.

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Massimo Miccoli
Hello Nutch devs, I have the same problem. I have 10 hosts and one master. On each host I run a datanode and a tasktracker. My mapred conf is 100 maps and 25 reducers. Below are the logs with errors. Thanks. 051107 144101 task_r_pd3ybk 0.224% reduce copy 051107 144102 Moving bad file

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task trackers. On Mon,

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Paul Baclace
Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories issue when using a local filesystem with multiple task

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
On Mon, 2005-11-07 at 17:26 -0800, Paul Baclace wrote: Rod Taylor wrote: The attached patches for Generator.java and Injector.java allow a specific temporary directory to be specified. This gives Nutch the full path to these temporary directories and seems to fix the No input directories

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Paul Baclace
Rod Taylor wrote: NDFS accomplishes the above path finding by auto-prefixing any path not beginning with / with a /user/$USER. I didn't think it was appropriate for LocalFileSystem.java to be mucking around trying to automatically adjust paths to what the user may have intended. Grep-ing for
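The prefixing rule described in the message above (any path not beginning with / gets /user/$USER put in front of it) can be sketched roughly as follows. This is only an illustration of the rule as stated, not the actual NDFS code: the class name UserPathSketch and the method resolve are hypothetical, and the real implementation may differ in details.

```java
public class UserPathSketch {

    // Hypothetical helper illustrating the rule quoted above; this is NOT
    // the actual NDFS implementation. Absolute paths are returned as-is,
    // relative paths are placed under /user/<user>.
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        // Assumption: the JVM's user.name property stands in for $USER.
        String user = System.getProperty("user.name");
        System.out.println(resolve("segments/20051102031132", user));
        System.out.println(resolve("/opt/nutch/segments", user));
    }
}
```

A local filesystem has no such rule, so under this reading a relative temporary directory would resolve against whatever the working directory happens to be on each machine, which is consistent with the patches passing a full path instead.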

Re: mapred bug -- bad part calculation?

2005-11-07 Thread Rod Taylor
On Mon, 2005-11-07 at 18:12 -0800, Paul Baclace wrote: Rod Taylor wrote: NDFS accomplishes the above path finding by auto-prefixing any path not beginning with / with a /user/$USER. I didn't think it was appropriate for LocalFileSystem.java to be mucking around trying to automatically

Re: mapred bug -- bad part calculation?

2005-11-05 Thread Stefan Groschupf
I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A SAN in general is a bad idea. A SAN is too slow for a serious setup ... and it is a single point of failure... Better to use many local hard drives. Stefan

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the same host? How many hosts are you running this on? Doug

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: Rod Taylor wrote: Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). This sounds strange. Are the datanode errors always on the

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: There is only a single datanode and there are 20 hosts. That's a lot of load on one datanode. I typically run a datanode on every host, accessing the local drives on that host. Doug

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in the end). Reducing it to a single datanode did not have this

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 22:57 -0500, Rod Taylor wrote: On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues. This is using mapred.local.dir on the local machine (not sharedd

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote: Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues.

mapred bug -- bad part calculation?

2005-11-03 Thread Rod Taylor
Sources are from October 31st. Sun Standard Edition 1.5.0_02-b09 for amd64. Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible). If I have mapred.reduce.tasks set to 20, the hole is at part 13.

Re: mapred bug -- bad part calculation?

2005-11-03 Thread Rod Taylor
I forgot to provide this earlier. Here is nutch ndfs -ls output for the directory structure of a segment with a failed part-00013. [EMAIL PROTECTED] ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133 051103 162002 parsing