Rod Taylor wrote:
The attached patches for Generator.java and Injector.java allow a
specific temporary directory to be specified. This gives Nutch the full
path to these temporary directories and seems to fix the "No input
directories" issue when using a local filesystem with multiple task
trackers.
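The fix amounts to giving jobs an absolute temporary path that every task tracker resolves identically. As a hedged sketch (the property name mapred.temp.dir follows later Nutch releases; the patched code may read a different name), a nutch-site.xml entry might look like:

```xml
<!-- Illustrative nutch-site.xml fragment; the exact property name read
     by the patched Generator/Injector is an assumption here. -->
<property>
  <name>mapred.temp.dir</name>
  <value>/opt/nutch/tmp</value>
  <description>Absolute path for job temporary directories, so every
  task tracker resolves the same location instead of a relative path.
  </description>
</property>
```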
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
Rod Taylor wrote:
Here you go. local filesystem and a single job tracker on another
machine. When the tasktracker and jobtracker are on the same box there
isn't a problem. When they are on different machines it runs into
issues.
Hello Nutch devs,
I have the same problems. I have 10 hosts and one master. On each host I
run a datanode and a tasktracker.
My mapred conf is 100 maps and 25 reducers. Below are the logs with
errors.
Thanks
051107 144101 task_r_pd3ybk 0.224% reduce copy
051107 144102 Moving bad file
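For reference, the map/reduce counts described above would be configured with something like the following sketch (property names as used by the Nutch mapred branch; file location may differ per setup):

```xml
<!-- Sketch of the mapred settings described above: 100 maps, 25 reducers. -->
<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>25</value>
</property>
```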
Rod Taylor wrote:
NDFS accomplishes the above path finding by auto-prefixing any path not
beginning with / with a /user/$USER. I didn't think it was appropriate
for LocalFileSystem.java to be mucking around trying to automatically
adjust paths to what the user may have intended.
Grep-ing for
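The prefixing rule described above can be sketched as follows; the class and method names are illustrative, not taken from LocalFileSystem.java or the actual NDFS source:

```java
// Sketch of NDFS-style path qualification as described above: any path
// not beginning with "/" is resolved under /user/$USER. Names here are
// hypothetical, for illustration only.
public class PathQualifier {

    /** Returns the path unchanged if absolute, else prefixes /user/$USER/. */
    public static String qualify(String path, String user) {
        if (path.startsWith("/")) {
            return path;                          // absolute: leave as-is
        }
        return "/user/" + user + "/" + path;      // relative: auto-prefix
    }

    public static void main(String[] args) {
        System.out.println(qualify("segments/20051102031132", "rod"));
        // → /user/rod/segments/20051102031132
        System.out.println(qualify("/opt/nutch/segments", "rod"));
        // → /opt/nutch/segments
    }
}
```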
I tried running one datanode per machine connecting back to the same SAN
but it seemed pretty clunky.
A SAN in general is a bad idea: it is too slow for a serious setup, and
it is a single point of failure. Better to use many local HDDs.
Stefan
Rod Taylor wrote:
Every segment that I fetch seems to be missing a part when stored on the
filesystem. The stranger thing is that it is always the same part (very
reproducible).
This sounds strange. Are the datanode errors always on the same host?
How many hosts are you running this on?
Doug
Rod Taylor wrote:
There is only a single datanode and there are 20 hosts.
That's a lot of load on one datanode. I typically run a datanode on
every host, accessing the local drives on that host.
Doug
Rod Taylor wrote:
I tried running one datanode per machine connecting back to the same SAN
but it seemed pretty clunky. A crash of any datanode would take down
the entire system (no data replication, since it's a common data-store in
the end). Reducing it to a single datanode did not have this problem.
Rod Taylor wrote:
This is using mapred.local.dir on the local machine (not shared).
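A per-machine, unshared local directory as described could be expressed with the following sketch (the value path is illustrative, not from the original setup):

```xml
<!-- Sketch: a per-machine, unshared local directory for task files. -->
<property>
  <name>mapred.local.dir</name>
  <value>/opt/nutch/local</value>
</property>
```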
Sources are from October 31st. Sun Standard Edition 1.5.0_02-b09 for
amd64
Every segment that I fetch seems to be missing a part when stored on the
filesystem. The stranger thing is that it is always the same part (very
reproducible).
If I have mapred.reduce.tasks set to 20, the hole is at part 13.
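A hole at a fixed part number is consistent with one reduce partition failing repeatably: with mapred.reduce.tasks set to 20, the output files are part-00000 through part-00019, and the same keys always hash to the same partition, so a deterministic failure in reducer 13 leaves part-00013 missing on every run. A minimal sketch of the naming convention:

```java
import java.text.NumberFormat;

// Sketch: how a reduce partition number maps to its output file name.
public class PartNames {

    static String partName(int partition) {
        NumberFormat nf = NumberFormat.getInstance();
        nf.setMinimumIntegerDigits(5);   // zero-pad to five digits
        nf.setGroupingUsed(false);       // no "13,000"-style separators
        return "part-" + nf.format(partition);
    }

    public static void main(String[] args) {
        System.out.println(partName(13));  // → part-00013
    }
}
```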
I forgot to provide this earlier. Here is nutch ndfs -ls output for the
directory structure of a segment with a failed part-00013.
[EMAIL PROTECTED] ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133
051103 162002 parsing