free disk space
I'm using Nutch v0.8 and have 3 computers. Two of them run a datanode and a tasktracker; the third runs the namenode and jobtracker. Will the tasktrackers and jobtracker need more disk space as the number of processed pages and the size of the database grow? Would I be able to add a third datanode once the machines that currently run datanodes run out of free disk space? How much free disk space do the task- and jobtrackers need in order to work properly?
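For reference, adding a datanode to an existing cluster later is normally a small operation. A minimal sketch, assuming the Hadoop-style conf/slaves file and daemon scripts bundled with this era of Nutch/Hadoop (script names and paths may differ in your install; the hostname is hypothetical):

  # On the master: add the new machine to the list of worker nodes
  # ("node3.example.com" is a hypothetical hostname)
  echo "node3.example.com" >> conf/slaves

  # On the new machine: point its hadoop-site.xml at the same namenode
  # (fs.default.name), then start the daemons
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker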
No space left on device
I'm using Nutch v0.8 and have 3 computers. One of my tasktrackers always goes down. This happens during indexing (index crawl/indexes). The server with the crashed tasktracker currently has 53 GB of free disk space, with only 11 GB used. How can I solve this problem? Why does the tasktracker require so much free space on disk? Excerpt from the log with the error:

060613 151840 task_0083_r_01_0 0.5% reduce sort
060613 151841 task_0083_r_01_0 0.5% reduce sort
060613 151842 task_0083_r_01_0 0.5% reduce sort
060613 151843 task_0083_r_01_0 0.5% reduce sort
060613 151844 task_0083_r_01_0 0.5% reduce sort
060613 151845 task_0083_r_01_0 0.5% reduce sort
060613 151846 task_0083_r_01_0 0.5% reduce sort
060613 151847 task_0083_r_01_0 0.5% reduce sort
060613 151847 SEVERE FSError, exiting: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0 SEVERE FSError from child
060613 151847 task_0083_r_01_0 org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0 at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSyst
060613 151847 task_0083_r_01_0 at org.apache.hadoop.fs.FSDataOutputStream$Summer.write(FSDataOutputStream.java:69)
060613 151847 task_0083_r_01_0 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.j
060613 151847 task_0083_r_01_0 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
060613 151847 task_0083_r_01_0 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
060613 151847 task_0083_r_01_0 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
060613 151847 task_0083_r_01_0 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
060613 151847 task_0083_r_01_0 at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.close(SequenceFile.java:598)
060613 151847 task_0083_r_01_0 at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:533)
060613 151847 task_0083_r_01_0 at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:519)
060613 151847 task_0083_r_01_0 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
060613 151847 task_0083_r_01_0
060613 151847 task_0083_r_01_0 at org.apache.hadoop.mapred.TaskTracker$Chi
060613 151847 task_0083_r_01_0 Caused by: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0 at java.io.FileOutputStream.writeBytes(Native Method)
060613 151847 task_0083_r_01_0 at java.io.FileOutputStream.write(FileOutputStream.java:260)
060613 151848 task_0083_r_01_0 at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSyst
060613 151848 task_0083_r_01_0 ... 11 more
060613 151849 Server connection on port 50050 from 10.0.0.3: exiting
060613 151854 task_0083_m_01_0 done; removing files.
060613 151855 task_0083_m_03_0 done; removing files.
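One thing worth checking (a hedged suggestion, not from the original thread): the "No space left on device" comes from the local filesystem that the reduce sort spills to, which may be a different, smaller partition than the one showing 53 GB free. A quick sketch of how to see where those spills land, assuming the mapred.local.dir property of that Hadoop generation:

  # How much space does each mounted filesystem actually have?
  df -h

  # Where does the tasktracker write its intermediate sort/spill files?
  # (mapred.local.dir is the property name from that Hadoop era; verify
  # against the defaults file shipped with your build)
  grep -r -A1 "mapred.local.dir" conf/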
Re: No space left on device
The tasktracker requires intermediate space while performing the map and reduce functions. Many smaller files are produced during the map and reduce processes and are deleted when the processes finish. If you are using the DFS, then more disk space is required than is actually used, since disk space is grabbed in blocks. Dennis
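If the partition holding those intermediate files and DFS blocks is the one filling up, one hedged fix is to point the local and DFS data directories at a larger volume. A minimal hadoop-site.xml sketch, using property names from the Hadoop of that era (verify them against your hadoop-default.xml) and hypothetical paths:

  <?xml version="1.0"?>
  <configuration>
    <!-- intermediate map/reduce (sort/spill) files; hypothetical path on a larger volume -->
    <property>
      <name>mapred.local.dir</name>
      <value>/big-disk/hadoop/mapred/local</value>
    </property>
    <!-- where the datanode stores DFS blocks; hypothetical path -->
    <property>
      <name>dfs.data.dir</name>
      <value>/big-disk/hadoop/dfs/data</value>
    </property>
  </configuration>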
RE: No space left on device
Yes, I use DFS. How should I configure Nutch to solve the disk space problem? How can I control the number of smaller files?
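There was no single knob for this, but the sort/spill behaviour could be tuned. A hedged sketch of the relevant properties for hadoop-site.xml (names as recalled from that Hadoop generation; check hadoop-default.xml before relying on them, and treat the values as illustrative):

  <property>
    <name>io.sort.factor</name>
    <value>25</value> <!-- how many spill files get merged in one pass -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value> <!-- sort buffer in MB; a bigger buffer means fewer, larger spills -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value> <!-- fewer reduces means fewer, larger output files -->
  </property>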
Re: [jira] Resolved: (NUTCH-303) logging improvements
That's good news. Sami, I have not made changes to web2. Do you want me to switch web2 to Commons Logging? I am already working with it and unfortunately facing some classloading issues. Hopefully the solution will come up sooner rather than later. -- Sami Siren
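For anyone following along, the switch in question essentially means replacing direct logging calls with the Commons Logging facade. A minimal, generic Java sketch (the class name is hypothetical, not actual web2 code); the facade picks a concrete backend at runtime, which is where the classloading surprises tend to come from:

  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;

  public class SearchBean {                      // hypothetical web2 class
    // commons-logging selects a backend (log4j, JDK logging, ...) at runtime
    private static final Log LOG = LogFactory.getLog(SearchBean.class);

    public void search(String query) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("searching for: " + query);
      }
    }
  }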
IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
nutch 0.7.2 out-of-the-box build issue
Apologies if this is the wrong forum. I just downloaded the Nutch 0.7.2 release and tried building it with JDK 1.5.0_03 and Ant 1.6.5. The core classes compile fine, but the build script fails while trying to compile the nutch-extensionpoints plugin:

$ ant
Buildfile: build.xml
init:
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\classes
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\test
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\test\classes
compile-core:
    [javac] Compiling 247 source files to C:\nutch\nutch-0.7.2\build\classes
    [javac] Note: * uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
compile-plugins:
deploy:
init:
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints\classes
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints\test
init-plugin:
compile:
     [echo] Compiling plugin: nutch-extensionpoints

BUILD FAILED
C:\nutch\nutch-0.7.2\build.xml:76: The following error occurred while executing this line:
C:\nutch\nutch-0.7.2\src\plugin\build.xml:9: The following error occurred while executing this line:
C:\nutch\nutch-0.7.2\src\plugin\build-plugin.xml:85: srcdir C:\nutch\nutch-0.7.2\src\plugin\nutch-extensionpoints\src\java does not exist!

Has anyone else experienced this? I would appreciate any advice. Thanks - leo
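One hedged workaround, based only on what the error says (the plugin's empty src/java source directory is missing, possibly dropped when the archive was unpacked), is to recreate the directory and rebuild:

  # from the nutch-0.7.2 root (Cygwin/unix shell; use "md" on a plain Windows prompt)
  mkdir -p src/plugin/nutch-extensionpoints/src/java
  ant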
Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Heh. Perhaps we should eliminate the default user-agent string? Then he'd have less of a target to aim at... :) On a more serious note, it seems reasonable to require a customized bot URL at least. But publishing an email contact is questionable these days. Neither Y! nor G do it, precisely because it will just get spammed. (Wait until a spam-bot crawls this blogspot page and starts hammering nutch-agent...) On Jun 14, 2006, at 1:03 PM, Doug Cutting wrote: http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html -- Matt Kangas / [EMAIL PROTECTED]
RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
The 'bot blocker' image server at blogspot is broken, so it's impossible to reply to this blog!