free disk space

2006-06-14 Thread anton
I'm using Nutch v0.8 and have 3 computers. Two of them run a datanode and a
tasktracker; the third runs the namenode and jobtracker. Do I need more disk
space on the machines running the tasktrackers and jobtracker as the number
of pages processed grows along with the size of the database? Would I be
able to add a third datanode when I run out of free disk space on the
computers that currently have datanodes installed?

How much free disk space do I need for the task- and jobtrackers to work
properly?




No space left on device

2006-06-14 Thread anton

I'm using Nutch v0.8 and have 3 computers.
One of my tasktrackers always goes down.
This happens during indexing (index crawl/indexes). The server with the
crashed tasktracker now has 53G of free disk space, with only 11G used.
How can I solve this problem? Why does the tasktracker require so much free
space on the hard disk?

Piece of Log with error:

060613 151840 task_0083_r_01_0 0.5% reduce  sort
060613 151841 task_0083_r_01_0 0.5% reduce  sort
060613 151842 task_0083_r_01_0 0.5% reduce  sort
060613 151843 task_0083_r_01_0 0.5% reduce  sort
060613 151844 task_0083_r_01_0 0.5% reduce  sort
060613 151845 task_0083_r_01_0 0.5% reduce  sort
060613 151846 task_0083_r_01_0 0.5% reduce  sort
060613 151847 task_0083_r_01_0 0.5% reduce  sort
060613 151847 SEVERE FSError, exiting: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0  SEVERE FSError from child
060613 151847 task_0083_r_01_0 org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0  at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSyst
060613 151847 task_0083_r_01_0  at org.apache.hadoop.fs.FSDataOutputStream$Summer.write(FSDataOutputStream.java:69)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.j
060613 151847 task_0083_r_01_0  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
060613 151847 task_0083_r_01_0  at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
060613 151847 task_0083_r_01_0  at java.io.DataOutputStream.flush(DataOutputStream.java:106)
060613 151847 task_0083_r_01_0  at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.close(SequenceFile.java:598)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:533)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:519)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
060613 151847 task_0083_r_01_0  at org.apache.hadoop.mapred.TaskTracker$Chi
060613 151847 task_0083_r_01_0 Caused by: java.io.IOException: No space left on device
060613 151847 task_0083_r_01_0  at java.io.FileOutputStream.writeBytes(Native Method)
060613 151847 task_0083_r_01_0  at java.io.FileOutputStream.write(FileOutputStream.java:260)
060613 151848 task_0083_r_01_0  at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSyst
060613 151848 task_0083_r_01_0  ... 11 more
060613 151849 Server connection on port 50050 from 10.0.0.3: exiting
060613 151854 task_0083_m_01_0 done; removing files.
060613 151855 task_0083_m_03_0 done; removing files.





Re: No space left on device

2006-06-14 Thread Dennis Kubes
The tasktracker requires intermediate space while performing the map 
and reduce functions.  Many smaller files are produced during the map 
and reduce processes and are deleted when the processes finish.  If you 
are using the DFS, more disk space is required than is actually used, 
since disk space is allocated in blocks.


Dennis
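A practical way to act on this is to check how much scratch space the tasktracker's local directory actually has before a big job, and to point that directory at a roomy volume. The sketch below is an assumption-laden illustration, not Nutch-specific tooling: `LOCAL_DIR` is a placeholder for whatever `mapred.local.dir` points at in your hadoop-site.xml, and the rule of thumb in the comment (sort scratch comparable to the data being sorted) is approximate.

```shell
# Sketch: pre-flight free-space check for a tasktracker's local scratch
# directory. The reduce-side sort writes its intermediate files here, so
# it can need far more space than the final output -- roughly comparable
# to the amount of data being sorted.
# LOCAL_DIR is a placeholder; substitute your mapred.local.dir value.
LOCAL_DIR=${LOCAL_DIR:-/tmp/hadoop/mapred/local}
mkdir -p "$LOCAL_DIR"
# POSIX-portable df output; column 4 of the second row is available KB.
avail_kb=$(df -Pk "$LOCAL_DIR" | awk 'NR==2 {print $4}')
echo "free space under $LOCAL_DIR: ${avail_kb} KB"
```

If the number printed is small, moving `mapred.local.dir` (and `hadoop.tmp.dir`) to a larger partition in hadoop-site.xml is the usual remedy.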

[EMAIL PROTECTED] wrote:

I'm using Nutch v0.8 and have 3 computers.
One of my tasktrackers always goes down.
This happens during indexing (index crawl/indexes). The server with the
crashed tasktracker now has 53G of free disk space, with only 11G used.
How can I solve this problem? Why does the tasktracker require so much free
space on the hard disk?



RE: No space left on device

2006-06-14 Thread anton
Yes, I use DFS.
How do I configure Nutch to solve the disk-space problem? How can I control
the number of smaller files?

-Original Message-
From: Dennis Kubes [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 14, 2006 5:46 PM
To: nutch-dev@lucene.apache.org
Subject: Re: No space left on device
Importance: High

The tasktracker requires intermediate space while performing the map 
and reduce functions.  Many smaller files are produced during the map 
and reduce processes and are deleted when the processes finish.  If you 
are using the DFS, more disk space is required than is actually used, 
since disk space is allocated in blocks.

Dennis





Re: [jira] Resolved: (NUTCH-303) logging improvements

2006-06-14 Thread Sami Siren



That's good news.
Sami, I have not made changes to web2. Do you want me to switch web2 to
Commons Logging?

I am already working on it and unfortunately facing some classloading 
issues.

Hopefully a solution will come up sooner rather than later.

--
Sami Siren


IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-14 Thread Doug Cutting

http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html


nutch .72 out-of-the-box build issue

2006-06-14 Thread Dagum, Leo
Apologies if this is the wrong forum.

Just downloaded the Nutch 0.7.2 release and tried building with
jdk1.5.0_03 and ant 1.6.5.  The core classes compile fine, but the build
script fails while trying to compile the nutch-extensionpoints plugin:

 

=

$ ant
Buildfile: build.xml

init:
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\classes
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\test
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\test\classes

compile-core:
    [javac] Compiling 247 source files to C:\nutch\nutch-0.7.2\build\classes
    [javac] Note: * uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile-plugins:

deploy:

init:
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints\classes
    [mkdir] Created dir: C:\nutch\nutch-0.7.2\build\nutch-extensionpoints\test

init-plugin:

compile:
    [echo] Compiling plugin: nutch-extensionpoints

BUILD FAILED
C:\nutch\nutch-0.7.2\build.xml:76: The following error occurred while executing this line:
C:\nutch\nutch-0.7.2\src\plugin\build.xml:9: The following error occurred while executing this line:
C:\nutch\nutch-0.7.2\src\plugin\build-plugin.xml:85: srcdir C:\nutch\nutch-0.7.2\src\plugin\nutch-extensionpoints\src\java does not exist!

Anyone else experience this?  Would appreciate any advice.

Thanks

- leo

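The error above complains about a missing directory, not broken code: build-plugin.xml's javac task wants a src/java under the plugin even though nutch-extensionpoints may ship no Java sources. One workaround, sketched here under that assumption (verify against your own checkout before relying on it), is simply to create the directory the build file expects:

```shell
# Sketch of a workaround: create the empty source directory that
# build-plugin.xml insists on for the nutch-extensionpoints plugin.
# Run from the nutch-0.7.2 top-level directory (path taken from the
# build log above).
mkdir -p src/plugin/nutch-extensionpoints/src/java
```

After creating the directory, re-run ant; if the diagnosis is right, compilation proceeds past this plugin.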

Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-14 Thread Matt Kangas
Heh. Perhaps we should eliminate the default user-agent string? Then  
he'd have less of a target to aim at... :)


On a more serious note, it seems reasonable to require a customized  
bot URL at least. But publishing an email contact is questionable  
these days. Neither Y! nor G do it, precisely because it will just  
get spammed. (Wait until a spam-bot crawls this blogspot page and  
starts hammering nutch-agent...)



On Jun 14, 2006, at 1:03 PM, Doug Cutting wrote:

http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much- 
nutch.html


--
Matt Kangas / [EMAIL PROTECTED]
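On the customized-identification point: Nutch reads the crawler's identity from its configuration, so a non-default user-agent is a matter of overriding the http.agent.* properties in nutch-site.xml. A minimal sketch follows; the property names are from nutch-default.xml, but every value here is a placeholder you would replace with your own.

```xml
<!-- nutch-site.xml (sketch): identify your crawler.
     All values below are placeholders. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>ExampleCrawler</value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://example.com/crawler.html</value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>crawler-admin@example.com</value>
  </property>
</configuration>
```

Whether to publish the email address at all is exactly the trade-off discussed above; the URL alone gives webmasters a contact point without exposing an inbox to harvesting.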





RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-14 Thread Wootton, Alan
The 'bot blocker' image server at blogspot is broken, so it's impossible to
reply to this blog!

-Original Message-
From: Matt Kangas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 14, 2006 10:38 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH
Nutch?

