I have a problem with partitioning. After the selection phase of generation,
which completed successfully, I got the following exception:

Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /tmp/hadoop-nutch/mapred/temp/generate-temp-1194364974836/_task_200711051139_0323_r_000007_0
        at org.apache.hadoop.dfs.NameNode.open(NameNode.java:238)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.ipc.Client.call(Client.java:482)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:848)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:840)
        at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:285)
        at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
        at org.apache.nutch.crawl.Generator.run(Generator.java:563)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
        at org.apache.nutch.crawl.Generator.main(Generator.java:526)


On my client node, in the tasktracker log, I found the following exception:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_200711051139_0322_m_000064_0/file.out.index in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:327)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1923)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)


The weird thing is that the second time I ran the batch it worked fine. I'm
generating 500 000 URLs from a db of about 550 000.

Could it have something to do with the open files limit?
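
To check that limit on a node, a minimal sketch along these lines could be
run as the same user that starts the tasktracker. It assumes a Unix-like
node with Python available; the helper names fd_limits and format_limit are
only illustrative. It reports the limit of the shell it is run from, which
may differ from what the already-running daemon actually has.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def format_limit(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = fd_limits()
    # The soft limit is the one actually enforced on open() calls.
    print("soft open-file limit:", format_limit(soft))
    print("hard open-file limit:", format_limit(hard))
```

A low soft limit here would be consistent with the guess above, but it would
not prove it.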

-- 
Karol Rybak
Programista / Programmer
Sekcja aplikacji / Applications section
Wyższa Szkoła Informatyki i Zarządzania / University of Internet Technology
and Management
+48(17)8661277
