Hi,
Are you guys able to run a step-by-step crawl on 0.8 successfully?
I am using Nutch 0.8 (the 3/31 build) with DFS. I followed the 0.8 tutorial
for step-by-step crawling and got errors from updatedb. I used two map
tasks and two reduce tasks. Here are the exact steps that I ran:
1. bin/nutch inject test/crawldb urls
2. bin/nutch generate test/crawldb test/segments
3. bin/nutch fetch test/segments/20060415143555
4. bin/nutch updatedb test/crawldb test/segments/20060415143555
Fetch one more round:
5. bin/nutch generate test/crawldb test/segments -topN 100
6. bin/nutch fetch test/segments/20060415150130
7. bin/nutch updatedb test/crawldb test/segments/20060415150130
Fetch one more round:
8. bin/nutch generate test/crawldb test/segments -topN 100
9. bin/nutch fetch test/segments/20060415151309
The steps above ran successfully; I kept checking the directories in DFS
and running nutch readdb, and everything appeared to be fine (the same
sequence is collected into a single script below).
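For reference, here are steps 1 through 9 collected into a single shell
script (a sketch of what I ran, not something from the tutorial; the segment
timestamps are simply the directory names DFS assigned during my runs, so a
reusable script would have to pick up the newest directory under
test/segments after each generate):

#!/bin/sh
# Step-by-step crawl on Nutch 0.8 over DFS, same commands as in the steps above.
bin/nutch inject test/crawldb urls

# Round 1
bin/nutch generate test/crawldb test/segments
bin/nutch fetch test/segments/20060415143555
bin/nutch updatedb test/crawldb test/segments/20060415143555

# Round 2
bin/nutch generate test/crawldb test/segments -topN 100
bin/nutch fetch test/segments/20060415150130
bin/nutch updatedb test/crawldb test/segments/20060415150130

# Round 3 (its updatedb is step 10 below, which is where the failure occurs)
bin/nutch generate test/crawldb test/segments -topN 100
bin/nutch fetch test/segments/20060415151309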
Then:
10. bin/nutch updatedb test/crawldb test/segments/20060415151309
It failed with the following error on both reduce tasks (the log below is
from one of the two tasks):
java.rmi.RemoteException: java.io.IOException: Cannot create file
/user/root/test/crawldb/670052811/part-00000/data on client DFSClient_-1133147307
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:615)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:587)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:554)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:99)
        at org.apache.hadoop.dfs.DistributedFileSystem.createRaw(DistributedFileSystem.java:83)
        at org.apache.hadoop.fs.FSDataOutputStream$Summer.<init>(FSDataOutputStream.java:39)
        at org.apache.hadoop.fs.FSDataOutputStream.<init>(FSDataOutputStream.java:128)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:180)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:168)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:96)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:101)
        at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:76)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:38)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:265)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)
Anything wrong with my steps? Is this a known bug?
Thank you for your help.
Olive