Can someone help a newbie??!!
I've set up Nutch according to the tutorial (http://www.nutch.org/docs/en/tutorial.html), created a flat file called urls, and edited conf/crawl-urlfilter.txt.
If that's the file that has the URLs, then you need to change the command you ran. Change:
[EMAIL PROTECTED] nutch-0.5]# bin/nutch crawl urls -dir crawl.test -depth 3
TO
[EMAIL PROTECTED] nutch-0.5]# bin/nutch crawl conf/crawl-urlfilter.txt -dir crawl.test -depth 3
(All one line of course.) If you want to see all available arguments or figure out what a particular one is, you can just run the command with no arguments.
$ bin/nutch crawl
Usage: CrawlTool (-local | -ndfs <nameserver:port>) <root_url_file> [-dir d] [-threads n] [-depth i] [-showThreadID]
Luke
But when I run the command, I get:
[EMAIL PROTECTED] nutch-0.5]# bin/nutch crawl urls -dir crawl.test -depth 3
041026 102539 loading file:/root/install/nutch-0.5/conf/nutch-default.xml
041026 102539 loading file:/root/install/nutch-0.5/conf/crawl-tool.xml
041026 102539 loading file:/root/install/nutch-0.5/conf/nutch-site.xml
041026 102539 crawl started in: crawl.test
041026 102539 rootUrlFile = urls
041026 102539 threads = 10
041026 102539 depth = 3
Exception in thread "main" java.io.IOException: Invalid argument
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:490)
        at net.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1464)
        at net.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1424)
        at net.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:157)
        at net.nutch.tools.CrawlTool.main(CrawlTool.java:84)
Can anybody highlight the rookie mistake?
Many thanks, Michael.
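The stack trace shows the exception coming out of FileChannelImpl.lock, called from the WebDBWriter constructor, so the failing step is taking a file lock and not reading the urls file. One possibility, which nothing in this thread confirms, is that the filesystem holding crawl.test does not support file locking. A minimal sketch for checking that outside Nutch follows. The class name LockProbe and the scratch file name are made up for illustration, and the code assumes a modern JDK (Java 7 or later), not the JVM that Nutch 0.5 was run on.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Nutch: tries to lock a scratch file in a directory.
class LockProbe {

    /** Returns true if an exclusive lock can be taken on a scratch file inside dir. */
    static boolean canLock(Path dir) {
        Path probe = dir.resolve("lockprobe.tmp"); // scratch file name is an assumption
        try (FileChannel ch = FileChannel.open(probe,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // FileChannel.lock() is the call that appears in the stack trace above.
            FileLock lock = ch.lock();
            lock.release();
            return true;
        } catch (IOException e) {
            System.err.println("lock failed: " + e);
            return false;
        } finally {
            // Clean up the scratch file whether or not the lock succeeded.
            try {
                Files.deleteIfExists(probe);
            } catch (IOException ignored) {
                // Nothing useful to do if cleanup fails.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to test; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Files.createDirectories(dir);
        System.out.println(dir.toAbsolutePath() + ": "
                + (canLock(dir) ? "locking works" : "locking fails"));
    }
}
```

Running it from the nutch-0.5 directory, and again against a directory on a different disk such as /tmp, would show whether the lock failure follows the filesystem. If it does, pointing -dir at a location where locking works is worth trying.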
_______________________________________________
Nutch-general mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-general
