[Nutch-dev] Re: bug in bin/nutch?

Andrzej Bialecki Fri, 09 Sep 2005 03:23:22 -0700

Earl Cahill wrote:

Guess I figured as much.  Can I suggest that someone
typing

bin/nutch admin ...

in the mapred branch should get pointed to the
proper command, or at least a message saying that
There is no separate command - for now the DB is created when you run Injector or Crawl (which calls Injector as the first step). Other commands from the script should work very similarly, even though they now use different implementations:

* inject - runs Injector to add urls from a plaintext file (one url per line; there may be many input files, and they must be placed inside a directory). This creates the CrawlDB in the destination directory if it didn't exist before, or updates the existing one. Note that the new CrawlDB does NOT contain links - they are stored separately in a LinkDB, and CrawlDB just stores the equivalents of Page in the former WebDB.


* generate - runs Generate to create new fetchlists to be fetched

* fetch - runs the modified Fetcher to fetch segments

* updatedb - runs CrawlDB.update() to update the CrawlDB with new page information, and to add new unfetched pages.

* invertlinks - creates or updates a LinkDB, containing incoming link information. Note that it takes as an argument the top level dir where the new segments are contained, and not the dir names of segments...

* index - runs the new modified Indexer to create an index of the fetched segments.
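
To make the order of the steps above concrete, here is a minimal sketch that only builds the command lines for one crawl cycle and prints them; it does not run anything. The argument layout of each command, the directory names (crawldb, linkdb, segments, index) and the helper name crawl_cycle_commands are assumptions made for illustration and may not match the mapred branch exactly, so check the usage message of each command before relying on them. The segment name is a placeholder for whatever directory generate actually creates.

```python
import posixpath
import shlex


def crawl_cycle_commands(crawl_dir, url_dir, segment_name, nutch="bin/nutch"):
    """Build the bin/nutch command lines for one crawl cycle, in order.

    Nothing is executed; each command is returned as an argv list.
    The argument layout of every command is an ASSUMPTION for
    illustration, not something stated in the message above.
    """
    crawldb = posixpath.join(crawl_dir, "crawldb")    # hypothetical layout
    linkdb = posixpath.join(crawl_dir, "linkdb")      # hypothetical layout
    segments = posixpath.join(crawl_dir, "segments")  # top-level segments dir
    segment = posixpath.join(segments, segment_name)  # one generated segment
    index = posixpath.join(crawl_dir, "index")        # hypothetical layout

    return [
        # Create or update the CrawlDB from a directory of url files.
        [nutch, "inject", crawldb, url_dir],
        # Create a new fetchlist (segment) from the CrawlDB.
        [nutch, "generate", crawldb, segments],
        # Fetch that segment.
        [nutch, "fetch", segment],
        # Feed the fetch results back into the CrawlDB.
        [nutch, "updatedb", crawldb, segment],
        # Note: the top-level segments dir, not an individual segment.
        [nutch, "invertlinks", linkdb, segments],
        # Index the fetched segment.
        [nutch, "index", index, crawldb, linkdb, segment],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in crawl_cycle_commands("crawl", "urls", "SEGMENT_NAME"):
        print(" ".join(shlex.quote(a) for a in argv))
```

The one detail taken directly from the list above is that invertlinks is given the directory containing the segments, while fetch and updatedb (in this sketch) are given a single segment.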

The above commands read the mapred configuration, and for now it defaults to "local", which means that all Jobs execute within the same JVM, and NDFS also defaults to local. The rest of the commands in bin/nutch have to do with a distributed setup.

admin doesn't exist in the mapred branch, just to save
some confusion.  There is a dumb patch below that
would change the usage line.

I think such differences are all the more reason to
have a nice mapred tutorial, which I would be more
than willing to help with.  I thought I was close, but

Yes, I agree. But there are still some command-line tools missing, or not yet ported to use mapred. At this point a general tutorial would be difficult... unless it would be simply "you need to run ./nutch crawl" ...


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



