Update of /cvsroot/nutch/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2281/bin
Modified Files:
nutch
Log Message:
Added a new command, crawl, that constructs a database, injects a URL
file, and performs a few rounds of generate/fetch/updatedb. This
simplifies use for intranet sites. Changed some defaults to be
more intranet-friendly.
Also fixed a bug where Fetcher.java didn't construct correct relative links
when a page was redirected.
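
The sequence the log message describes for the crawl command can be sketched roughly as follows. This is only an illustration of the order of steps: run_step and crawl_sketch are hypothetical names, run_step merely prints a label instead of invoking any real tool, and the actual arguments of CrawlTool are not shown in this commit.

```shell
#!/bin/sh
# Hypothetical stand-in for a real tool invocation: it only prints the
# step name. The real commands and their arguments are not part of this
# commit, so none are assumed here.
run_step() {
  echo "step: $*"
}

# Sketch of the one-step crawl: create the database, inject the URL
# file, then repeat generate/fetch/updatedb for the given number of rounds.
crawl_sketch() {
  rounds=$1
  run_step "create database"
  run_step "inject url file"
  i=1
  while [ "$i" -le "$rounds" ]; do
    run_step "generate (round $i)"
    run_step "fetch (round $i)"
    run_step "updatedb (round $i)"
    i=$((i + 1))
  done
}

crawl_sketch 2
```

The number of rounds is passed as a parameter here purely for illustration; whether and how the real command exposes it is not stated in the log message.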
Index: nutch
===================================================================
RCS file: /cvsroot/nutch/nutch/bin/nutch,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** nutch 18 Sep 2003 20:02:55 -0000 1.28
--- nutch 21 Apr 2004 22:51:50 -0000 1.29
***************
*** 30,33 ****
--- 30,34 ----
echo "Usage: nutch COMMAND"
echo "where COMMAND is one of:"
+ echo " crawl one-step crawler for intranets"
echo " admin database administration, including creation"
echo " inject inject new urls into the database"
***************
*** 99,103 ****
# figure out which class to run
! if [ "$COMMAND" = "admin" ] ; then
CLASS=net.nutch.tools.WebDBAdminTool
elif [ "$COMMAND" = "inject" ] ; then
--- 100,106 ----
# figure out which class to run
! if [ "$COMMAND" = "crawl" ] ; then
! CLASS=net.nutch.tools.CrawlTool
! elif [ "$COMMAND" = "admin" ] ; then
CLASS=net.nutch.tools.WebDBAdminTool
elif [ "$COMMAND" = "inject" ] ; then
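
The effect of the hunk above on command dispatch can be sketched as a standalone function. This is a minimal illustration, not the script's actual structure: resolve_class is a hypothetical helper name, a case statement is swapped in for the script's if/elif chain, and only the two mappings visible in the diff are included, so every other command is treated as unknown here.

```shell
#!/bin/sh
# Hypothetical helper: maps a command name to the Java class to run.
# Only the mappings shown in the diff (crawl, admin) are reproduced;
# the classes for the other commands are not visible in this hunk.
resolve_class() {
  case "$1" in
    crawl) echo "net.nutch.tools.CrawlTool" ;;
    admin) echo "net.nutch.tools.WebDBAdminTool" ;;
    *)     echo "unknown command: $1" >&2; return 1 ;;
  esac
}

resolve_class crawl
```

Because crawl is tested first, it becomes the new head of the chain and admin moves to an elif branch, which is why the diff marks the existing admin line as changed rather than only adding lines.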