hi there,

I already edited this file, so it is now "*.*", meaning I accept any website.
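For reference: in 0.6/0.7-era Nutch, crawl-urlfilter.txt entries are regular expressions prefixed with + (accept) or - (reject), tried top to bottom with the first match deciding, so a bare "*.*" may not behave as an accept-all rule. A minimal accept-everything sketch; the nutch.org line is the default domain entry mentioned in the reply below, the other rules are illustrative:

"
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# default entry limiting the crawl to nutch.org -- disabled here:
# +^http://([a-z0-9]*\.)*nutch.org/
# accept everything else (first matching rule wins)
+.
"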
If I use the crawling commands below, where do I specify the search depth? (see the loop sketch after the quoted thread)

"
bin/nutch admin db1 -create
bin/nutch inject db1 -urlfile urls-full
bin/nutch generate db1 segments1
s1=`ls -d segments1/2* | tail -1`
bin/nutch fetch $s1 >& m1.log
bin/nutch updatedb db1 $s1
bin/nutch generate db1 segments -topN 10000
bin/nutch index $s1
bin/nutch dedup segments1 dedup.tmp
"

thanks,

Michael

--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Please check your crawl-urlfilter.txt. If you use an
> older version of Nutch (e.g. 0.6 final), there is an
> entry that restricts the crawl to nutch.org only.
>
> Feng (Michael) Ji wrote:
>
> >hi there,
> >
> >If I put multiple web URLs in the plain text file
> >"urls" in the following command, will it fetch
> >multiple websites for me?
> >
> >"
> >bin/nutch crawl urls -dir crawl.test -depth 3 >&
> >crawl.log
> >"
> >
> >I tried it, but didn't get any search results back.
> >Anything I missed?
> >
> >thanks,
> >
> >Michael
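To the depth question above: this step-by-step whole-web sequence has no -depth option; the -depth N of the one-shot bin/nutch crawl command corresponds to running the generate/fetch/updatedb cycle N times, each round fetching one more level of links. A minimal sketch, assuming the db1/segments1 names from the commands above, bash, and made-up fetch-$i.log names:

"
# after the admin/inject steps above; three rounds ~ -depth 3
for i in 1 2 3; do
  # select new URLs to fetch, then pick the newest segment dir
  bin/nutch generate db1 segments1 -topN 10000
  s1=`ls -d segments1/2* | tail -1`
  bin/nutch fetch $s1 >& fetch-$i.log
  # fold fetched pages and discovered links back into the db
  bin/nutch updatedb db1 $s1
done
"

The index and dedup steps from the original sequence would then run once, after the last round.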
