how to parse (only text) web sites while crawling

cefurkan0 cefurkan0 Tue, 06 Apr 2010 08:41:25 -0700

I can successfully run the crawl command via Cygwin on Windows XP, and I can
also search the crawled pages through Tomcat.


But I also want to save the parsed pages while the crawl is running.

So when I start a crawl like this:

bin/nutch crawl urls -dir crawled -depth 3

I also want to save the parsed HTML pages as text files.

I mean during the crawl that I start with the above command:

when Nutch fetches a page, it should also automatically save the parsed
(text-only) version of that page to a text file.

The file names could be the fetched URLs.
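
To make the naming idea concrete, here is a rough sketch of what I have in mind. It is plain Python, not Nutch code: the names `url_to_filename`, `save_parsed_text`, and the `parsed_text` output directory are made up for illustration, and it assumes the URL and the text-only content of each page are already available as strings.

```python
import re
from pathlib import Path


def url_to_filename(url: str) -> str:
    """Turn a fetched URL into a file name that is safe on Windows and Unix."""
    # Replace every character outside a conservative whitelist with "_",
    # since characters such as ":", "/", and "?" are not allowed in file names.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", url)
    # Truncate so the name stays well below common 255-character limits.
    return safe[:200] + ".txt"


def save_parsed_text(url: str, text: str, out_dir: str = "parsed_text") -> Path:
    """Write the text-only version of one page to out_dir/<name>.txt."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / url_to_filename(url)
    path.write_text(text, encoding="utf-8")
    return path
```

With this mapping, `http://example.com/a?b=1` would be saved as `http___example.com_a_b_1.txt`. Two different URLs can end up with the same file name this way, so a real solution would probably need something extra (for example a hash of the URL) to keep them apart.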

I really need help with this.

This will be used in my university language detection project.

Thank you.
