Hi all,

I tried to crawl my local filesystem and got the error shown further down.
I am using Windows NT and nutch-0.8.1.

I have modified my crawl-urlfilter.txt as follows:

# skip http:, ftp:, https: & mailto: urls
-^(http|ftp|mailto|https):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept hosts in MY.DOMAIN.NAME
#+^http://([a-z0-9]*\.)*apache.org/

# skip everything else
#-. #Changed

# accept anything else
+.*
-------------------------------------------------------------------------------------------------
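To rule out the URL filter as the cause, here is a minimal Python sketch of how these rules would treat the seed URL. It assumes the filter applies the rules in file order, that the first pattern found anywhere in the URL decides, and that a URL matching no rule is rejected. The third pattern is also an assumption: the archived copy of this message shows that line as an obfuscated address, so the stock rule `-[?*!@=]` is used here. The function name `accepts` is purely illustrative.

```python
import re

# Rules in file order: '+' accepts, '-' rejects.
# ASSUMPTION: the third pattern is the stock Nutch rule; the posted
# copy of that line was garbled by the list archive.
RULES = [
    ("-", r"^(http|ftp|mailto|https):"),
    ("-", r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
          r"|gz|rpm|tgz|mov|MOV|exe|png)$"),
    ("-", r"[?*!@=]"),
    ("-", r".*(/.+?)/.*?\1/.*?\1/"),
    ("+", r".*"),
]


def accepts(url, rules=RULES):
    """Return True if the first matching rule is an accept rule.

    ASSUMPTION: first match wins, and no match at all means reject.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in ("file:///C:/check/", "http://lucene.apache.org/nutch/"):
        print(url, "->", "accepted" if accepts(url) else "rejected")
```

Under those assumptions the seed URL file:///C:/check/ gets past every reject rule and is accepted by the final `+.*`, which is consistent with the log below: the URL is injected and generated, and only the fetch step fails.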

In nutch_site.xml I have added the plugin for the file protocol as follows:

<property>
<name>plugin.includes</name>

<value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
<property>
-------------------------------------------------------------------------------------------------
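As a quick check of the plugin.includes value itself, the sketch below tests a few plugin ids against the expression. It assumes that a plugin is enabled only when its whole id matches the regular expression; the helper name `enabled_plugins` and the list of candidate ids are illustrative, not taken from Nutch.

```python
import re

# Value of plugin.includes copied from the property above.
PLUGIN_INCLUDES = (
    r"protocol-file|protocol-http|parse-(text|html)"
    r"|index-basic|query-(basic|site|url)"
)


def enabled_plugins(plugin_ids, includes=PLUGIN_INCLUDES):
    """Return the ids that the expression matches in full.

    ASSUMPTION: the whole plugin id must match, not just a substring.
    """
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    # Hypothetical candidate ids, chosen only to exercise the expression.
    candidates = ["protocol-file", "protocol-http", "protocol-ftp",
                  "parse-html", "parse-pdf"]
    print(enabled_plugins(candidates))
```

With that assumption, protocol-file is matched by the expression, so the value looks correct as written. That would suggest checking whether this configuration file is actually the one being read during the crawl.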
My urls file contains:

file:///C:/check/
-------------------------------------------------------------------------------------------------

The error is listed below: protocol not found for url=file

Injector: starting
Injector: crawlDb: localfs/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: localfs/segments/20070126152212
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: localfs/segments/20070126152212
Fetcher: threads: 10
fetching file:///c:/check/
fetch of file:///c:/check/ failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
Fetcher: done
---------------------------------------------------------------------------------------------------------

Could anyone please help? This is quite urgent. Is there anything else
that needs to be done? Thanks in advance.

Regards,
         Abhilash
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
