Hi Renaud

First of all, thanks for the reply.

Yes, I have read about the issues and did the following:

1) Copied the JCIFS jar from protocol-smb to JAVA_HOME/jre/lib/ext
2) Set the JVM option "-Djava.protocol.handler.pkgs=jcifs", in the
profile only

But I still get the same error:

Skipping smb://192.168.0.1:java.net.MalformedURLException: unknown protocol: smb
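
For reference, here is a minimal sketch of the check that fails here, using only the JDK. java.net.URL looks for a class named <package>.<protocol>.Handler in each package listed in java.protocol.handler.pkgs, so with the option above it needs to load jcifs.smb.Handler in the JVM that actually runs the crawl. The class name ProtocolCheck and the method canResolve are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ProtocolCheck {

    // True if java.net.URL accepts the spec, which requires that this JVM
    // can find a stream handler for the spec's protocol.
    public static boolean canResolve(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // "unknown protocol: smb" lands here when no handler is found
            return false;
        }
    }

    public static void main(String[] args) {
        // Show what the running JVM actually received for the handler option
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));

        String[] specs = args.length > 0
                ? args
                : new String[] {"smb://192.168.0.1/", "file:///root/test.txt"};
        for (String spec : specs) {
            System.out.println(spec + " -> "
                    + (canResolve(spec) ? "handler found" : "no handler"));
        }
    }
}
```

Running it with the same option as the crawl (java -Djava.protocol.handler.pkgs=jcifs ProtocolCheck) shows whether the option and the jar are visible to a plain JVM. If smb resolves here but not inside bin/nutch, the setting is probably not reaching the JVM that Nutch starts.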

Even the file protocol is not working:

file:///root/test.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
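
As a sanity check on this second error: as far as I understand, Nutch treats plugin.includes as a regular expression that has to match a plugin's id in full. The sketch below assumes that behaviour and uses only java.util.regex. The class name PluginIncludesCheck and the method isIncluded are made up for illustration, and the value is the one from my nutch-site.xml (quoted further down), joined onto one line.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // plugin.includes value from nutch-site.xml, joined onto a single line
    public static final String INCLUDES =
            "nutch-extensionpoints|protocol-smb|protocol-file|urlfilter-regex"
            + "|parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)"
            + "|index-basic|query-(basic|site|url)";

    // Assumption: a plugin is activated only if its whole id matches the pattern.
    public static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"protocol-file", "protocol-smb", "protocol-http", "query-site"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(INCLUDES, id));
        }
    }
}
```

If protocol-file and protocol-smb both match, the include list itself is probably fine, and the more likely cause is that the plugin directories are not being found or loaded. Raising the plugin loader's log level should confirm that.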

Thanks,
Bikram


Renaud Richardet-4 wrote:
> 
> hi Bikram,
> 
> - have you read the issues described in 
> http://issues.apache.org/jira/browse/NUTCH-427?
> - try to increase the log level of the plugin loader, to see if all 
> plugins are loaded successfully
> 
> HTH,
> Renaud
> 
> 
> bikram wrote:
>> Hi all
>>
>> I am new to Nutch.
>>
>> I have downloaded Nutch 0.9.
>>
>>
>> I want to crawl my local network (Windows shares & Linux shares).
>>
>> I used this link as a reference:
>> http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch 
>>
>>
>> 1) Downloaded the protocol-smb plugin from:
>>
>> http://issues.apache.org/jira/browse/NUTCH-427
>>
>> 2) Made the following changes in crawl-urlfilter.txt
>>
>> # skip http:, ftp:, & mailto: urls
>> -^(http|ftp|mailto):
>>
>> # skip image and other suffixes we can't yet parse
>> -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>>
>> # skip URLs containing certain characters as probable queries, etc.
>> -[?*!@=]
>>
>> # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
>> -.*(/.+?)/.*?\1/.*?\1/
>>
>> # skip everything else
>> # -.
>>
>> # accept anything else 
>> +.*
>>
>>
>> 3) Made the following changes in nutch-site.xml
>>
>> <property>
>>   <name>plugin.includes</name>
>>   <value>nutch-extensionpoints|protocol-smb|protocol-file|urlfilter-regex|parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)|index-basic|query-(basic|site|url)</value>
>>   <description></description>
>> </property>
>>
>>
>>
>> 4) The urls file contains smb://hostname/share entries
>>
>> 5) The Windows login details (username, password, IP address, etc.) are
>>  entered in smb.properties
>>
>> 6) bin/nutch crawl urls -dir localcrawl gives this error:
>>
>> smb://192.168.0.1/:java.net.MalformedURLException: unknown protocol: smb
>>
>> 7) Tried crawling local files but got the following error:
>>
>> file:///var/test.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
>>
>> Are the above settings correct for crawling local Windows shares?
>>
>>
>> Can someone guide me on what to do? Where am I going wrong?
>>
>> Thanks,
>>
>> Bikram
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Windows-Share-Crawling---searching-tf4277499.html#a12193969
Sent from the Nutch - User mailing list archive at Nabble.com.
