Hi,

> - try to increase the log level of the plugin loader, to see if all plugins
> are loaded successfully

Sorry for being so naive, but how do I increase the log level of the plugin loader?

Thanks,
Bikram



bikram wrote:
> 
> Hi Renaud
> 
> First, thanks for the reply.
> 
> Yes, I have read about the issues and did the following:
> 
> 1) Copied the JCIFS jar from protocol-smb to JAVA_HOME/jre/lib/ext
> 2) Set the JVM option "-Djava.protocol.handler.pkgs=jcifs" in the
> profile only
> 
> but I still get the same error:
> 
> Skipping smb://192.168.0.1:java.net.MalformedURLException: unknown protocol: smb
> 
> Even the file protocol is not working:
> 
> file:///root/test.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
> 
> Thanks,
> Bikram
> 
> 
> Renaud Richardet-4 wrote:
>> 
>> hi Bikram,
>> 
>> - have you read the issues described in 
>> http://issues.apache.org/jira/browse/NUTCH-427?
>> - try to increase the log level of the plugin loader, to see if all 
>> plugins are loaded successfully
>> 
>> HTH,
>> Renaud
>> 
>> 
>> bikram wrote:
>>> Hi all
>>>
>>> I am new to Nutch.
>>>
>>> I have downloaded Nutch 0.9.
>>>
>>> I want to crawl my local network (Windows shares and Linux shares).
>>>
>>> I used this link as a reference:
>>> http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch 
>>>
>>>
>>> 1) Downloaded the protocol-smb plugin from
>>>
>>> http://issues.apache.org/jira/browse/NUTCH-427
>>>
>>> 2) Made the following changes in crawl-urlfilter.txt:
>>>
>>> # skip http:, ftp:, & mailto: urls
>>> -^(http|ftp|mailto):
>>>
>>> # skip image and other suffixes we can't yet parse
>>> -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>>>
>>> # skip URLs containing certain characters as probable queries, etc.
>>> -[?*!@=]
>>>
>>> # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
>>> -.*(/.+?)/.*?\1/.*?\1/
>>>
>>> # skip everything else
>>> # -.
>>>
>>> # accept anything else 
>>> +.*
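
To check whether rules like the ones quoted above would reject smb: or file: URLs, here is a minimal sketch in Python. It assumes the behaviour described in the stock filter file's header: rules are tried top to bottom, the first pattern found in the URL decides, and a URL that matches nothing is ignored. The function name `accepts` and the test URLs are made up for illustration, and only the three rule types fully visible in the message are included; this is not Nutch's own code.

```python
import re

# Rules in file order: ("-" = skip, "+" = accept), copied from the quoted file.
RULES = [
    ("-", r"^(http|ftp|mailto):"),
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|mpg|gz|rpm|tgz"
          r"|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$"),
    ("+", r".*"),
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is ignored.
    return False


if __name__ == "__main__":
    for url in ("smb://192.168.0.1/", "file:///var/test.txt", "http://example.com/"):
        print(url, "accepted" if accepts(url) else "skipped")
```

Under these assumptions the smb: and file: URLs pass the filter, which fits the errors reported in this thread: both complain about an unknown or missing protocol, not about a filtered URL.
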
>>>
>>>
>>> 3) Made the following changes in nutch-site.xml:
>>>
>>> <property>
>>>   <name>plugin.includes</name>
>>>   <value>nutch-extensionpoints|protocol-smb|protocol-file|urlfilter-regex|parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)|index-basic|query-(basic|site|url)</value>
>>>   <description></description>
>>> </property>
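
The plugin.includes value is a regular expression over plugin ids, so it can be sanity-checked outside Nutch. The sketch below is Python, not Nutch code, and it assumes each plugin id must match the whole expression; the helper name `included` is hypothetical. It also tests a variant with a line break inside "site", as can happen when a long value gets wrapped, to see which ids that would affect.

```python
import re

# The value from the <property> above, on a single line.
INTACT = (
    "nutch-extensionpoints|protocol-smb|protocol-file|urlfilter-regex"
    "|parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)"
    "|index-basic|query-(basic|site|url)"
)

# Hypothetical variant: the same value with a newline splitting "site".
WRAPPED = INTACT.replace("query-(basic|site|url)", "query-(basic|sit\ne|url)")


def included(plugin_id, value):
    """Return True if plugin_id matches the whole plugin.includes expression."""
    # Assumption: the id has to match the full expression, not just a part of it.
    return re.fullmatch(value, plugin_id) is not None


if __name__ == "__main__":
    for plugin_id in ("protocol-smb", "protocol-file", "query-site", "protocol-http"):
        print(plugin_id, included(plugin_id, INTACT), included(plugin_id, WRAPPED))
```

In this sketch a stray line break only drops query-site; protocol-smb and protocol-file match either way, so the expression itself does not look like the reason the protocols are not found.
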
>>>
>>>
>>>
>>> 4) The urls file contains smb:// URLs for the hosts and shares
>>>
>>> 5) The Windows login details (username, password, IP address, etc.) are
>>>  entered in smb.properties
>>>
>>> 6) bin/nutch crawl urls -dir localcrawl gives this error:
>>>
>>> smb://192.168.0.1/:java.net.MalformedURLException: unknown protocol: smb
>>>
>>> 7) Tried crawling files but got the following error:
>>>
>>> file:///var/test.txt failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=file
>>>
>>> Are the above settings correct for crawling local Windows shares?
>>>
>>> Can someone guide me on what to do? Where am I going wrong?
>>>
>>> Thanks,
>>>
>>> Bikram
>>>   
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Windows-Share-Crawling---searching-tf4277499.html#a12210566
Sent from the Nutch - User mailing list archive at Nabble.com.
