Bill,
        Thanks for the response. I have some more questions for the Nutch geeks out 
there:
  
  1. Can you send me the default configuration that I need in 
crawl-urlfilter.txt for spidering local files?
  
  My current file content is below:
  
  # skip file:, ftp:, & mailto: urls
  -^(http|ftp|mailto|https):
  
  # skip image and other suffixes we can't yet parse
  -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
  
  # skip URLs containing certain characters as probable queries, etc.
  -[?*!@=]
  
  # accept hosts in MY.DOMAIN.NAME
  +^http://([a-z0-9]*\.)*www.mysite.com/
  
  # skip everything else
  -.
  Is this correct? If not, what do I need to change?
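
  As a rough check of how these rules would treat a local file URL, here is a minimal Python sketch. It is not Nutch's code: it assumes rules are tried top to bottom, that the first rule whose pattern is found in the URL decides, and that a URL matching no rule is ignored. Only the rules that are legible above are included; `LOCAL_RULES`, the `+^file:` rule, and the example path are hypothetical and only for illustration.

```python
import re

# Rules copied from the crawl-urlfilter.txt above, in file order.
# Assumption: first matching rule wins; '+' accepts, '-' rejects.
POSTED_RULES = [
    "-^(http|ftp|mailto|https):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$",
    r"+^http://([a-z0-9]*\.)*www.mysite.com/",
    "-.",
]

# Hypothetical alternative for local files: skip web schemes,
# accept file: URLs, reject everything else.
LOCAL_RULES = [
    "-^(http|ftp|mailto|https):",
    "+^file:",
    "-.",
]


def accepts(url, rules):
    """Return True if the first rule that matches the URL starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as ignored


if __name__ == "__main__":
    url = "file:/home/arun/docs/report.pdf"  # hypothetical local path
    print("posted rules:", accepts(url, POSTED_RULES))
    print("local rules: ", accepts(url, LOCAL_RULES))
```

  Under those assumptions, the posted rules never reach a `+` rule for a `file:` URL (the only `+` rule requires `http://`, which the first rule has already rejected), so the final `-.` rejects it.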
  
  When I do this, I get the following error:
  
  "051130 102544 SEVERE org.apache.nutch.plugin.PluginRuntimeException:  
extension point: org.apache.nutch.searcher.QueryFilter does not exist.
  java.lang.ExceptionInInitializerError"
  
  2. I want to crawl both PDF and MS Word files. How can I include the plugins 
for that? What configuration is required for that in the nutch-site.xml 
file?
  
    Anxiously awaiting an answer.
  
Bill Goffe <[EMAIL PROTECTED]> wrote:
Arun -

I suspect others will mention this too, but see 
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6

          - Bill


>  I want to crawl and index local system files. Is there any way to do this 
> using Nutch? What do I need to do, and what configuration changes are required? 
> I am very new to Nutch, so I need your help in this regard.
>         Thanks in advance for a quick and good response.
>   
> 
> Regards,
>  
> Arun Kumar Sharma (Tech Lead -Java/J2EE)
> Mob: +91.981.529.5761
> 
> 
> 
> 
>   
-- 
         *------------------------------------------------------*
         | Bill Goffe                 [EMAIL PROTECTED]          |
         | Department of Economics    voice: (315) 312-3444     |
         | SUNY Oswego                fax:   (315) 312-5444     |
         | 416 Mahar Hall                  |          
         | Oswego, NY  13126                                    |
*--------*------------------------------------------------------*-----------*
| "He's better about shaving his legs than I am. The pressure's on me to    |
| keep my legs smooth."                                                     |
|  -- Sheryl Crow, on her boyfriend Lance Armstrong. "Crow's Armstrong      |
|     Song: 'Make 'Em Suffer,'" July 15, 2005, CNN.com                      |
*---------------------------------------------------------------------------*





Regards,
 
Arun Kumar Sharma (Tech Lead -Java/J2EE)
Mob: +91.981.529.5761




                