Bill
Thanks for the response. I have some more questions for the Nutch geeks out
there:
1. Can you send me the default configuration that I need in
crawl-urlfilter.txt for spidering local files?
My file's contents are below:
# skip file:, ftp:, & mailto: urls
-^(http|ftp|mailto|https):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*www.mysite.com/
# skip everything else
-.
Is this correct? If not, what do I need to change?
When I do this, I get the following error:
"051130 102544 SEVERE org.apache.nutch.plugin.PluginRuntimeException:
extension point: org.apache.nutch.searcher.QueryFilter does not exist.
java.lang.ExceptionInInitializerError"
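
To show how I understand the filter rules above to be applied, here is a minimal Python sketch. It is not Nutch code. It assumes that rules are tried in file order, that the first rule whose pattern is found anywhere in the URL decides, and that a URL matching no rule is rejected. The rule for query-like characters is left out, the file path is a made-up example, and VARIANT_RULES is only a hypothetical alternative, not a confirmed fix.

```python
import re

SUFFIXES = (r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|"
            r"rpm|tgz|mov|MOV|exe|png|PNG)$")

# Rules from my crawl-urlfilter.txt, in file order: (sign, pattern).
POSTED_RULES = [
    ("-", r"^(http|ftp|mailto|https):"),
    ("-", SUFFIXES),
    ("+", r"^http://([a-z0-9]*\.)*www.mysite.com/"),
    ("-", r"."),
]

# Hypothetical variant (an assumption): accept whatever was not skipped earlier.
VARIANT_RULES = [
    ("-", r"^(http|ftp|mailto|https):"),
    ("-", SUFFIXES),
    ("+", r"."),
]


def accepts(url, rules):
    """Return True if the first matching rule is a '+' rule, else False."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumed default: no rule matched, so reject


if __name__ == "__main__":
    for url in ("file:///tmp/docs/report.pdf",
                "http://www.mysite.com/index.html"):
        print(url, accepts(url, POSTED_RULES), accepts(url, VARIANT_RULES))
```

Under these assumptions, the rules as posted reject both sample URLs: the http URL is dropped by the first rule before the "+" rule is ever reached, and the file URL falls through to the final "-." rule.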
2. I want to crawl both PDF and MS Word files. How can I include the plugins
for them? What configuration is required in the nutch-site.xml
file?
I anxiously await an answer.
Bill Goffe <[EMAIL PROTECTED]> wrote:
Arun -
I suspect others will mention this too, but see
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
- Bill
> I want to crawl and index local system files. Is there any way to do this
> using Nutch? What do I need to do, and what configuration changes are required?
> I am very new to Nutch, so I need your help in this regard.
> Thanks in advance for a quick and good response.
>
>
> Regards,
>
> Arun Kumar Sharma (Tech Lead -Java/J2EE)
> Mob: +91.981.529.5761
--
*------------------------------------------------------*
| Bill Goffe [EMAIL PROTECTED] |
| Department of Economics voice: (315) 312-3444 |
| SUNY Oswego fax: (315) 312-5444 |
| 416 Mahar Hall |
| Oswego, NY 13126 |
*--------*------------------------------------------------------*-----------*
| "He's better about shaving his legs than I am. The pressure's on me to |
| keep my legs smooth." |
| -- Sheryl Crow, on her boyfriend Lance Armstrong. "Crow's Armstrong |
| Song: 'Make 'Em Suffer,'" July 15, 2005, CNN.com |
*---------------------------------------------------------------------------*
Regards,
Arun Kumar Sharma (Tech Lead -Java/J2EE)
Mob: +91.981.529.5761