Re: Getting Nutch up and running

John Martyniak Wed, 11 Jun 2008 18:12:44 -0700

Did you put your crawldb in the Tomcat root?

I am a newbie also, but I had to put it there in order for it to run.


-John

On Jun 11, 2008, at 7:50 PM, nutch_newbie <[EMAIL PROTECTED]> wrote:

I have Fedora Core 5, and I followed all the tutorials I could find to make
Nutch run. The crawler (in a shell) runs just fine, and on my
localhost:8080/nutch everything looks fine too. But when you type something
in and click search, nothing comes up! It just says "Hits 0-0 (out of about
0 total matching pages): ". And yes, I edited my crawl-urlfilter.txt; here
it is:

# The url filter file used by the crawl command.

# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'.  The first matching pattern in the file
# determines whether a URL is included or ignored.  If no pattern
# matches, the URL is ignored.

# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
+^http://([a-z0-9]*\.)*http://en.wikipedia.org
+^http://([a-z0-9]*\.)*http://www.google.com
+^http://([a-z0-9]*\.)*http://search.yahoo.com/

# skip everything else
-.

So what am I doing wrong?

Any and all help would be greatly appreciated. Thank you in advance.
--
View this message in context: 
http://www.nabble.com/Getting-Nutch-up-and-running-tp17789747p17789747.html
Sent from the Nutch - User mailing list archive at Nabble.com.
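
The comments in the quoted crawl-urlfilter.txt say that the first matching pattern decides whether a URL is included, and that a URL matching nothing is ignored. A minimal Python sketch of that rule, run against the rules exactly as posted, may help when checking a filter file by hand. It is not Nutch's own code: it assumes the patterns are applied as "find anywhere in the URL" searches and that Python's `re` treats these particular patterns the same way Java's regex engine does. The `accepts` function, `FIXED_RULES`, and the test URLs are made up for illustration, and the line shown as [EMAIL PROTECTED] is left out because its original text is not recoverable from the archive.

```python
import re

# Rules copied from the quoted crawl-urlfilter.txt, in file order.
# The line the archive shows as [EMAIL PROTECTED] is omitted.
RULES = [
    r"-^(file|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$",
    r"-.*(/.+?)/.*?\1/.*?\1/",
    r"+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/",
    r"+^http://([a-z0-9]*\.)*http://en.wikipedia.org",
    r"+^http://([a-z0-9]*\.)*http://www.google.com",
    r"+^http://([a-z0-9]*\.)*http://search.yahoo.com/",
    r"-.",
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches url starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Assumption: a rule matches if the pattern is found anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+"
    # No pattern matched: the URL is ignored.
    return False


# Hypothetical variant: the accept rules name only the host suffix,
# without a second "http://" inside the pattern.
FIXED_RULES = RULES[:3] + [r"+^http://([a-z0-9]*\.)*wikipedia.org/"] + RULES[-1:]

url = "http://en.wikipedia.org/wiki/Main_Page"
print(accepts(url))               # False: only the final "-." rule matches
print(accepts(url, FIXED_RULES))  # True: the rewritten accept rule matches first
```

Under these assumptions, the three accept rules that contain a second "http://" can only match a URL in which "http://" appears twice, so an ordinary page URL falls through to the final "-." rule. Whether that is what causes the empty search results here is not confirmed anywhere in this thread.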
