Re: my own crawlscript.sh

2008-12-08 Thread Matthias W.
Dennis Kubes-2 wrote: Just having the URLs isn't the same as having an index. You would still need to crawl them. You can inject your URL list into a clean crawldb and fetch only those URLs with the inject, generate, and fetch commands. Then you can use the index command to index them.
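The sequence Dennis describes can be sketched as a small shell function. This is only a sketch under assumptions: the `NUTCH` variable, the `crawl_round` function name, and the `crawldb`/`segments`/`linkdb`/`indexes` layout are illustrative and not taken from the thread. The `updatedb` and `invertlinks` steps are not mentioned in the quoted reply; they are added here on the assumption that, in Nutch 0.8/0.9, the `index` command expects a linkdb argument.

```shell
#!/bin/sh
# Sketch of one inject -> generate -> fetch -> index round.
# NUTCH, crawl_round and the directory layout are assumptions for illustration.
NUTCH="${NUTCH:-bin/nutch}"

crawl_round() {
    crawl_dir="$1"   # e.g. a fresh, empty crawl directory
    url_dir="$2"     # directory containing the seed URL list

    # Seed a clean crawldb with only the URLs you care about.
    $NUTCH inject "$crawl_dir/crawldb" "$url_dir" || return 1

    # Create a fetch list (a new segment) from the crawldb.
    $NUTCH generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1

    # Segment directories are timestamped, so the last one listed is the newest.
    segment=$(ls -d "$crawl_dir"/segments/* 2>/dev/null | tail -n 1)
    [ -n "$segment" ] || { echo "no segment was generated" >&2; return 1; }

    # Fetch that segment and record the results in the crawldb.
    $NUTCH fetch "$segment" || return 1
    $NUTCH updatedb "$crawl_dir/crawldb" "$segment" || return 1

    # Assumed extra step: build the linkdb that the index command takes as input.
    $NUTCH invertlinks "$crawl_dir/linkdb" "$segment" || return 1

    # Index the fetched segment.
    $NUTCH index "$crawl_dir/indexes" "$crawl_dir/crawldb" "$crawl_dir/linkdb" "$segment"
}
```

A call such as `crawl_round crawl urls` would run one round against a directory named `crawl`, using the seed list in `urls`; both names are placeholders.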

Does Nutch support crawling ColdFusion pages?

2008-12-08 Thread Alex Basa
Hi, does anyone know if there is a plugin for ColdFusion pages, or if they are supported? I'm trying to crawl http://www.knowitall.org/naturalstate Thanks in advance, Alex

How to use type and date in query

2008-12-08 Thread Davide.D'ALESSANDRO
Hi all, I cannot get these two working, and I don't know why. I'm using Nutch 0.8.1. 1. Added support for type: in queries. Search results are limited/qualified by MIME type or its primary type or subtype. For example, (1) searching with type:application/pdf restricts results to

A simple question

2008-12-08 Thread consultas
I have just installed Nutch 0.9 on a shiny new Kubuntu distribution, single machine. I have started Tomcat, but when I try any search, I get the following error message: org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value language + /include/header.html is quoted with

RE: RE : Problem with crawl and recrawl

2008-12-08 Thread José Mestre
Hi again, I have had no answer. Why are my documents unfetched when I do a recrawl, please? Thanks. José Mestre -----Original Message----- From: José Mestre [mailto:[EMAIL PROTECTED]] Sent: Tuesday, 2 December 2008 14:07 To: nutch-user@lucene.apache.org Subject: RE : RE : Problem with crawl and

Re: RE : Problem with crawl and recrawl

2008-12-08 Thread Julien Nioche
Bonjour Jose, Sorry if I am suggesting something obvious, but after you've done the *updateDB*, do you call *generate* to get a new segment? If so, do you then call *fetch* on that second segment? Are you getting anything special in the log file? Best, Julien -- DigitalPebble Ltd
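Julien's checklist (generate a new segment, then fetch that segment rather than the old one) can be sketched as a shell helper. This is a sketch under assumptions: `NUTCH`, `newest_segment`, `recrawl_round`, and the `crawldb`/`segments` layout are hypothetical names, not taken from the thread. It also assumes that `generate` creates a new timestamped directory under `segments` when there is something to fetch.

```shell
#!/bin/sh
# Hypothetical recrawl helper: fetch only the segment that generate just created.
# NUTCH and the directory layout are assumptions for illustration.
NUTCH="${NUTCH:-bin/nutch}"

newest_segment() {
    # Segment directories are named by timestamp, so the last sorted entry is the newest.
    ls -d "$1"/segments/* 2>/dev/null | sort | tail -n 1
}

recrawl_round() {
    crawl_dir="$1"
    before=$(newest_segment "$crawl_dir")

    $NUTCH generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1

    after=$(newest_segment "$crawl_dir")
    if [ -z "$after" ] || [ "$after" = "$before" ]; then
        # Nothing new to fetch: generate did not create a second segment.
        echo "generate produced no new segment" >&2
        return 2
    fi

    # Fetch the new segment, not the one from the previous round.
    $NUTCH fetch "$after" || return 1
    $NUTCH updatedb "$crawl_dir/crawldb" "$after"
}
```

If `recrawl_round` returns 2, no new segment appeared, which points to the generate step rather than the fetch step as the place to look.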

RE: RE : Problem with crawl and recrawl

2008-12-08 Thread José Mestre
Hi, > Are you getting anything special in the log file? No, nothing special. Yes, I do that. Here is my script: echo Inject /opt/nutch-0.8.1/bin/nutch inject crawl_fetcher/crawldb urlsfetch echo #Fetch1# /opt/nutch-0.8.1/bin/nutch generate crawl_fetcher/crawldb crawl_fetcher/segments -adddays