Dennis Kubes-2 wrote:
Just having the URLs isn't the same as having an index. You would still
need to crawl them. You can inject your URL list into a clean crawldb
and fetch only those URLs with the inject, generate, and fetch commands.
Then you can use the index command to index them.
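For reference, that sequence sketched as a shell session for Nutch 0.8/0.9 could look like the lines below; this is a rough, untested outline, and the directory names (crawl/crawldb, crawl/segments, crawl/linkdb, crawl/index) and the seed directory urls are placeholders, not anything Dennis specified.

# Rough sketch of a single inject/generate/fetch/index pass (paths are placeholders)
bin/nutch inject crawl/crawldb urls                      # seed a clean crawldb with your URL list
bin/nutch generate crawl/crawldb crawl/segments          # create a segment with the URLs due for fetching
segment=`ls -d crawl/segments/* | tail -1`               # pick up the segment generate just created
bin/nutch fetch $segment                                 # fetch only those URLs
bin/nutch updatedb crawl/crawldb $segment                # write the fetch results back into the crawldb
bin/nutch invertlinks crawl/linkdb -dir crawl/segments   # build the linkdb used by the indexer
bin/nutch index crawl/index crawl/crawldb crawl/linkdb $segment   # index the fetched pages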
Hi,
Does anyone know if there is a plugin for ColdFusion pages, or if they are
supported? I'm trying to crawl
http://www.knowitall.org/naturalstate
Thanks in advance,
Alex
Hi all,
I cannot get these two working, and I don't know why.
I'm using Nutch 0.8.1.
1. Added support for type: in queries. Search results are limited/qualified
by MIME type, or by its primary type or sub type. For example,
(1) searching with type:application/pdf restricts results to PDF documents.
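To illustrate (the query terms below are made up; only the type: syntax comes from the description above):

type:application/pdf nutch     limits results to the full MIME type application/pdf
type:application nutch         qualifies by the primary type, i.e. any application/* document
type:pdf nutch                 qualifies by the sub type alone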
I have just installed Nutch 0.9 in a shining new Kubuntu distribution,
single machine. I have started Tomcat, but, when I try any search, I
get the following error message:
org.apache.jasper.JasperException: /search.jsp(151,22)
Attribute value language + /include/header.html is quoted with
Hi again, I have had no answer.
Why are my documents unfetched when I do a recrawl, please?
Thanks.
José Mestre
-Original Message-
From: José Mestre [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 2 December 2008 14:07
To: nutch-user@lucene.apache.org
Subject: RE: RE: Problem with crawl and
Hello Jose,
Sorry if I am suggesting something obvious, but after you've done the
*updatedb*, do you call *generate* to get a new segment? If so, do you then
call *fetch* on that second segment? Are you getting anything special in the
log file?
Best,
Julien
--
DigitalPebble Ltd
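For context, Julien's suggestion corresponds to the usual Nutch 0.8.x recrawl round: after updatedb, each further pass needs its own generate and fetch. A rough, untested sketch, reusing the crawl_fetcher paths from the script below; the -adddays value is only an example:

# One recrawl round: generate a new segment, fetch it, then update the crawldb
bin/nutch generate crawl_fetcher/crawldb crawl_fetcher/segments -adddays 31   # -adddays pulls forward pages not yet due
segment=`ls -d crawl_fetcher/segments/* | tail -1`    # pick up the newly generated segment
bin/nutch fetch $segment                              # fetch that second (third, ...) segment
bin/nutch updatedb crawl_fetcher/crawldb $segment     # record the new fetch status in the crawldb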
Hi,
Are you getting anything special in the log file? No, nothing special.
Yes, I do that.
Here is my script:
echo Inject
/opt/nutch-0.8.1/bin/nutch inject crawl_fetcher/crawldb urlsfetch
echo "#Fetch1#"
/opt/nutch-0.8.1/bin/nutch generate crawl_fetcher/crawldb crawl_fetcher/segments -adddays