Hi,
I am not too sure what you mean... The ASP pages may be built
from data pulled from a database, but at the end of the day, what
the browser receives has the text/html content type, which can be indexed
by Nutch.
Or is your question related to another matter altogether?
Regards,
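
To illustrate the point above: what matters to an indexer is the Content-Type header of the HTTP response, not the .asp extension in the URL. The sketch below is not Nutch code; it is a minimal Python illustration, and the helper names media_type and is_indexable_html are made up for this example.

```python
from email.message import Message


def media_type(content_type_header: str) -> str:
    """Return the bare media type (e.g. 'text/html') from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset and lowercases the result.
    return msg.get_content_type()


def is_indexable_html(content_type_header: str) -> bool:
    """Hypothetical check: treat a response as an HTML page if it is served as text/html."""
    return media_type(content_type_header) == "text/html"


# A dynamically generated ASP page is typically served with a header like this one.
print(is_indexable_html("text/html; charset=ISO-8859-1"))
```

Under this assumption, a page generated from a database looks the same to the crawler as a static HTML file.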
Hi,
Sorry, it was my fault.
Of course Nutch indexes all URLs and pages and reads each file as text/html.
So you are right :-)
I'm quite new to Nutch (first day was yesterday :-).
regards
robert
Sébastien LE CALLONNEC [EMAIL PROTECTED]
08.09.2005 12:35
Please reply to nutch-user
I just read that wildcards are not supported. Am I right? Are there any
workarounds for using wildcards, or would using Lucene be better?
I'm just looking for a search engine for our homepage (JSP).
regards
Robert
Hi all,
Is it possible to index files in a shared folder on PCs connected to an
intranet? Does anyone have an idea of how to do that?
Thank you
You should be able to get it to work by changing this:
# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]
To this:
# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]
Jake.
-Original Message-
From: mu
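
The actual patterns in the message above were redacted by the archive, so they cannot be reproduced here. As a rough sketch of how this kind of filter file behaves (lines starting with "-" exclude, lines starting with "+" include, and the first matching rule decides), here is a small Python illustration. It is not Nutch's implementation, and both rule sets below are hypothetical examples, not the ones Jake posted.

```python
import re


def load_rules(text):
    """Parse filter lines of the form '+regex' or '-regex'; '#' lines are comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, regex in rules:
        if regex.search(url):
            return allow
    return False  # assumption: a URL matched by no rule is rejected


# Hypothetical rule set that skips URLs containing '?' or '='.
STRICT = """
# skip URLs containing certain characters as probable queries, etc.
-[?=]
+.
"""

# Hypothetical relaxed rule set that no longer skips '?' and '='.
RELAXED = """
# skip URLs containing certain characters as probable queries, etc.
-[!]
+.
"""

url = "http://example.com/page.asp?id=3"
print(accepts(load_rules(STRICT), url))   # rejected by the '-[?=]' rule
print(accepts(load_rules(RELAXED), url))  # falls through to '+.'
```

With the strict rules, dynamic pages carrying a query string are never fetched; removing the query characters from the skip rule lets them through.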
I merged my index and it's off my nutch dir... so I have
index, segments and db, if that helps?
On Thu, 8 Sep 2005 09:15:33 -0400
Jay Pound [EMAIL PROTECTED] wrote:
When I merge the index, where do I put it? Does it still
need to be in the segments folder? I've merged it, and
tried to start
Valmir Macário wrote:
Hi all,
Is it possible to index files in a shared folder on PCs connected to an
intranet? Does anyone have an idea of how to do that?
Thank you
It is possible to crawl local files, but Nutch 0.7 has a bug in the file
protocol when crawling remote files (URLs looking like
Ayyanar Inbamohan wrote:
Hi all,
I have some sample PowerPoint files, which I am trying
to crawl with Nutch, but I got the following error:
050908 152609 fetch okay, but can't parse
http://localhost:8080/search_sample/kmportal5.ppt,
reason: failed(2,203): Content-Type is not