RE: Reply: RE: How can I use Nutch 0.7 to crawl the Dynamic news?

2005-09-08 Thread Sébastien LE CALLONNEC
Hi, I am not too sure what you're saying... The ASP pages may be built from data pulled out of a database, but at the end of the day, what the browser receives is served with the text/html content type, which Nutch can index. Or is your question about another matter altogether? Regards,
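To see what a dynamic page actually returns, you can look at its Content-Type header. The sketch below is a hypothetical helper, not part of Nutch: it strips parameters such as `; charset=...`, lower-cases the media type, and reports whether it is `text/html`. The `main` method assumes you pass the page URL (for example an `.asp` news page) as the first argument.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

public class ContentTypeCheck {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    static String mediaType(String headerValue) {
        if (headerValue == null) return "";
        int semi = headerValue.indexOf(';');
        String type = semi >= 0 ? headerValue.substring(0, semi) : headerValue;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // True when the server labels the response as HTML, whatever generated it.
    static boolean isHtml(String headerValue) {
        return mediaType(headerValue).equals("text/html");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ContentTypeCheck http://example.com/news.asp?id=1
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        try {
            String header = conn.getContentType(); // connects and reads the response headers
            System.out.println(header + " -> HTML? " + isHtml(header));
        } finally {
            conn.disconnect();
        }
    }
}
```

Whether the page comes from a database or a static file does not show up here; only the header the server sends does.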

Reply: RE: Reply: RE: How can I use Nutch 0.7 to crawl the Dynamic news?

2005-09-08 Thread Robert . Guggenberger
Hi, sorry, it was my fault. Of course Nutch indexes all the URLs and pages and reads the files as text/html. So you are right :-) I'm quite new to Nutch (my first day was yesterday :-). Regards, Robert. Sébastien LE CALLONNEC [EMAIL PROTECTED] 08.09.2005 12:35 Please reply to nutch-user

wildcards

2005-09-08 Thread Robert . Guggenberger
I just read that wildcards are not supported. Am I right? Are there any workarounds for using wildcards, or would it be better to use Lucene directly? I'm just looking for a search engine for our homepage (JSP). Regards, Robert
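Lucene itself provides wildcard queries (`WildcardQuery`); conceptually, a wildcard pattern is expanded against the terms stored in the index. The sketch below illustrates only that expansion step over a hypothetical in-memory term list. It does not use Nutch or Lucene, and it makes no claim about how Nutch 0.7 parses queries.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WildcardExpansion {
    // Translate '*' (any run of characters) and '?' (exactly one character) into a regex;
    // every other character is quoted so it matches literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Return every vocabulary term that the pattern matches in full.
    static List<String> expand(String wildcard, List<String> vocabulary) {
        Pattern p = toRegex(wildcard);
        return vocabulary.stream()
                .filter(t -> p.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary standing in for the indexed terms.
        List<String> terms = List.of("nutch", "nutshell", "lucene", "news");
        System.out.println(expand("nut*", terms)); // [nutch, nutshell]
    }
}
```

Expanding against the term list, rather than scanning documents, is what keeps such a query cheap when the vocabulary is small. A leading `*` forces a scan of every term, which is why search engines often restrict it.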

File system at a intranet

2005-09-08 Thread Valmir Macário
Hi all, is it possible to index files in a shared folder on PCs connected to an intranet? Does anyone have an idea of how I can do that? Thank you

RE: RE: Reply: RE: How can I use Nutch 0.7 to crawl the Dynamic news?

2005-09-08 Thread Vanderdray, Jake
You should be able to get it to work by changing this:

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

To this:

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

Jake.

-Original Message- From: mu
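The actual rule lines in Jake's message were replaced by the archive's address obfuscation, so the sketch below only assumes a rule of the form `-[?*!@=]` and a relaxed variant `-[*!@]`; neither is quoted from the message. It mimics a regex URL filter in which each line is `+` (accept) or `-` (reject) followed by a regex, the first matching line decides, and a URL that matches nothing is rejected. That is the behaviour described in the comments of Nutch's filter file, but this is a standalone illustration, not Nutch's implementation.

```java
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // One filter line: '+' accepts, '-' rejects, the rest is a regular expression.
    record Rule(boolean accept, Pattern pattern) {
        static Rule parse(String line) {
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)));
        }
    }

    // The first rule whose regex occurs anywhere in the URL decides; no match means reject.
    static boolean accepts(List<String> lines, String url) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // skip comments
            Rule rule = Rule.parse(trimmed);
            if (rule.pattern().matcher(url).find()) return rule.accept();
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://news.example.com/article.asp?id=42"; // hypothetical dynamic URL
        // Assumed rules: the strict one rejects '?' and '=', the relaxed one does not.
        List<String> strict = List.of("-[?*!@=]", "+^http://");
        List<String> relaxed = List.of("-[*!@]", "+^http://");
        System.out.println(accepts(strict, url));  // false
        System.out.println(accepts(relaxed, url)); // true
    }
}
```

Because the first match wins, the order of the lines matters: a broad `+` rule placed above the `-` rule would let query URLs through regardless of the character class.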

Re: nutch merge

2005-09-08 Thread quovadis
I merged my index and it sits directly under my Nutch directory, so I have index, segments and db, if that helps. On Thu, 8 Sep 2005 09:15:33 -0400 Jay Pound [EMAIL PROTECTED] wrote: When I merge the index, where do I put it? Does it still need to be in the segments folder? I've merged it, and tried to start

Re: File system at a intranet

2005-09-08 Thread Robert Chevallier
Valmir Macário wrote: Hi all, is it possible to index files in a shared folder on PCs connected to an intranet? Does anyone have an idea of how I can do that? Thank you. It is possible to crawl local files, but Nutch 0.7 has a bug in the file protocol when crawling remote files (URLs looking like
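Crawling local files starts from `file:` URLs. The sketch below is a hypothetical helper, not a Nutch tool: it walks a directory and prints one `file:` URL per line, the shape of a plain seed list. It assumes that Nutch's file protocol is enabled in your configuration and that your URL filter does not reject `file:` URLs; it does nothing about the remote-file bug mentioned above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSeedSketch {
    // Turn every regular file under root into an absolute file: URL, sorted for stable output.
    static List<String> seedUrls(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toUri().toString()) // percent-encodes spaces etc.
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java FileSeedSketch /path/to/documents > urls
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String url : seedUrls(root)) System.out.println(url);
    }
}
```

Using `Path.toUri()` instead of concatenating `"file://"` with the path lets the JDK handle escaping and platform-specific path syntax.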

Re: Difference between application/vnd.ms-powerpoint and application/powerpoint

2005-09-08 Thread Robert Chevallier
Ayyanar Inbamohan wrote: Hi all, I have some sample PowerPoint files which I am trying to crawl with Nutch, but I got the following error: 050908 152609 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal5.ppt, reason: failed(2,203): Content-Type is not
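The thread subject names two labels for PowerPoint files: `application/vnd.ms-powerpoint`, the IANA-registered type, and `application/powerpoint`. If the server sends one label while the parser expects the other, a plain string comparison fails. The sketch below shows one way to normalize such aliases before comparing. The alias table is an assumption for illustration, and this is not Nutch's own mechanism for mapping content types to parsers.

```java
import java.util.Locale;
import java.util.Map;

public class MimeAliases {
    // Hypothetical alias table: non-standard label -> registered media type.
    private static final Map<String, String> ALIASES = Map.of(
            "application/powerpoint", "application/vnd.ms-powerpoint");

    // Drop parameters, lower-case, then replace a known alias with its canonical name.
    static String canonical(String contentType) {
        if (contentType == null) return "";
        int semi = contentType.indexOf(';');
        String type = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(type, type);
    }

    public static void main(String[] args) {
        System.out.println(canonical("application/powerpoint"));        // application/vnd.ms-powerpoint
        System.out.println(canonical("application/vnd.ms-powerpoint")); // application/vnd.ms-powerpoint
        System.out.println(canonical("text/html; charset=UTF-8"));      // text/html
    }
}
```

Fixing the label at its source, by configuring the web server to send the registered type, avoids needing such a table at all.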