Order is important when defining rules in the urlfilter
files. A URL is accepted or rejected according to the
first pattern in the file that matches it.
> I have tried using the crawl-urlfilter.txt.
>
> +^http://([a-z0-9]*\.)*
> -^http://([a-z0-9]*?\.)*remita.net
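To illustrate the first-match rule with the two lines quoted above, here is a minimal Python sketch. It is not Nutch's own code: `parse_rules` and `accepts` are hypothetical helpers, and it assumes that a pattern counts as matching when it is found anywhere in the URL and that a URL matching no rule is rejected.

```python
import re

def parse_rules(text):
    """Parse '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Return the verdict of the first rule whose pattern matches the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: no matching rule means the URL is rejected

# Order as quoted: the broad '+' rule matches first, so the '-' rule is never reached.
quoted_order = parse_rules(r"""
+^http://([a-z0-9]*\.)*
-^http://([a-z0-9]*?\.)*remita.net
""")

# Same two rules swapped (an illustration only): the exclusion is now checked first.
swapped_order = parse_rules(r"""
-^http://([a-z0-9]*?\.)*remita.net
+^http://([a-z0-9]*\.)*
""")
```

With the quoted order, `accepts(quoted_order, "http://www.remita.net/")` is true because the broad `+` line wins. With the lines swapped, the same URL is rejected while other `http://` URLs still pass.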
I think you want
-^ht
Please try "CrawlDbMerger",
This tool merges several CrawlDb-s into one, optionally filtering URLs
through the current URLFilters, to skip prohibited pages.
It's possible to use this tool just for filtering - in that case only one
CrawlDb should be specified in arguments.
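As a rough sketch of the filtering-only use described above, the following Python helper builds a command line for the tool without running it. The helper name `mergedb_command` and the paths are hypothetical. The `mergedb` subcommand and the `-filter` flag are assumed from the tool's usage message, so check them against the output of `bin/nutch mergedb` with no arguments on your version.

```python
def mergedb_command(output_crawldb, crawldbs, apply_filters=True, nutch="bin/nutch"):
    """Build the argument list for a CrawlDbMerger run (nothing is executed here)."""
    if not crawldbs:
        raise ValueError("at least one input CrawlDb is required")
    cmd = [nutch, "mergedb", output_crawldb, *crawldbs]
    if apply_filters:
        cmd.append("-filter")  # assumed flag: pass URLs through the current URLFilters
    return cmd

# Filtering only: a single input CrawlDb, written to a new output directory.
# Both paths are placeholders.
cmd = mergedb_command("crawl/crawldb_filtered", ["crawl/crawldb"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths and flags have been verified.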
-Original Message-
From: o
Guys, I have been trying to get this done for weeks now, with no progress.
Someone please help me. I am trying to delete a domain that has already been
crawled from my crawldb and index.
I have a list of domains already crawled in my index. How do I exclude or
delete domains from my crawl output folder? I have trie
Hi Otis,
Thanks for the quick response!
[EMAIL PROTECTED] wrote:
Hi Vineet,
No, Nutch API and Lucene API are different. Nutch does use Lucene for
indexing/searching, so you *can* use Lucene and its API for searching an index
you built with Nutch. Just make sure you use the same version of
How does Nutch signal that there are changes? Can I find out
whether there are changes or not? If Nutch knows internally that there
are changes, can I learn that from outside, through an API or something?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, Ma
It's part of Nutch, happens automatically.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Miao Liqiang NCS <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Sent: Sunday, May 4, 2008 8:33:49 PM
> Subject: RE: Unable to tell if whether
What kind of searches does Nutch support?
Is this function provided in the Nutch package? Can I use it directly
without programming against the API?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, May 02, 2008 8:12 PM
To: nutch-user@lucene.apache.org
Subject: Re: Unable to tell if whether is any changes