RE: Reply: Someone Please respond ... Deleting Urls already crawled from the crawlDB

2008-05-04 Thread Howie Wang
Order is important when defining rules in the urlfilter files. The URL will be filtered or kept according to the first matching pattern encountered in the file. > I have tried using the crawl-urlfilter.txt. > > +^http://([a-z0-9]*\.)* > -^http://([a-z0-9]*?\.)*remita.net I think you want -^ht
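To illustrate the first-match-wins behaviour described above, here is a minimal Python sketch. It is not Nutch code: the helper names `parse_rules` and `accepts` are hypothetical, and it assumes that a rule line is a `+` or `-` sign followed by a regular expression, that a pattern may match anywhere in the URL, and that a URL matching no rule is rejected. The reordered rule set is just one possible ordering, not necessarily the one the truncated reply goes on to suggest.

```python
import re


def parse_rules(text):
    """Parse urlfilter-style lines into (allow, compiled_pattern) pairs.

    Assumed format (hypothetical helper, not the Nutch parser): each
    non-empty, non-comment line is '+' or '-' followed by a regex.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the FIRST rule whose pattern matches the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: a URL that matches no rule is rejected


# The two rules quoted in the post, in the order they were posted.
as_posted = parse_rules(
    "+^http://([a-z0-9]*\\.)*\n"
    "-^http://([a-z0-9]*?\\.)*remita.net\n"
)

# The same two rules with the exclusion moved to the top.
reordered = parse_rules(
    "-^http://([a-z0-9]*?\\.)*remita.net\n"
    "+^http://([a-z0-9]*\\.)*\n"
)

url = "http://www.remita.net/"
print(accepts(as_posted, url))   # the '+' rule matches first
print(accepts(reordered, url))   # the '-' rule matches first
```

With the rules in the posted order, the broad `+` pattern matches every `http://` URL before the `-` rule is ever consulted, so the exclusion never takes effect. Placing the exclusion first lets it win for the unwanted domain while other URLs still fall through to the `+` rule.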

Reply: Someone Please respond ... Deleting Urls already crawled from the crawlDB

2008-05-04 Thread wangkai
Please try "CrawlDbMerger". This tool merges several CrawlDb-s into one, optionally filtering URLs through the current URLFilters, to skip prohibited pages. It's possible to use this tool just for filtering - in that case only one CrawlDb should be specified in arguments. -Original Message- From: o

Someone Please respond ... Deleting Urls already crawled from the crawlDB

2008-05-04 Thread oddaniel
Guys, I have been trying to get this done for weeks now. No progress. Someone please help me. I am trying to delete a domain already crawled from my crawldb and index. I have a list of domains already crawled in my index. How do I exclude or delete domains from my crawl output folder? I have trie

Re: Nutch API and Lucene API are same?

2008-05-04 Thread Vineet Garg
Hi Otis, Thanks for the quick response! [EMAIL PROTECTED] wrote: Hi Vineet, No, Nutch API and Lucene API are different. Nutch does use Lucene for indexing/searching, so you *can* use Lucene and its API for searching an index you built with Nutch. Just make sure you use the same version of

RE: Unable to tell if whether is any changes for the same webpage

2008-05-04 Thread Miao Liqiang NCS
In which way does Nutch report that there are changes? Am I able to know whether there are changes or not? If Nutch knows internally that there are changes, can I find that out from outside through the API or something? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, Ma

Re: Unable to tell if whether is any changes for the same webpage

2008-05-04 Thread ogjunk-nutch
It's part of Nutch, happens automatically. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Miao Liqiang NCS <[EMAIL PROTECTED]> > To: nutch-user@lucene.apache.org > Sent: Sunday, May 4, 2008 8:33:49 PM > Subject: RE: Unable to tell if whether

What kind of searches does Nutch support?

2008-05-04 Thread Miao Liqiang NCS
What kind of searches does Nutch support?

RE: Unable to tell if whether is any changes for the same webpage

2008-05-04 Thread Miao Liqiang NCS
Is this function provided in the Nutch package? Can I use it directly without programming against the API? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, May 02, 2008 8:12 PM To: nutch-user@lucene.apache.org Subject: Re: Unable to tell if whether is any changes