Re: Nutch - Filtering (REGEX)

Marcin Okraszewski Fri, 04 May 2007 14:09:59 -0700

Have you tried putting the pattern in conf/crawl-urlfilter.txt?

Marcin
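
For readers unfamiliar with that file: as far as I know, conf/crawl-urlfilter.txt holds one rule per line, each a regular expression prefixed with + (accept) or - (reject), and the first rule that matches a URL decides its fate. The Java sketch below imitates that behaviour so a pattern can be tried out before a crawl. It is not Nutch's actual code; the class UrlFilterSketch, the rule text, and the sample URLs are all hypothetical. It also assumes that the \( \) in the quoted pattern were meant as grouping. In Java regular expressions they match literal parentheses, so the sketch drops them, and it escapes the dot in example.com.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    /** One filter rule: accept ('+') or reject ('-') plus a regular expression. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Hypothetical rule file: accept only the article-style URLs, reject the rest.
    static final String RULES =
          "# accept URLs shaped like word-id-title-number-code.html\n"
        + "+^http://([a-z0-9]*\\.)*example\\.com/[a-zA-Z]*-[a-z0-9]*-.*-[0-9]*-[A-Za-z0-9]*\\.html$\n"
        + "# reject everything else\n"
        + "-.\n";

    /** Parses rule text, skipping blank lines and '#' comments. */
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** First matching rule wins; a URL that matches no rule is rejected. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            // Assumption: a partial match (find) is enough, so anchor with ^ and $.
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(RULES);
        String[] urls = {
            "http://www.example.com/news-abc123-some-title-42-xyz.html",
            "http://www.example.com/about.html",
            "http://other.org/news-abc123-some-title-42-xyz.html",
        };
        for (String url : urls) {
            System.out.println(accepts(rules, url) + "  " + url);
        }
    }
}
```

Only the first sample URL is accepted; the other two fall through to the catch-all reject rule. Rule order matters in this scheme, so the specific accept line has to come before the "-." line.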


On 5/4/07, simon_ece <[EMAIL PROTECTED]> wrote:


Hi all,
I am new to Nutch. I would like to crawl a particular site and get only the
results that match the following pattern. I don't want to list other URLs
from the crawled site.

Site to be crawled, e.g.: www.example.com
^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$

I can crawl the site and I am getting all the matching URLs, but I don't
know how to filter out the other URLs and keep only these particular ones.
Kindly post your suggestions.
Thanks & Regards
Simon

--
View this message in context: 
http://www.nabble.com/Nutch---Filtering-%28REGEX%29-tf3690583.html#a10318059
Sent from the Nutch - User mailing list archive at Nabble.com.
