How about conf/crawl-urlfilter.txt?
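In conf/crawl-urlfilter.txt the rules are tried top-down and the first matching rule wins: a `+` rule accepts a URL, a `-` rule rejects it. A minimal sketch for this case (example.com and the article pattern are taken verbatim from the question below; note the `\(`/`\)` match literal parentheses):

```
# conf/crawl-urlfilter.txt -- rules are applied in order; first match wins.

# skip non-http schemes
-^(file|ftp|mailto):

# accept the article pages (pattern copied from the question)
+^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$

# accept other pages on the site so the crawler can discover the articles;
# drop this rule if you truly want only the article URLs in the crawl db,
# but then the seed page itself must match the pattern above
+^http://([a-z0-9]*\.)example.com/

# reject everything else
-.
```

If you keep the broad `+^http://...example.com/` rule, intermediate pages will still be fetched; in that case filter the article URLs out at search/post-processing time instead.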

Marcin

On 5/4/07, simon_ece <[EMAIL PROTECTED]> wrote:
>
> Hi all,
> I am new to Nutch. I would like to crawl a particular site and get the
> results in the following pattern. I don't want to list other URLs from the
> crawled site.
>
> Site to be crawled, e.g. www.example.com:
> ^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$
>
> I can crawl the site and get all the matching URLs, but I don't know how
> to filter out the rest and keep only the particular URLs.
> Kindly post your suggestions.
> Thanks & Regards
> Simon
>
> --
> View this message in context: 
> http://www.nabble.com/Nutch---Filtering-%28REGEX%29-tf3690583.html#a10318059
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
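A quick way to sanity-check the pattern before putting it into conf/crawl-urlfilter.txt is to run it against a few sample URLs. Nutch's URL filters use Java regular expressions, but this particular pattern behaves the same under Python's re module; the sample URLs below are hypothetical. Note that `\(` and `\)` match literal parentheses, so only URLs actually containing `(` and `)` will pass.

```python
import re

# Pattern copied verbatim from the question.
PATTERN = re.compile(
    r"^http://([a-z0-9]*\.)example.com/"
    r"([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$"
)

# Hypothetical sample URLs.
samples = [
    "http://www.example.com/Shop-(abc123)-red-widget-(42-XY).html",  # has parens
    "http://www.example.com/about.html",                             # no parens
    "http://www.example.com/Shop-abc123-red-widget-42-XY.html",      # no parens
]

for url in samples:
    print(url, "->", bool(PATTERN.match(url)))
```

If the parentheses were meant as capturing groups rather than literal characters, drop the backslashes (`(...)` instead of `\(...\)`) and re-test.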

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
