Hi all, I am new to Nutch. I would like to crawl a particular site and keep only the URLs that match the following pattern; I don't want to list the other URLs from the crawled site.
Site to be crawled: e.g. www.example.com

Pattern:
^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$

I can crawl the site and get all of its URLs, but I don't know how to filter them so that only the URLs matching this pattern remain. Kindly post your suggestions.

Thanks & Regards,
Simon

--
View this message in context: http://www.nabble.com/Nutch---Filtering-%28REGEX%29-tf3685035.html#a10300328
Sent from the Nutch - Dev mailing list archive at Nabble.com.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
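One common way to get this in Nutch is a regex URL filter. In releases from this period, `bin/nutch crawl` typically reads `conf/crawl-urlfilter.txt` (the step-by-step tools use `conf/regex-urlfilter.txt`), where each line is `+` or `-` followed by a regular expression and the first matching line decides whether a URL is kept. The Java below is NOT Nutch's code: it is a hypothetical re-implementation (class `UrlFilterSketch`, method `filter`) of those first-match semantics, assuming unanchored `find()` matching, used to check the pattern from the question against made-up sample URLs. Two things it makes visible: `\(` and `\)` match literal parentheses, so only URLs that actually contain `(` and `)` are accepted; and because a final `-.` rejects everything else, the seed page and any index pages that merely link to the wanted pages are rejected too, so in practice you would likely need extra `+` lines for those pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of first-match "+regex / -regex" URL filtering,
 * modelled on the format of Nutch's regex URL filter files. Not Nutch's own code.
 */
public class UrlFilterSketch {

    /** One filter line: accept or reject URLs in which the pattern finds a match. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses lines of the form "+regex" or "-regex"; blank lines and '#' comments are skipped. */
    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if the first matching rule is '+', or null if it is '-' or nothing matches. */
    public String filter(String url) {
        for (Rule rule : rules) {
            // find() searches anywhere in the URL, so the patterns use ^ and $ to anchor explicitly.
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // no rule matched: reject
    }

    /** Rules built around the pattern from the question (Java string escaping doubles each backslash). */
    public static UrlFilterSketch questionFilter() {
        return new UrlFilterSketch(List.of(
                "# skip non-HTTP schemes",
                "-^(file|ftp|mailto):",
                "# accept only the wanted pages",
                "+^http://([a-z0-9]*\\.)example.com/([a-zA-Z]*)-\\([a-z0-9]*\\)-.*-\\([0-9]*-[A-Za-z0-9]*\\)\\.html$",
                "# reject everything else",
                "-."));
    }

    public static void main(String[] args) {
        UrlFilterSketch f = questionFilter();
        String[] urls = {
            "http://www.example.com/Shoes-(abc1)-red-(12-Ab3).html",
            "http://www.example.com/about.html",
            "mailto:simon@example.com"
        };
        for (String url : urls) {
            System.out.println((f.filter(url) != null ? "accepted: " : "rejected: ") + url);
        }
    }
}
```

Running `main` prints whether each sample URL is accepted or rejected. In a real filter file the same `+`/`-` lines would be written with single backslashes, since the file holds raw regular expressions rather than Java string literals.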