Hi, I collected 30,000 URLs during my crawl cycle, but they are not all unique, so I need to find out how many unique URLs there are. As far as I know, "dedup" removes duplicate URLs, but when I try to remove the duplicate links with the dedup command, it still does not remove them. Can anyone please help me out? The command I tried is below:
$ bin/nutch dedup <path of index dir>/indexes

Thanks in advance.

--
View this message in context: http://www.nabble.com/dedup-is-not-removing-duplicate-record-tf3389001.html#a9434335
Sent from the Nutch - User mailing list archive at Nabble.com.
