There are quite a few ways to do this. Google's PageRank, for instance, is one such approach; text classification (as done in spam filters, for example) is another. Which one fits best depends on what exactly you are trying to do.
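To make the two approaches a little more concrete, here is a minimal Python sketch of each. It assumes the link graph fits in a dict of page -> outlinks and that documents are plain whitespace-separated text. The function names (`pagerank`, `train_naive_bayes`, `classify`) are hypothetical. The classifier is a multinomial naive Bayes, chosen only as one simple example of text classification. None of this is how Nutch itself implements anything.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks.

    Hypothetical helper: pages with no outlinks (dangling pages) spread
    their score evenly over all pages, so the ranks always sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs, e.g. labels "spam" / "ham".

    Hypothetical helper: returns per-label word counts, per-label
    document counts and the overall vocabulary.
    """
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, after training on a handful of pages labelled "spam" and "ham", `classify(model, "cheap pills")` picks "spam" when those words occur mostly in the spam examples. A real junk filter would need far more training data and better tokenization, and PageRank scores would typically be combined with other signals rather than used alone.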
d e wrote:
> We plan to index many websites. Got any suggestions on how to drop
> the junk without having to do too much work for each such site? Know
> anyone who has a background on doing this sort of thing? What sorts of
> approaches would you recommend?

--
Best regards,
Bjoern Wilmsmann

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers