-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There are quite a few ways to do this. Google's PageRank is one such
approach: it scores pages by the links pointing at them. Text
classification (as done in spam filters, for example) is another. It just
depends on what you are going to do with the index.
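
As a rough illustration of the text classification route, here is a minimal sketch of a naive Bayes filter in Python. The class name, the "junk"/"keep" labels and the training snippets are all made up for this example and are not tied to any Nutch API; in practice you would train on hand-labelled pages from your own crawl.

```python
import math
import re
from collections import Counter

LABELS = ("junk", "keep")  # hypothetical label names for this sketch


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesJunkFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labelled example."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(word | label)."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        # Smoothed prior, so a label with no examples does not hit log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        """Pick the label with the highest log score."""
        return max(LABELS, key=lambda label: self.log_score(text, label))


# Toy training data, invented purely for illustration.
junk_filter = NaiveBayesJunkFilter()
junk_filter.train("buy cheap pills online casino bonus", "junk")
junk_filter.train("click here win free money casino", "junk")
junk_filter.train("nutch crawler indexes web pages with lucene", "keep")
junk_filter.train("how to configure the fetcher and parse plugins", "keep")

print(junk_filter.classify("free casino bonus click here"))
print(junk_filter.classify("configure the nutch fetcher plugins"))
```

The appeal of this kind of filter is that the per-site work is limited to labelling a handful of example pages; the same trained model can then be applied to every site you index.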

d e wrote:

> We plan to index many websites. Got any suggestions on how to drop the junk
> without having to do too much work for each such site? Know anyone who has a
> background on doing this sort of thing? What sorts of approaches would you
> recommend?

- --
Best regards,
Bjoern Wilmsmann



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iD8DBQFF812mgz0R1bg11MERAqXCAKCVTfLN7KXJYdAqLGWMI57ChKaM8QCfdQBc
1CyrQfD+5vCzSBvYbviX17o=
=+TK/
-----END PGP SIGNATURE-----

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
