Dear Nutch Users,

Web spam is a serious issue for Nutch as well, but at the moment we know only a little about the problem and how to work around it. Please invest some time to help the research community by building a collection for future research work.
See below for details.

Thank you.
Stefan

VOLUNTEERS - WEB SPAM CLASSIFICATION

At the Algorithmic Engineering group at Università di Roma "La Sapienza", we are currently building a reference collection for testing Web Spam detection algorithms. While similar collections for research on e-mail spam filtering exist, there are no publicly available collections for testing Web Spam detection techniques.

This collection will be freely available once it is completed. We are currently tagging a large subset of 8,000 .UK domains. The objective is to classify every domain as spam, normal or suspicious. There are 12 volunteers at the moment, and we want at least two judges for each classified domain. A heterogeneous group of judges also makes the collection more valuable.

Classifying 100 domains takes about 2 to 3 hours. We provide guidelines and examples for the classification task, and an easy-to-use web-based interface for the volunteers:

http://aeserver.dis.uniroma1.it/webspam/

If you, or a colleague or student, can help us with this task, please contact: [EMAIL PROTECTED]

Thank you very much,

--
Carlos Castillo, Ph.D.
Dipartimento di Informatica e Sistemistica
Università degli Studi di Roma "La Sapienza"

Via Salaria 113, II floor
00198 Rome, ITALY
Tel: +39 06 4991 8344
Fax: +39 06 8530 0849

