Dear Nutch Users,
Web spam is a serious issue for Nutch as well, but at the moment we
know only a little about the problem and how to work around it.
Please invest some time to help the research community build a
collection for future research work.
See below for details.
Thank you.
Stefan
VOLUNTEERS - WEB SPAM CLASSIFICATION
At the Algorithmic Engineering group at Università di Roma "La
Sapienza", we are currently building a reference collection for
testing Web Spam detection algorithms. While similar collections
for research on e-mail spam filtering exist, there are no publicly
available collections for testing Web Spam detection techniques.
This collection will be freely available once it is completed. We
are currently tagging a large subset of 8,000 .UK domains.
The objective is to classify every domain as spam, normal, or
suspicious. There are currently 12 volunteers, and we want at least
two judges for each classified domain. A heterogeneous group of
judges also makes the collection more valuable.
Classifying 100 domains takes about 2 to 3 hours. We provide
guidelines and examples for the classification task, and an
easy-to-use web-based interface for the volunteers:
http://aeserver.dis.uniroma1.it/webspam/
If you, or a colleague or student, can help us with this task, please
contact: [EMAIL PROTECTED]
Thank you very much,
--
Carlos Castillo, Ph.D.
Dipartimento di Informatica e Sistemistica
Università degli Studi di Roma "La Sapienza"
Via Salaria 113, II floor
00198 Rome, ITALY
Tel: +39 06 4991 8344
Fax: +39 06 8530 0849