** apologies for cross-posting **

==== Call for Challenge: Semantic Publishing ====

Challenge Website: https://github.com/ceurws/lod/wiki/SemPub2016
Challenge hashtag: #SemPub2016
Challenge Chairs:
- Angelo Di Iorio (Department of Computer Science and Engineering, University 
of Bologna, IT)
- Anastasia Dimou (Data Science Lab, Ghent University, BE) 
- Christoph Lange (Enterprise Information Systems, University of Bonn / 
Fraunhofer IAIS, DE)
- Sahar Vahdati (Enterprise Information Systems, University of Bonn, DE)
Challenge Coordinators: Stefan Dietze (L3S, Germany) and Anna Tordai (Elsevier, 
Netherlands)

13th Extended Semantic Web Conference (ESWC) 2016
Dates: May 29th - June 2nd, 2016
Venue: Heraklion, Crete, Greece
Hashtag: #eswc2016
Feed: @eswc_conf
Site: http://2016.eswc-conferences.org
General Chair: Harald Sack (Hasso Plattner Institute (HPI), Germany)

MOTIVATION AND OBJECTIVES

This is the next iteration of the successful Semantic Publishing Challenge of 
ESWC 2014 and 2015. We continue pursuing the objective of assessing the quality 
of scientific output, evolving the dataset bootstrapped in 2014 and 2015 to 
take into account the wider ecosystem of publications. To achieve that, this 
year’s challenge focuses on refining and enriching an existing linked open 
dataset about workshops, their publications and their authors. Aspects of 
“refining and enriching” include extracting deeper information from the HTML 
and PDF sources of the workshop proceedings volumes and enriching this 
information with knowledge from existing datasets. Thus, a combination of 
broadly investigated technologies in the Semantic Web field, such as 
Information Extraction (IE), Natural Language Processing (NLP), Named Entity 
Recognition (NER), link discovery, etc., is required to deal with the 
challenge’s tasks.


TARGET AUDIENCE
The Challenge is open to everyone from industry and academia.

TASKS
We ask challengers to automatically annotate a set of multi-format input 
documents and to produce a Linked Open Dataset (LOD) that fully describes these 
documents, their context, and relevant parts of their content. The evaluation 
will consist of running a set of queries against the produced dataset to assess 
its correctness and completeness. The primary input dataset is the LOD that has 
been extracted from the CEUR-WS.org workshop proceedings using the winning 
extraction tools of the 2014 and 2015 challenges, plus its full original HTML 
and PDF source documents. In addition, the challenge uses existing LOD on 
scholarly publications as linking targets. The input dataset will be split into 
two parts: a training dataset and an evaluation dataset, which will be 
disclosed a few days before the submission deadline. Participants will be asked 
to run their tool on the evaluation dataset and to produce the final Linked 
Dataset and the output of the queries on that dataset.

The Challenge includes three tasks:

= Task 1: Extraction and assessment of workshop proceedings information in HTML =
Participants are required to extract information from a set of HTML tables of 
contents published in CEUR-WS.org workshop proceedings. The extracted 
information is expected to answer queries about the quality of these workshops, 
for instance by measuring growth, longevity, etc. The task is an extension of 
Task 1 of the 2014 and 2015 challenges: we will reuse the most challenging 
quality indicators from last year's challenge, define others more precisely, 
and introduce completely new ones. Last years' results, with an F-measure of 
0.64 in 2014 and 0.66 in 2015 for the winning solutions, show improvement, but 
there is still considerable room for improving information extraction.
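
As a rough illustration of the kind of processing Task 1 calls for (not an 
official baseline), the Python sketch below pulls candidate paper titles out of 
a CEUR-WS.org volume table of contents; the volume URL is a placeholder and the 
PDF-link heuristic is an assumption.

    # Illustrative sketch only: list candidate paper titles from a CEUR-WS
    # volume table of contents. The volume URL is a placeholder and the
    # PDF-link heuristic is an assumption, not part of the official task.
    import requests
    from bs4 import BeautifulSoup

    VOLUME_URL = "http://ceur-ws.org/Vol-XXXX/"  # hypothetical volume

    html = requests.get(VOLUME_URL).text
    soup = BeautifulSoup(html, "html.parser")

    papers = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.lower().endswith(".pdf"):      # links to papers in the TOC
            title = link.get_text(strip=True)  # anchor text as candidate title
            papers.append({"pdf": VOLUME_URL + href, "title": title})

    for p in papers:
        print(p["title"], "->", p["pdf"])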
 
= Task 2: Extracting information from the PDF full text of the papers =
Participants are required to extract information from the textual content of 
the papers (in PDF). That information should describe the organization of the 
paper and should provide a deeper understanding of the context in which it was 
written. In particular, the extracted information is expected to answer queries 
about the internal organization of sections, tables and figures, and about the 
authors' affiliations, research institutions and funding sources. The task 
mainly requires PDF mining techniques and some NLP.
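
Purely as an illustration of Task 2 (again, not an official baseline), the 
sketch below extracts the plain text of a paper with pdfminer.six and guesses 
numbered section headings with a naive regular expression; the input file name 
and the heading pattern are assumptions.

    # Minimal sketch: extract the plain text of a paper with pdfminer.six and
    # guess numbered section headings with a naive regular expression. The
    # file name and the heading pattern are assumptions for illustration only.
    import re
    from pdfminer.high_level import extract_text

    text = extract_text("paper.pdf")  # hypothetical input file

    # Matches lines such as "1 Introduction" or "3.2 Evaluation Setup"
    heading_re = re.compile(r"^\d+(\.\d+)*\s+[A-Z].{0,80}$")

    for line in text.splitlines():
        line = line.strip()
        if heading_re.match(line):
            print("Possible section heading:", line)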

= Task 3: Interlinking =
Participants are required to interlink the CEUR-WS.org linked dataset with 
relevant datasets already existing in the LOD cloud. Task 3 can be approached 
as an entity interlinking/instance matching task that addresses both 
interlinking the data produced as output of the other tasks and interlinking 
the CEUR-WS.org linked dataset to external datasets. Moreover, as triples are 
generated from different sources and by different activities, tracking 
provenance information becomes increasingly important.
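
A very naive interlinking baseline, shown only to suggest the shape of Task 3, 
could compare labels across two RDF graphs and propose owl:sameAs links; the 
file names and the similarity threshold below are assumptions, and dedicated 
frameworks such as Silk or LIMES would normally be used for serious instance 
matching.

    # Naive interlinking baseline for illustration: compare rdfs:label values
    # of two RDF graphs and propose owl:sameAs links above a similarity
    # threshold. File names and the 0.9 threshold are assumptions.
    from difflib import SequenceMatcher
    from rdflib import Graph
    from rdflib.namespace import OWL, RDFS

    ceur = Graph().parse("ceur-ws.ttl", format="turtle")       # hypothetical dump
    external = Graph().parse("external.ttl", format="turtle")  # hypothetical target

    links = Graph()
    for s1, _, l1 in ceur.triples((None, RDFS.label, None)):
        for s2, _, l2 in external.triples((None, RDFS.label, None)):
            if SequenceMatcher(None, str(l1).lower(), str(l2).lower()).ratio() > 0.9:
                links.add((s1, OWL.sameAs, s2))

    links.serialize("links.ttl", format="turtle")
    print(len(links), "candidate owl:sameAs links written to links.ttl")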

EVALUATION
In each task, participants will be asked to refine and extend the initial 
CEUR-WS.org Linked Open Dataset by information extraction or link discovery, 
i.e. they will produce an RDF graph. To validate the RDF graphs produced, a 
number of natural-language queries will be specified, together with their 
expected results in CSV format. Participants are asked to submit both their 
dataset and the translation of the input natural-language queries into SPARQL 
queries that work on that dataset. A few days before the deadline, a set of 
queries will be specified and used for the final evaluation. Participants are 
then asked to run these queries on their dataset and to submit the produced 
output in CSV format. Precision, recall and F-measure will be calculated by 
comparing each query's result set with the expected result from a manually 
built gold standard. Participants' overall performance in a task will be 
defined as the average F-measure over all queries of the task, with all queries 
having equal weight. For computing precision and recall, an automated tool 
developed for the 2015 challenge will be used; this tool will be publicly 
available during the training phase.
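
For orientation only, the sketch below shows how the per-query scores described 
above could be computed by comparing a submitted CSV result set with the gold 
standard. It is not the official evaluation tool, the file names are 
assumptions, and rows are compared as unordered sets.

    # Rough sketch of the per-query scoring described above (not the official
    # evaluation tool): compare a submitted CSV result set with a gold-standard
    # CSV, treating each row as one result. File names are assumptions.
    import csv

    def load_rows(path):
        with open(path, newline="") as f:
            return {tuple(row) for row in csv.reader(f)}

    submitted = load_rows("query1_submitted.csv")   # hypothetical files
    gold = load_rows("query1_gold.csv")

    true_positives = len(submitted & gold)
    precision = true_positives / len(submitted) if submitted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")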

FEEDBACK AND DISCUSSION
A discussion group is open for participants to ask questions and to receive 
updates about the challenge: mailto:[email protected]. 
Participants are invited to subscribe to this group as soon as possible and to 
communicate their intention to participate. They are also invited to use this 
channel to discuss problems in the input dataset and to suggest changes.

HOW TO PARTICIPATE
Participants are required to submit:
* Abstract: no more than 200 words. 
* Description: It should explain the details of the automated annotation 
system, including why the system is innovative, how it uses Semantic Web 
technology, what features or functions the system provides, what design choices 
were made and what lessons were learned. The description should also summarize 
how participants have addressed the evaluation tasks. An outlook towards how 
the data could be consumed is appreciated but not strictly required. Papers 
must be submitted in PDF format, following the style of Springer's Lecture 
Notes in Computer Science (LNCS) series 
(http://www.springer.com/computer/lncs/lncs+authors), and not exceeding 12 
pages in length. Submissions in RASH format 
(http://cs.unibo.it/save-sd/rash/documentation/index.html) and Linked Research 
(https://github.com/csarven/linked-research) are also accepted as long as the 
final camera-ready version conforms to Springer's requirements.
* The Linked Open Dataset produced by their tool on the evaluation dataset (as 
a file or as a URL, in Turtle or RDF/XML). 
* A set of SPARQL queries that work on that LOD and correspond to the natural 
language queries provided as input (an illustrative sketch follows this list).
* The output of these SPARQL queries on the evaluation dataset (in CSV format).
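
To make the expected shape of such a query translation concrete, here is a 
purely hypothetical example run with rdflib: the natural-language query ("list 
the titles of all papers in a given workshop volume"), the properties, the 
volume URI and the file names are invented placeholders, since the actual 
queries and vocabulary are defined by the challenge materials.

    # Purely hypothetical example of translating a natural-language query into
    # SPARQL and running it over the produced dataset with rdflib. Property
    # names, the volume URI and file names are invented placeholders.
    import csv
    from rdflib import Graph

    g = Graph().parse("my-output-dataset.ttl", format="turtle")  # hypothetical

    QUERY = """
    PREFIX dc:  <http://purl.org/dc/elements/1.1/>
    PREFIX ex:  <http://example.org/vocab#>
    SELECT ?title WHERE {
      ?paper ex:isPartOf <http://ceur-ws.org/Vol-XXXX/> ;
             dc:title ?title .
    }
    """

    with open("query_output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in g.query(QUERY):
            writer.writerow([str(v) for v in row])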

Participants will also be asked to submit their tool (source and/or binaries, 
or a link from which these can be downloaded, or a web service URL) for 
verification purposes. Further submission instructions will be published on the 
challenge wiki.

All submissions should be provided via the submission system linked from the 
homepage.

JUDGING AND PRIZES
After a first round of review, the Program Committee and the chairs will select 
a number of submissions conforming to the challenge requirements, whose authors 
will be invited to present their work. Submissions accepted for presentation 
will receive constructive reviews from the Program Committee and will be 
included in the Springer CCIS series. A selection of the best challenge papers 
will be published in the Satellite Events proceedings of ESWC 2016 (a separate 
Springer LNCS volume).

Six winners will be selected. For each task we will award:
* the best performing tool, given to the paper with the highest score in the 
evaluation;
* the most original approach, selected by the Challenge Committee through the 
reviewing process.

IMPORTANT DATES
* January 20, 2016: Publication of the full description of tasks, rules and 
queries; publication of the training dataset
* February 28, 2016: Publication of the evaluation tool
* March 11, 2016: Paper submission
* March 31, 2016: Deadline for remarks on the training dataset and the 
evaluation tool
* April 8, 2016: Notification and invitation to submit task results
* April 24, 2016: Conference camera-ready
* May 11, 2016: Publication of the evaluation dataset details
* May 13, 2016: Results submission
* May 29 - June 2, 2016: Challenge days

NOTE: Accepted papers will be included on the conference USB stick. After the 
conference, participants will be able to add data about the evaluation and to 
finalize the camera-ready version for the final proceedings.


PROGRAM COMMITTEE
* Aliaksandr Birukou, Springer Verlag, Heidelberg, Germany
* Lukasz Bolikowski, University of Warsaw, Poland
* Kai Eckert, University of Mannheim, Germany
* Maxim Kolchin, ITMO University, Saint Petersburg, Russia
* Phillip Lord, Newcastle University, UK
* Philipp Mayr, GESIS, Germany
* Jodi Schneider, University of Pittsburgh, USA
* Selver Softic, Graz University of Technology, Austria
* Ruben Verborgh, Ghent University – iMinds, Belgium
* Michael Wagner, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Germany

We are inviting further members.



