Call for Shared Task Participation SemEval 2017 Task 1
Semantic Textual Similarity (STS)

Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text. While making such an assessment is trivial for humans, constructing algorithms and computational models that mimic human level performance represents a difficult and deep natural language understanding problem. STS is very much related to both machine translation evaluation and quality estimation. We actively encourage participation from both of these communities.

STS evaluations have seen significant progress in methods targeted at a specific language such as English or Spanish. For the 2017 shared task, the emphasis is on building multilingual textual similarity models that are capable of assessing both same language and cross-lingual sentence pairs. The primary evaluation for the shared task assesses methods over a combination of same language text snippet pairs in Arabic, English and Spanish as well as cross-lingual Arabic-English and Spanish-English pairs.

To encourage the development of methods that can be readily applied or adapted to new languages, we also provide an optional evaluation track with a surprise language that will only be announced at the beginning of the evaluation period. This optional track provides an opportunity to explore STS models capable of zero-shot learning via mechanisms such as multilingual embeddings.

In addition to the multilingual primary evaluation and the surprise language track, a number of language and language pair specific tracks are also provided. We hope that these tracks will provide participants with particular linguistic expertise a chance to excel as well as provide an opportunity to compare performance differences between multilingual and language specific methods.

Task Definition
===============

Given two sentences, participants are asked to produce a continuous valued similarity score on a scale from 0 to 5, with 0 indicating that the semantics of the sentences are completely independent and 5 signifying semantic equivalence. Performance is assessed by computing the Pearson correlation between machine assigned semantic similarity scores and human judgments.

Following the emphasis on building multilingual and cross-lingual models, the 2017 shared task is organized into the following seven multilingual and cross-lingual tracks:

Track 0 - Primary: Combined evaluation of all announced monolingual and cross-lingual language pairings explored by the 2017 task: ar-ar, ar-en, en-en, es-en, and es-es. The primary track will not include the surprise language evaluation data.
Track 1 - Arabic-Arabic: Evaluation only on ar-ar pairs.
Track 2 - Arabic-English: Evaluation only on ar-en pairs.
Track 3 - Spanish-Spanish: Evaluation only on es-es pairs.
Track 4 - Spanish-English: Evaluation only on es-en pairs.
Track 5 - English-English: Evaluation only on en-en pairs.
Track 6 - Surprise language track (announced during the evaluation period).

For all language pairings, participants will be provided with two sentence length snippets of text, s1 and s2. Systems use the two snippets to compute and return a continuous valued semantic similarity score. The cross-lingual language pairings (ar-en, es-en) only differ from the monolingual language pairings (ar-ar, en-en, es-es) in that the two text snippets in each pair are written in different languages.

The inclusion of cross-lingual STS pairs follows a successful pilot in 2016 that paired English and Spanish sentences. Depending on the approach used to compute the similarity scores, adapting the underlying model to handle the cross-lingual pairs may present different degrees of difficulty.
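
The evaluation metric described above, the Pearson correlation between system scores and human judgments, can be sketched as follows. This is only an illustrative sketch, not the official scoring script: the function name `pearson` and the example scores are hypothetical, and the task's own evaluation tooling may differ in details such as input format.

```python
import math


def pearson(gold, system):
    """Pearson correlation between two equal-length lists of scores."""
    if len(gold) != len(system) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_s = sum(system) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((g - mean_g) * (s - mean_s) for g, s in zip(gold, system))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_s = sum((s - mean_s) ** 2 for s in system)
    if var_g == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_g * var_s)


# Hypothetical human judgments and system outputs on the 0 to 5 scale.
gold_scores = [5.0, 3.2, 0.0, 4.1, 1.5]
system_scores = [4.6, 2.9, 0.4, 3.8, 2.0]
print(round(pearson(gold_scores, system_scores), 4))
```

Because Pearson correlation is invariant to positive linear rescaling, a system is rewarded for ranking and spacing pairs consistently with the human judgments rather than for matching the 0 to 5 values exactly.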

Participants are encouraged to review the successful approaches to monolingual and cross-lingual STS from prior years of the STS shared task (Agirre et al. 2016; Agirre et al. 2015; Agirre et al. 2014; Agirre et al. 2013; Agirre et al. 2012).

2017 Data
=========

This year's shared task includes one evaluation set for each of the seven tracks described above. Each evaluation set consists of between 200 and 250 sentence pairs. Within each evaluation set, we will attempt to approximately balance the distribution of STS scores.

For training data, participants are encouraged to make use of all existing English, Spanish and cross-lingual English-Spanish data sets from prior STS evaluations. This includes all previously released trial, training and evaluation data.

Since this is the first year that we will include Arabic as part of an STS evaluation, we will release training data for both monolingual Arabic and cross-lingual Arabic-English. Each training set will consist of approximately 14,000 pairs sourced from prior English STS evaluations.

As with the 2016 evaluation, participants are allowed and very much encouraged to train purely unsupervised models and model components on arbitrary data (e.g., unsupervised word embeddings).

Participation
=============

[Register] To register, please complete the following form:
https://docs.google.com/forms/d/e/1FAIpQLScXnt7qeioCPyxu6dv9wrSDYaF04bRgVBFCUbahxsAG6F43Sg/viewform
<https://docs.google.com/forms/d/1HTRtP7B94gqdW5YuRfRh5pEBhukuRIh5hXR1nOEib90/viewform?usp=send_form>

[Website and trial data] For more details, including trial data, see the STS SemEval 2017 Task 1 webpage at:
http://alt.qcri.org/semeval2017/task1/
<http://alt.qcri.org/semeval2016/task1/>

[Mailing List] Join the mailing list for task updates and discussion at:
http://groups.google.com/group/STS-semeval

Important dates
===============

Trial data ready: Wed 21 Sep 2016
Training data ready: Mon 24 Oct 2016
Evaluation start: Mon 09 Jan 2017
Evaluation end: Mon 30 Jan 2017
Results posted: Mon 06 Feb 2017
Paper submissions due: Mon 27 Feb 2017
Author notifications: Mon 03 Apr 2017
Camera ready submissions due: Mon 17 Apr 2017
SemEval workshop: Summer 2017

Organizers (alpha. order)
==========

Eneko Agirre, Daniel Cer, Mona Diab, Lucia Specia

References
==========

Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings of SemEval 2016.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria and Janyce Wiebe. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of SemEval 2015.

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings of SemEval 2014.

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre and Weiwei Guo. *SEM 2013 shared task: Semantic Textual Similarity. Proceedings of *SEM 2013.

Eneko Agirre, Daniel Cer, Mona Diab and Aitor Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. Proceedings of SemEval 2012.