Final CFP: Multilingual Automatic Clinical Corpus Generation and Entity
Extraction

https://temu.bsc.es/MultiClinAI/

*Updates: Evaluation library released


MultiClinAI is the first shared task focused on (1) the automatic creation
of comparable multilingual corpora and (2) the automatic detection of key
clinical concepts (diseases, symptoms, and procedures) in seven languages:
Spanish, English, Italian, Dutch, Romanian, Swedish and Czech. MultiClinAI
will be held as part of the #SMM4H-HeaRD Workshop at the ACL 2026
conference (online).

Key information:

   - Web: https://temu.bsc.es/MultiClinAI/
   - Data: https://zenodo.org/records/18772832
   - Annotation guidelines: https://zenodo.org/records/13151040
   - Registration: https://temu.bsc.es/MultiClinAI/registration/
   - *Evaluation Library: https://github.com/nlp4bia-bsc/MultiClinAIEval

Motivation

Despite recent progress in clinical language technology, few high-quality
annotated corpora, datasets, and annotation guidelines are available for
training and evaluating NLP- or LLM-based clinical entity recognition
systems beyond English.

There is a need to foster the generation of annotated datasets in multiple
languages and to ensure that they follow aligned annotation criteria, so
that both the labeled datasets and the entity extraction systems built on
them are comparable across languages. Developing multilingual models helps reduce
linguistic bias and improves the global applicability of clinical language
technologies. Such models enable more equitable AI deployment across
different regions and healthcare systems.

Multilingual clinical NLP has numerous important use cases, including:

- In international clinical trials, it can be used to extract structured
data from trial sites across different countries and to ensure consistent
outcome definitions across languages.

- For cohort identification, it enables the identification of eligible
patients from unstructured electronic health records (EHRs) and the
extraction of phenotypes for observational studies.

- In disease surveillance, multilingual systems can help detect rare
diseases or emerging health trends and identify post-marketing drug safety
signals.

In this context, the MultiClinAI (Multilingual Clinical Entity Annotation
Projection and Extraction) shared task addresses the creation and
evaluation of comparable multilingual clinical resources across seven
languages, focusing on three key entity types: diseases, symptoms, and
procedures.

   - MultiClinNER subtask: multilingual clinical named entity recognition
     across expert-annotated gold-standard datasets.
   - MultiClinCorpus subtask: automatic generation of comparable multilingual
     clinical corpora through annotation projection techniques.

This setup will enable a robust benchmarking scenario for multilingual
clinical NLP approaches.

Schedule

   - MultiClinAI Shared Task – training set release (February 6, 2026)
   - MultiClinNER test set release (March 18, 2026)
   - MultiClinNER test set prediction submissions (March 25, 2026)
   - MultiClinCorpus test set release (March 27, 2026)
   - MultiClinCorpus test set prediction submissions (April 9, 2026)
   - Result / evaluation returned to teams (April 14, 2026)
   - Participant proceedings due (April 24, 2026)
   - Notification of acceptance (May 15, 2026)
   - Camera-ready papers due (May 25, 2026)
   - ACL Proceedings due (hard deadline) (June 1, 2026)
   - Workshop (online) (July 2–3, 2026)

Publications and the SMM4H-HeaRD Workshop at ACL 2026

Teams participating in MultiClinAI will be invited to contribute a system
description paper to the ACL 2026 Working Notes proceedings and to give a
short online presentation of their approach at the ACL 2026 workshop (online).

Main Organizers

   - Salvador Lima-López, Barcelona Supercomputing Center (BSC), Spain.
   - Fernando Gallego-Donoso, Barcelona Supercomputing Center (BSC), Spain.
   - Jan Rodríguez-Miret, Barcelona Supercomputing Center (BSC), Spain.
   - Judith Rosell, Barcelona Supercomputing Center (BSC), Spain.
   - Martin Krallinger, Barcelona Supercomputing Center (BSC), Spain.

Scientific Committee

   - Francisco M. Couto, Universidade de Lisboa, Portugal.
   - Ulf Leser, Humboldt-Universität zu Berlin, Germany.
   - Guergana Savova, Boston Children’s Hospital, United States.
   - Lourdes Araujo, Universidad Nacional de Educación a Distancia, Spain.
   - Pavel Pecina, Institute of Formal and Applied Linguistics, Faculty of
     Mathematics and Physics, Charles University, Czech Republic.
   - Halil Kilicoglu, University of Illinois at Urbana-Champaign, United
     States.
   - Rodrigo Agerri, HiTZ Centre of the University of the Basque Country,
     Spain.