Hi,

I am building my first CPE and am wondering how to handle NLP tasks that process several documents at a time (i.e. a pair or a collection of documents treated as a single entity). I am thinking of applications such as (multilingual) text alignment, term extraction based on measures computed over a corpus, or text clustering (how to compare one document with a set of documents). Such applications require handling a CAS over a kind of "collection artefact".
As far as I can see, only the concepts of Annotation (a description within a document) and DocumentAnnotation exist. I can imagine that CAS Consumers or CAS Multipliers could offer a way to solve my problem, but that would only be hacking UIMA. Does anyone have experience with similar goals using UIMA? How do you handle them? Is there anything dedicated in UIMA for working with a "collection artefact"?

Thanks
/Nicolas

--
[EMAIL PROTECTED]
--
# Laboratoire LINA-TALN CNRS UMR 6241 tel. +33 (0)2 51 12 58 55
# Institut Universitaire de Technologie de Nantes - Département Informatique tel. +33 (0)2 40 30 60 67
