Re: RUTA: Copy features into new annotation

Peter Klügl Wed, 13 Jan 2021 01:55:50 -0800

Hi,

On 11.01.2021 at 08:13, Erik Fäßler wrote:
> Hello Peter,
>
> thank you again for putting so much thought into it.
> I am a bit embarrassed to say that I already had the solution in my script 
> when I just opened Eclipse again. I think I just didn’t really try it because 
> I didn’t expect it to work.
> This works now, thank you!
>
> To help you understand my case better, here are some details:
> My type system is indeed the JCoRe TS.
> And I am not working with Person annotations but with Organism mentions, but 
> I wanted to keep things simple. Organism mentions are extended from 
> ConceptMentions:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-semantics-mention-types.xml#L125
>
> Those have the “resourceEntryList” feature which is an FSArray of 
> ResourceEntry instances:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-basic-types.xml#L44
>
> The ResourceEntry, finally, has a feature named “entryId”.



:-)

I was looking at the Person definition there, but didn't find matching
features.



>
> The entryIds are set in a separate annotator (JCoRe Linnaeus annotator). My 
> goal is to connect multiple Organism mentions (“mouse and human”) into a 
> single expression for a downstream annotator that checks for Organism 
> mentions directly in front of gene mentions. However, in the example “mouse 
> and human” it would always detect “human” but disregard “mouse”. So I thought 
> I would create new annotations to “merge” the originals.
>
> Is this how you would do it? Alternatively, I could have merged the two 
> existing Organism annotations, which I would even prefer. But I would not know 
> how to organize this so that, in the end, instead of two Organism 
> annotations with one resourceEntry each, there would be only one Organism 
> annotation with both resourceEntries.


It's hard to tell without taking a closer look.

In general, I find it better to create additional annotations for
complex structures instead of merging the information into an existing
annotation, simply for maintainability reasons. That way, it's easier to
inspect unintended behavior several months later ...


>
> So actually, there is one step missing now: I need to replace merged Organism 
> entries with the covering OrganismEnumeration (Person and PersonEnumeration 
> in my example).


I am not sure what the input/output behavior should be. Don't you have
two separate annotations, and isn't the enumeration already the merge of
their semantics?

If you can give me an example, I'll write a rule for you :-)



> Is there a way to do this better in RUTA? I have to say that I have not yet 
> fully penetrated the syntax; I would not have been able to come up with the
> // collect ids of all covered Persons using an extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
>     <-{p:Person{-> ADD(ids,p.ids.personId)};};


Labels and inlined rules are the two best language features I added to
Ruta; they are really useful. Let me know if you want to learn more about
them or if any information is missing from the documentation.
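As a rough illustration of what the label `pe` and the inlined rule over `p:Person` in the rule quoted above accomplish together, here is a plain-Python model of the data flow. It is not Ruta and makes no claim about how Ruta evaluates rules internally; the `Person` and `PersonEnumeration` classes and the `collect_ids` function are hypothetical stand-ins for the UIMA annotations, with spans given as character offsets. Starting a fresh id list for each enumeration is a choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical stand-in for a Person annotation: a span plus its ids."""
    begin: int
    end: int
    ids: list[str] = field(default_factory=list)


@dataclass
class PersonEnumeration:
    """Hypothetical stand-in for a PersonEnumeration annotation."""
    begin: int
    end: int
    person_ids: list[str] = field(default_factory=list)


def collect_ids(enums: list[PersonEnumeration], persons: list[Person]) -> None:
    """For each enumeration, gather the ids of all Persons inside its span."""
    for pe in enums:  # `pe` plays the role of the rule label
        ids: list[str] = []  # fresh list per enumeration (sketch assumption)
        for p in sorted(persons, key=lambda a: a.begin):
            # only Persons fully covered by the enumeration span count
            if pe.begin <= p.begin and p.end <= pe.end:
                ids.extend(p.ids)  # analogous to ADD(ids, p.ids.personId)
        pe.person_ids = ids  # analogous to pe.personIds = ids


# "Trump and Biden": Trump = [0, 5), Biden = [10, 15)
persons = [Person(0, 5, ["id_1"]), Person(10, 15, ["id_2"])]
enum = PersonEnumeration(0, 15)
collect_ids([enum], persons)
print(enum.person_ids)  # ['id_1', 'id_2']
```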



Best,


Peter



>
> construction, so this enumeration-annotation-merging might actually be easy 
> and I just don’t see it.
>
> Thank you so much!
>
> Erik
>
>> On 10. Jan 2021, at 16:21, Peter Klügl <peter.klu...@averbis.com> wrote:
>>
>> Hi,
>>
>>
>> On 07.01.2021 at 14:55, Erik Fäßler wrote:
>>> Hi Peter and thank you once again for your excellent support of your 
>>> excellent RUTA software!
>>
>> You are welcome :-)
>>
>>
>>> Your second example was very much what I needed. Thank you so far!
>>> I have one last bump in the road:
>>>
>>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>>> uima.cas.String. So, one Person annotation might have multiple IDs per the 
>>> type system.
>>> The ID type has a feature “entryId”.
>>> In my particular case I actually have only one entry in the id array. 
>>> Still, I need to access this entry somehow.
>>> Is that at all possible in RUTA? I would need something like
>>>
>>>
>>> // collect ids of all covered Persons using an extra list
>>> STRINGLIST ids;
>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>    <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>>>
>>> This does not seem to be covered by the FeatureExpression grammar in RUTA. 
>>> Is there a work around? Otherwise I will have to solve it some other way.
>>
>> there actually are "indexed" expressions like Person.ids[0], but they are
>> not yet an "official", stable feature. However, I don't think they are even
>> necessary here.
>>
>>
>> Is your type system available somewhere? JCoRe?
>>
>> Is this a solution for you?
>>
>>
>> PACKAGE uima.ruta;
>>
>> // mock types
>> DECLARE CC, EnumCC;
>> DECLARE Person (FSArray ids);
>> DECLARE PersonId (String personId);
>> DECLARE PersonEnumeration (StringArray personIds);
>>
>> // mock annotations
>> "Trump" -> Person;
>> "Biden" -> Person;
>> "and" -> CC;
>> INT counter = 1;
>> p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
>> counter = counter +1, p.ids = pid};
>>
>> (COMMA? @CC){-> EnumCC};
>>
>> // identify enum span
>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>
>> // collect ids of all covered Persons using an extra list
>> STRINGLIST ids;
>> pe:PersonEnumeration{-> pe.personIds = ids}
>>     <-{p:Person{-> ADD(ids,p.ids.personId)};};
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>>> Many thanks,
>>>
>>> Erik
>>>
>>>> On 7. Jan 2021, at 10:47, Peter Klügl <peter.klu...@averbis.com 
>>>> <mailto:peter.klu...@averbis.com>> wrote:
>>>>
>>>> Hi Erik,
>>>>
>>>>
>>>> it depends on how you want to represent the IDs of the covered Person
>>>> annotations. You somehow need to represent the values in the
>>>> PersonEnumeration annotation. I assume that the ID feature of Person is
>>>> a uima.cas.String? PersonEnumeration could use a single String feature,
>>>> a StringArray feature, or an FSArray feature (pointing to the Person
>>>> annotations that provide the IDs).
>>>>
>>>> Here are two examples:
>>>>
>>>>
>>>> PACKAGE uima.ruta;
>>>>
>>>> // mock types
>>>> DECLARE CC, EnumCC;
>>>> DECLARE Person (STRING id);
>>>> DECLARE PersonEnumeration (FSArray persons);
>>>>
>>>> // mock annotations
>>>> "Trump" -> Person ("id" = "1");
>>>> "Biden" -> Person ("id" = "2");
>>>> "and" -> CC;
>>>>
>>>> COMMA? @CC{-> EnumCC};
>>>>
>>>> // identify enum span
>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>
>>>> // collect all covered Persons
>>>> pe:PersonEnumeration{-> pe.persons = Person};
>>>>
>>>> ########################
>>>>
>>>> ########################
>>>>
>>>> PACKAGE uima.ruta;
>>>>
>>>> // mock types
>>>> DECLARE CC, EnumCC;
>>>> DECLARE Person (STRING id);
>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>
>>>> // mock annotations
>>>> "Trump" -> Person ("id" = "1");
>>>> "Biden" -> Person ("id" = "2");
>>>> "and" -> CC;
>>>>
>>>> COMMA? @CC{-> EnumCC};
>>>>
>>>> // identify enum span
>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>
>>>> // collect ids of all covered Persons using an extra list
>>>> STRINGLIST ids;
>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>    <-{p:Person{-> ADD(ids,p.id)};};
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> On 06.01.2021 at 08:29, Erik Fäßler wrote:
>>>>> Hello everyone (and a happy new year :-)),
>>>>>
>>>>> I have been working on the following issue: whenever two entities are joined 
>>>>> by a conjunction in the text (e.g. “[...] Biden and Trump ran for president [...]”), 
>>>>> I create a new annotation spanning both entities and the conjunction 
>>>>> ([Biden and Trump]_coordination). This part works fine.
>>>>> However, my entities - Biden and Trump - also have the ID feature. The 
>>>>> new annotation should receive both IDs from the Biden and Trump 
>>>>> annotations. But I couldn’t manage to do this.
>>>>>
>>>>> I have rules like this:
>>>>>
>>>>> (Person
>>>>>    ("," Person)*
>>>>>    ","? PennBioIEPOSTag.value=="CC"
>>>>>    Person
>>>>> ){-> MARK(PersonEnumeration)};
>>>>>
>>>>> So an enumeration of Persons is covered by a new annotation of type 
>>>>> “PersonEnumeration”. Now “PersonEnumeration” should receive all the 
>>>>> ID feature values of the covered Person annotations. How can I do this?
>>>>>
>>>>> Best,
>>>>>
>>>>> Erik
>>>> -- 
>>>> Dr. Peter Klügl
>>>> Head of Text Mining/Machine Learning
>>>>
>>>> Averbis GmbH
>>>> Salzstr. 15
>>>> 79098 Freiburg
>>>> Germany
>>>>
>>>> Phone: +49 761 708 394 0
>>>> Fax: +49 761 708 394 10
>>>> Email: peter.klu...@averbis.com
>>>> Web: https://averbis.com
>>>>
>>>> Headquarters: Freiburg im Breisgau
>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>
>