Hi Timothy,

Thanks for the prompt reply. Is it possible to use IdentifiedAnnotation in a CPE? I saw IdentifiedAnnotation in the CVD, which selects one concept among the collection. I would like to run a CPE because I need to process many documents. I believe I cannot run the CVD on many documents, am I right?

Regards,
Shrestha

On Tue, Aug 2, 2016 at 3:11 PM, Miller, Timothy wrote:
> I don't know if there is a single pipeline that does concepts and
> relations; if not, you will have to use uimaFIT calls to add additional
> extractors to the fast pipeline descriptor you are currently getting.
>
> You may want "IdentifiedAnnotation" and its subclasses as your type
> because it has a definite span. Each IA may correspond to a number of
> different concepts in the UMLS dictionary, so we have a data structure
> that contains all the matches for a given span. That is the FSArray (it
> is a UIMA data type; the name stands for FeatureStructureArray). The UMLS
> dictionary annotators will create UmlsConcept instances in that array
> based on the results of the dictionary lookup.
> Finding the "best" one for any span is not something that cTAKES will do
> for you; it probably depends on your application. Sometimes we output
> them all, sometimes we output the first one. You may need to dig in to
> see how many of them are relevant and filter against the subset of things
> you are looking for.
>
> Looks like the word "kidney" is indeed in the input:
>
> > human embryo kidney 293T cells
>
> cTAKES will find mentions even as modifiers inside larger phrases.
>
> Finally, I would not try to interpret the UIMA XML manually; I would use the
> UIMA CVD (CAS Visual Debugger) to read the .xmi files that cTAKES outputs
> (I believe they should be xmi).
>
> Tim
>
> On Tue, 2016-08-02 at 14:13 +0200, Niraj Shrestha wrote:
> > Dear Sir,
> >
> > I am trying to extract named entities and their relations from medical
> > documents. If I understood correctly, concepts are basically entities.
> > I have used two different analysis engines:
> > AggregatePlaintextFastUMLSProcessor.xml for concept extraction
> > and
> > RelationExtractorAggregate for relation extraction.
> >
> > My first question is how I can combine both engines to obtain concepts
> > and relations in a single file.
> >
> > If I understood correctly, if I need to extract all the entities
> > (concepts), then I need to get all the
> > "org.apache.ctakes.typesystem.type.refsem.UmlsConcept" nodes from the
> > output XML file. But how can I choose a single entity or concept from
> > a list of many concepts?
> >
> > And what is the FSArray in which all the concept ids are listed?
> >
> > I found that some concepts are not mentioned in the input data but
> > appear in the output data. For example, when I use the following engine
> > on the "note.txt" file
> >
> > <import
> > location="../analysis_engine/AggregatePlaintextFastUMLSProcessor.xml"/>
> >
> > the output file is "note.txt4.xml" (attached here).
> >
> > One of the concepts is the following, where "kidney" is given as
> > preferredText but the word "kidney" is not found in the input data.
> >
> > <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4503"
> > codingScheme="SNOMEDCT" code="64033007" oid="64033007#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C0022646" tui="T023"
> > preferredText="Kidney"/>
> > <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4493"
> > codingScheme="SNOMEDCT" code="17373004" oid="17373004#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C0227665" tui="T023"
> > preferredText="Both kidneys"/>
> > <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4483"
> > codingScheme="SNOMEDCT" code="181414000" oid="181414000#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C1278978" tui="T023"
> > preferredText="Entire kidney"/>
> > <uima.cas.FSArray _id="4513" size="3">
> > <i>4483</i>
> > <i>4493</i>
> > <i>4503</i>
> > </uima.cas.FSArray>
> >
> > ************************************
> > My next query concerns relation extraction, for which I use the
> > following engine:
> >
> > <import
> > location="../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml"/>
> >
> > The output file is "note.txt_relation.xml" (attached here).
> >
> > I am not able to interpret the output file (note.txt_relation.xml),
> > in which the relations and their locations are given: I could not
> > figure out which entities are involved and what the relation between
> > them is, in terms of words.
> >
> > For example:
> >
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12422" id="0" _ref_argument="10680"
> > role="Argument"/>
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12427" id="0" _ref_argument="10989"
> > role="Related_to"/>
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12446" id="0" _ref_argument="10680"
> > role="Argument"/>
> > .
> > .
> > .
> > .
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12851" id="0" _ref_argument="12181"
> > role="Related_to"/>
> > <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12432" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12422" _ref_arg2="12427"/>
> > <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12456" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12446" _ref_arg2="12451"/>
> > <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12480" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12470" _ref_arg2="12475"/>
> > <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12508" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12498" _ref_arg2="12503"/>
> >
> > Sorry for the long email and the many questions at once.
> >
> > Thanks a lot in advance for your suggestions.
> >
> > With regards,
> > Shrestha
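A minimal illustration of how the ids in the snippets above link together. Tim's advice to read the output in the CVD still stands; this only shows the mechanism. The output appears to use UIMA's XCAS serialization, where every `<i>` entry of a `uima.cas.FSArray` and every `_ref_...` attribute holds the `_id` of another element. A relation is therefore read as LocationOfTextRelation `_ref_arg1`/`_ref_arg2` → RelationArgument `_ref_argument` → mention annotation, whose `begin`/`end` offsets index into the document text. The sketch uses only the JDK's DOM parser. Assumptions, also marked in the code: the class name `XcasRefs` is hypothetical; the document text is taken from the `sofaString` attribute of `uima.cas.Sofa`; an annotation's concept array is assumed to be serialized as `_ref_ontologyConceptArr`; and `SAMPLE` is a hand-written fragment, not real cTAKES output (it reuses the UmlsConcept and FSArray ids quoted above, while the text, mentions, ids 20 to 32 and the argument order are made up).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: follows _id / _ref_* links in an XCAS-style XML string.
class XcasRefs {
    // Hypothetical, hand-written fragment; NOT real cTAKES output. The UmlsConcept and
    // FSArray ids come from the snippet above; the text, mentions and ids 20-32 are made up.
    static final String SAMPLE = "<CAS>"
        + "<uima.cas.Sofa _id='1' sofaID='_InitialView' sofaString='Cyst in the left kidney.'/>"
        + "<org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention _id='20' begin='0' end='4'/>"
        + "<org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention _id='21' begin='17' end='23'"
        + " _ref_ontologyConceptArr='4513'/>"
        + "<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id='4503' cui='C0022646' preferredText='Kidney'/>"
        + "<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id='4493' cui='C0227665' preferredText='Both kidneys'/>"
        + "<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id='4483' cui='C1278978' preferredText='Entire kidney'/>"
        + "<uima.cas.FSArray _id='4513' size='3'><i>4483</i><i>4493</i><i>4503</i></uima.cas.FSArray>"
        + "<org.apache.ctakes.typesystem.type.relation.RelationArgument _id='30' _ref_argument='21' role='Argument'/>"
        + "<org.apache.ctakes.typesystem.type.relation.RelationArgument _id='31' _ref_argument='20' role='Related_to'/>"
        + "<org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation _id='32' category='location_of'"
        + " _ref_arg1='30' _ref_arg2='31'/>"
        + "</CAS>";

    private final Map<String, Element> byId = new LinkedHashMap<>(); // _id -> element, document order
    private String text = "";

    XcasRefs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("_id")) {
                byId.put(e.getAttribute("_id"), e);
            }
            // Assumption: the document text is the sofaString of the uima.cas.Sofa element.
            if (e.getTagName().equals("uima.cas.Sofa")) {
                text = e.getAttribute("sofaString");
            }
        }
    }

    /** Text covered by an annotation, using its begin/end character offsets. */
    String coveredText(String annotationId) {
        Element a = byId.get(annotationId);
        return text.substring(Integer.parseInt(a.getAttribute("begin")),
                              Integer.parseInt(a.getAttribute("end")));
    }

    /** preferredText of every UmlsConcept in the annotation's concept FSArray, in array order. */
    List<String> conceptsFor(String annotationId) {
        // Assumption: the concept array is serialized as _ref_ontologyConceptArr.
        String arrayId = byId.get(annotationId).getAttribute("_ref_ontologyConceptArr");
        NodeList items = byId.get(arrayId).getElementsByTagName("i");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            // Each <i> holds the _id of one UmlsConcept.
            Element concept = byId.get(items.item(i).getTextContent().trim());
            result.add(concept.getAttribute("preferredText"));
        }
        return result;
    }

    /** One "arg1 text --category--> arg2 text" line per element with _ref_arg1 and _ref_arg2. */
    List<String> relations() {
        List<String> result = new ArrayList<>();
        for (Element rel : byId.values()) {
            if (rel.hasAttribute("_ref_arg1") && rel.hasAttribute("_ref_arg2")) {
                result.add(argumentText(rel, "_ref_arg1") + " --" + rel.getAttribute("category")
                        + "--> " + argumentText(rel, "_ref_arg2"));
            }
        }
        return result;
    }

    // Relation -> RelationArgument (_ref_arg1/_ref_arg2) -> annotation (_ref_argument) -> text.
    private String argumentText(Element rel, String refAttribute) {
        Element relationArgument = byId.get(rel.getAttribute(refAttribute));
        return coveredText(relationArgument.getAttribute("_ref_argument"));
    }

    public static void main(String[] args) {
        XcasRefs cas = new XcasRefs(SAMPLE);
        System.out.println(cas.coveredText("21") + " -> " + cas.conceptsFor("21"));
        for (String relation : cas.relations()) {
            System.out.println(relation);
        }
    }
}
```

Running `main` on the sample prints `kidney -> [Entire kidney, Both kidneys, Kidney]` and then `kidney --location_of--> Cyst`. Choosing one concept from that list (for example the first) remains an application decision, as Tim says. For a CPE over many documents, the same navigation would more naturally happen inside an annotator or CAS consumer through the JCas types, assuming the generated getters follow the feature names seen in the XML (for example `getOntologyConceptArr()` on an IdentifiedAnnotation, or `getArg1().getArgument()` on a relation), rather than by parsing the serialized files.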
