Re: Running an AnalysisEngine on part of a document

Richard Eckart de Castilho Tue, 16 Feb 2016 05:42:16 -0800

Yes, most of the DKPro Core components rely on token/sentence annotations.

You can find a list of the types, and of the components that create/consume
them, at https://dkpro.github.io/dkpro-core/documentation/ under the section
"DKPro Core 1.8.0-SNAPSHOT" -> "Typesystem reference".


Best,

-- Richard

> On 16.02.2016, at 13:13, Nils Reiter <[email protected]> wrote:
> 
> Hi Richard,
> 
> thanks for your reply and don’t worry, I am planning on using DKPro
> components :)
> 
> So if I understand you correctly, all DKPro components rely on
> token/sentence annotations and ignore the rest, right?
> 
> Best regards,
> Nils
> 
>> On 16 Feb 2016, at 12:18, Richard Eckart de Castilho <[email protected]> wrote:
>> 
>> Ok, sorry, the answer below assumes you are using DKPro Core components ;)
>> 
>> Sorry Nils, I didn't notice you were posting to the Apache UIMA list.
>> 
>> So for UIMA in general, I am not aware of a solution other than what you
>> describe. It would depend on the components / component collection that
>> you are using.
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>>> On 16.02.2016, at 12:17, Richard Eckart de Castilho <[email protected]> wrote:
>>> 
>>> The easiest would be to remove the token/sentence annotations of those 
>>> parts of the text that you do not care about.
>>> Or alternatively - if you have annotations that specifically mark the text 
>>> sections, then configure the segmenter component to create sentences/tokens 
>>> only within the boundaries of these annotations using PARAM_ZONE_TYPES and 
>>> PARAM_STRICT_ZONING.
>>> 
>>> Cheers,
>>> 
>>> -- Richard
>>> 
>>>> On 16.02.2016, at 12:02, Nils Reiter <[email protected]> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> is there a way to run an analysis engine on only a part of the CAS?
>>>> 
>>>> I have UIMA annotations over all the substrings that I want to process. 
>>>> The only way I could think of is creating new views or CASs for each 
>>>> string, but that would result in > 100 views. Is there a more 
>>>> straightforward way?
>>>> 
>>>> Background:
>>>> Only part of the CAS contains natural language, other parts are lists, 
>>>> names and headers. I would like to POS-tag the text, but not the rest.
>>>> 
>>>> Thanks in advance for any pointers or suggestions,
>>>> Nils
>>> 
>> 
>
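
The zoning idea discussed in the thread (create tokens only inside the spans that mark natural-language text, so that downstream components such as a POS tagger never see the rest) can be illustrated with a small sketch. This is plain Java using only java.text.BreakIterator; it is not the DKPro Core or UIMA API. The class ZoneTokenizer and the helper types Zone and Token are hypothetical names introduced here for illustration, and the spans stand in for the UIMA annotations that would be passed to the segmenter via PARAM_ZONE_TYPES.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: hypothetical helper, not part of DKPro Core or UIMA.
public class ZoneTokenizer {

    /** Half-open character span [begin, end) marking text that should be processed. */
    public static final class Zone {
        public final int begin;
        public final int end;

        public Zone(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** A token together with its offsets in the full document text. */
    public static final class Token {
        public final int begin;
        public final int end;
        public final String text;

        public Token(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Tokenizes only the text covered by the given zones. Text outside the
     * zones produces no tokens, and no token crosses a zone boundary because
     * each zone is segmented in isolation.
     */
    public static List<Token> tokenizeWithinZones(String document, List<Zone> zones) {
        List<Token> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        for (Zone zone : zones) {
            String zoneText = document.substring(zone.begin, zone.end);
            words.setText(zoneText);
            int start = words.first();
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String candidate = zoneText.substring(start, end);
                // BreakIterator also reports whitespace runs as segments; skip them.
                if (!candidate.trim().isEmpty()) {
                    // Shift zone-relative offsets back to document offsets.
                    tokens.add(new Token(zone.begin + start, zone.begin + end, candidate));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A header, one natural-language sentence, and a list item.
        String document = "HEADER\nThe cat sleeps.\n- item";
        // Only the sentence is marked as a zone (offsets are assumptions for this example).
        List<Zone> zones = Arrays.asList(new Zone(7, 22));
        for (Token token : tokenizeWithinZones(document, zones)) {
            System.out.println(token.begin + "-" + token.end + " " + token.text);
        }
    }
}
```

Because the header and the list item lie outside the zone, they receive no tokens, which matches the point made above that components relying on token/sentence annotations skip text that has none.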
