On Thu, Feb 6, 2014 at 2:18 PM, Richard Eckart de Castilho
<r...@apache.org> wrote:
>
> Yep - sounds right. I'd probably do the naming a bit differently though.
> What counts as input and output changes from component to component, so
> I'd use other names at the pipeline level, e.g.
>
>  RAW_DATA
>  EXTRACTED_TEXT
>  TRANSLATED_TEXT
>
> I'd hardcode simple names like "INPUT" and "OUTPUT" in the components
> and then map these to the pipeline-level-names:
>
> builder.add(TextExtractor,
>   "INPUT",  "RAW_DATA",
>   "OUTPUT", "EXTRACTED_TEXT");
> builder.add(Parser,
>   CAS.NAME_DEFAULT_SOFA, "EXTRACTED_TEXT");
> builder.add(Translator,
>   "INPUT",  "EXTRACTED_TEXT",
>   "OUTPUT", "TRANSLATED_TEXT");
>
>
That sounds like a good idea indeed.
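
To check that I understand the mapping idea, here is a minimal sketch in
plain Java. It does not use the uimaFIT API: the class ViewNameMapping and
its methods mapping() and resolve() are hypothetical names made up for
illustration, and passing unmapped names through unchanged is an assumption
of this sketch. It only shows how each component could keep its hardcoded
"INPUT"/"OUTPUT" names while the pipeline supplies the real ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of UIMA or uimaFIT.
public class ViewNameMapping {

    // Builds a component-local -> pipeline-level map from alternating
    // (localName, pipelineName) pairs, mirroring the builder.add(...) calls.
    static Map<String, String> mapping(String... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("names must come in pairs");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            result.put(pairs[i], pairs[i + 1]);
        }
        return result;
    }

    // Resolves a component-local name to its pipeline-level name.
    // Assumption: a name without a mapping is used unchanged.
    static String resolve(Map<String, String> mapping, String localName) {
        return mapping.getOrDefault(localName, localName);
    }

    public static void main(String[] args) {
        Map<String, String> extractor =
            mapping("INPUT", "RAW_DATA", "OUTPUT", "EXTRACTED_TEXT");
        Map<String, String> translator =
            mapping("INPUT", "EXTRACTED_TEXT", "OUTPUT", "TRANSLATED_TEXT");

        // The extractor's output and the translator's input resolve to the
        // same pipeline-level name, which is what chains the two components.
        System.out.println(resolve(extractor, "OUTPUT"));
        System.out.println(resolve(translator, "INPUT"));
    }
}
```

Both lookups in main() resolve to the same pipeline-level name, so the
translator reads exactly what the extractor wrote, without either component
knowing about the other.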


> > I have to say this feature is quite interesting but in fact the type
> > systems are the components generating the real pain… :)
>
> What's the pain with the type systems?
>

In my experience over the past few weeks, the hardest parts of integrating
a new annotator are the parameters and the type systems. Understanding the
type systems and how they are used is not always straightforward. For an
experienced developer it is not such a big problem, and uimaFIT helps quite
a lot. ;-)

Cheers and thanks again!
-- 
Luca Foppiano

Software Engineer
+31615253280
l...@foppiano.org
www.foppiano.org
