2009/5/19 Manuel Fiorelli <[email protected]>

> I would like to see a well-established way to analyze semi-structured
> documents, such as (X)HTML pages. UIMA shouldn't provide its own
> parser, but at least a type system (like uima.cas) to represent a DOM
> Document within a CAS instance (the simplest solution is to represent
> element nodes as feature structures and text nodes as annotations on
> the plain text, but I suspect there are more convenient solutions).
>

I do agree with this.
Tommaso