I believe it should be extended, since I think a RUTA user would expect the MARKUP annotation to capture at least XML and HTML markup properly. The examples are from a PubMed Central XML file that follows the NISO JATS specification, so I assume it is properly formatted XML, without knowing all the details of the spec.
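
As an illustration only, here is a minimal sketch of what an extended pattern could look like. It is not the actual markupPattern from DefaultSeeder.java, which is not shown in this thread; the class name, the method name and the regular expression itself are assumptions made for the example. The sketch allows dashes in tag and attribute names and accepts typographic quotes as well as straight ones around attribute values, which are the two features of the examples quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RUTA.
class MarkupPatternSketch {

    // Tag and attribute names: a letter or underscore, then letters, digits,
    // underscore, colon, dot or dash (so "ref-type" and "sec-type" are accepted).
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_:.-]*";

    // Straight quotes plus the typographic double and single quotes.
    private static final String QUOTE = "[\"'\u201C\u201D\u2018\u2019]";

    // One attribute: whitespace, name, '=', quoted value without angle brackets.
    // Opening and closing quote are deliberately not required to match.
    private static final String ATTRIBUTE =
            "\\s+" + NAME + "\\s*=\\s*" + QUOTE + "[^<>]*?" + QUOTE;

    // Opening, closing or self-closing tag with any number of attributes.
    static final Pattern MARKUP = Pattern.compile(
            "</?" + NAME + "(?:" + ATTRIBUTE + ")*\\s*/?>");

    // Returns every substring of the text that the sketch pattern treats as markup.
    static List<String> findMarkup(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = MARKUP.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }
}
```

Calling MarkupPatternSketch.findMarkup on a string that contains the two example tags should return both of them in full. The pattern is intentionally lenient and does not validate XML; comments, CDATA sections and processing instructions are not covered.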
We have managed to implement a crude workaround for now, but let us know when an improved version becomes available.

Cheers
Mario

> On 20 Oct 2015, at 17:56 , Peter Klügl <[email protected]> wrote:
>
> Hi Mario,
>
> yes, and the different quote also causes problems (are these valid?).
>
> The MARKUP annotation is not created by jflex like the other annotations,
> but by a postprocessing step using a regular expression. This expression
> does not cover these cases (markupPattern in DefaultSeeder.java).
>
> Should we extend it?
>
> Best,
>
> Peter
>
> On 20.10.2015 at 17:26, Mario Gazzo wrote:
>> Hi Peter,
>>
>> RUTA doesn’t seem to capture some XML markup with attributes. Here are some
>> examples:
>>
>> <xref ref-type="bibr" rid="b35-ehp0113-000220”>
>> <sec sec-type="methods”>
>>
>> The above markup examples are totally missing in the TokenSeed annotations.
>> I wonder whether it is related to the dash in the attribute names, since
>> other markup without this appears to be captured.
>>
>> Can you confirm that the dash could cause the problem?
>>
>> Cheers
>> Mario
>
