Hi Manuel,

Oh yes, I forgot about the element name. Thank you for the patch, I will
integrate it.
The usual procedure would be to attach the patch to a Jira issue. I
will take care of it, but you are of course also welcome to attach it :-)

Best,

Peter

On 22.10.2015 at 12:43, Manuel Ciosici wrote:
> Hello Peter,
> I looked a bit at the new regular expression and there are still some
> cases that aren’t caught. More specifically, it won’t annotate XML
> tags that have a dash in their name, so tags such as:
> <first-name>
> aren’t caught by the current regular expression. I’ve changed the
> expression so that it works. What I did was change the \w+ part from
> the tag name into \w[\w-]* since XML tag names can contain dashes, but
> cannot start with dashes. I’ve also updated the unit test so that
> there are tags with dashes and underscores and also one non-tag.
> I’m attaching the SVN patch to this email.
> Manuel
>
>
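The change Manuel describes can be illustrated with a minimal sketch. The
patterns below are hypothetical and heavily simplified; they are not the
actual markupPattern from DefaultSeeder.java, which is not shown in this
thread. Only the tag-name part (\w+ versus \w[\w-]*) comes from the message
above; the class name, the constants OLD_TAG and NEW_TAG, the helper
findTags, and the rest of each expression are assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNameSketch {
    // Hypothetical, simplified tag patterns (NOT the real markupPattern).
    // Old tag-name part: \w+ (no dashes allowed in the name).
    static final Pattern OLD_TAG = Pattern.compile("</?\\w+(\\s[^<>]*)?/?>");
    // New tag-name part: \w[\w-]* (dashes allowed, but not as first character).
    static final Pattern NEW_TAG = Pattern.compile("</?\\w[\\w-]*(\\s[^<>]*)?/?>");

    // Collects every substring of text that the given pattern matches.
    static List<String> findTags(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Tags with a dash and with an underscore, plus two non-tags.
        String text = "<first-name>Ann</first-name> <last_name>Lee</last_name> 3 < 4 and <-x>";
        System.out.println("old: " + findTags(OLD_TAG, text));
        System.out.println("new: " + findTags(NEW_TAG, text));
    }
}
```

In this sketch the old pattern skips the first-name tags while the new one
finds them, and neither treats "< 4" or "<-x>" as a tag. XML names may also
contain periods and colons, which neither simplified pattern covers.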
> > Thanks Peter,
> >
> > The quotes are just normal quotes in the original source, but the
> > mail software must have changed this. Sorry about that
> > misunderstanding.
> >
> > Cheers
> > Mario
> >
> >> On 21/10/2015, at 16.03, Peter Klügl <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I extended the pattern to support dashes, but not the other quotes.
> >> This can get arbitrarily complex (and slow) if any combination of
> >> Unicode characters that look like quotes should be supported. I
> >> still think that this is not valid XML. Can you give me a link to
> >> the standard?
> >>
> >> It's maybe better to solve this in a specific use case before
> >> applying the seeder.
> >>
> >> Best,
> >>
> >> Peter
> >>
> >>> On 20.10.2015 at 19:22, Mario Gazzo wrote:
> >>> I believe it should be extended since I think that a RUTA user
> >>> would expect that the MARKUP annotation indeed captures at least
> >>> XML and HTML markup properly. The examples are from a PubMed
> >>> Central XML file that follows the NISO JATS specification, so I
> >>> will assume it is properly formatted XML without knowing all the
> >>> details of the spec.
> >>>
> >>> We have managed to implement a crude workaround for now, but let
> >>> us know when an improved version becomes available.
> >>>
> >>> Cheers
> >>> Mario
> >>>
> >>>> On 20 Oct 2015, at 17:56, Peter Klügl <[email protected]> wrote:
> >>>>
> >>>> Hi Mario,
> >>>>
> >>>> yes, and the different quote also causes problems (are these
> >>>> valid?).
> >>>>
> >>>> The MARKUP annotation is not created by jflex like the other
> >>>> annotations, but by a postprocessing step using a regular
> >>>> expression. This expression does not cover these cases
> >>>> (markupPattern in DefaultSeeder.java).
> >>>>
> >>>> Should we extend it?
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>>> On 20.10.2015 at 17:26, Mario Gazzo wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> RUTA doesn’t seem to capture some XML markup with attributes.
> >>>>> Here are some examples:
> >>>>>
> >>>>> <xref ref-type="bibr" rid="b35-ehp0113-000220”>
> >>>>> <sec sec-type="methods”>
> >>>>>
> >>>>> The above markup examples are totally missing in the TokenSeed
> >>>>> annotations. I wonder whether it is related to the dash in the
> >>>>> attribute names, since other markup without this appears to be
> >>>>> captured.
> >>>>>
> >>>>> Can you confirm that the dash could cause the problem?
> >>>>>
> >>>>> Cheers
> >>>>> Mario
