Hi Gareth,

I am not very knowledgeable about the Xerces code base; I'm just a consumer
of xerces, but I monitor this list.  If xerces uses GNU regex (i.e.,
regcomp(3)), the problem may be in the call to regexec() itself.

We had a similar problem trying to use a regexp to parse the body of
emails, allowing all the <CR><LF> terminated lines that were not
followed by the .<CR><LF> termination sequence.  It worked with short
emails, but not with long ones.

As near as I can tell (having followed the regex code in gdb as a
regex non-expert), every time the regex matcher matches a substring
subject to the final "*", it has to push a backtrack point onto its
internal stack.  Eventually, the stack becomes too big and regexec()
behaves unpredictably.  I can't remember whether it was a stack
overflow, a hardcoded limit in the lib, or something else.
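
To illustrate the effect, here is a toy model in Python. It is only a
sketch, not the actual GNU regex or Xerces code: match_star, match_line,
and max_depth_for are hypothetical names, and one recursive call stands
in for one saved backtrack point. The point is just that the depth grows
with the number of repetitions the final "*" consumes.

```python
def match_star(match_one, text, pos=0, depth=1, stats=None):
    """Match zero or more repetitions of `match_one` starting at `pos`.

    Returns the end position of the match. Each repetition adds one
    recursive call, standing in for one saved backtrack point.
    """
    if stats is not None:
        stats["max_depth"] = max(stats["max_depth"], depth)
    end = match_one(text, pos)
    if end is None or end == pos:
        return pos  # no further repetition; the star stops here
    return match_star(match_one, text, end, depth + 1, stats)


def match_line(text, pos):
    """Hypothetical item matcher: consume one ';'-terminated record."""
    semi = text.find(";", pos)
    return None if semi < 0 else semi + 1


def max_depth_for(n_records):
    """Return the deepest recursion level reached for `n_records` records."""
    stats = {"max_depth": 0}
    text = "1,2,0.5;" * n_records
    end = match_star(match_line, text, stats=stats)
    assert end == len(text)  # the whole input was consumed
    return stats["max_depth"]
```

Comparing max_depth_for(10) with max_depth_for(200) shows the depth
rising in step with the input size, which is consistent with short
inputs working and long ones failing.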

I know this observation is not very satisfying as far as a fix goes.
We ultimately refactored the regex in the email case to match just one
<CR><LF>-terminated line at a time; that is, we got rid of the
final "*".  Your bug reporter could implement a similar workaround by
making each of his ";"-terminated lines a node.
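
As a rough sketch of that workaround outside of Xerces (Python here,
purely for illustration; RECORD and body_is_valid are hypothetical
names), the outer "*" can be replaced by an explicit loop that matches
one ";"-terminated record at a time. The per-record pattern is the body
of the reporter's pattern with the outer group and "*" removed, and I am
assuming Python's re module treats these escapes the same way the schema
pattern does.

```python
import re

# One record from the reporter's pattern, without the outer "( ... )*".
RECORD = re.compile(
    r"\n*[0-9]*,[0-9]*,(\-*[0-9]*\.*[0-9]*,)*\-*[0-9]+\.*[0-9]*;\n*"
)


def body_is_valid(body):
    """Check the body record by record instead of with one starred group."""
    pos = 0
    while pos < len(body):
        m = RECORD.match(body, pos)
        if m is None:
            return False  # text at `pos` is not a well-formed record
        pos = m.end()  # always advances: a record contains at least ',,d;'
    return True  # an empty body is accepted, as "( ... )*" would accept it
```

The matcher then only ever has to keep track of one short record, no
matter how many records the body contains.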

Hope this helps avoid the chase of the wild goose.

Thanks for the great product!

Regards,
Mark

On Mon, 17 May 2004 01:01:27 -0700 (PDT)
Gareth Reakes <[EMAIL PROTECTED]> wrote:

> Hey,
>        this is in my court. I have a minimal sample that reproduces the
> problem. I have had a quick look at the code and saw nothing obvious.
> I have some time scheduled for xerces work today and tomorrow, after
> the bug where an element from the wrong document is returned. If I
> can't fix it in that time, I will file a bug with the minimal
> sample to see if anyone else wants a go.
> 
> Gareth
> 
> 
> On Mon, 17 May 2004, Heeg, Michael wrote:
> 
> > Hi everybody,
> >
> > has anyone found a solution for my "pattern" problem?
> >
> > Regards,
> > Michael
> >
> > > -----Original Message-----
> > > From: Heeg, Michael
> > > Sent: Thursday, 15 April 2004 09:08
> > > To: '[EMAIL PROTECTED]'
> > > Subject: Strange problem with pattern (Xerces 2.5.0 crashes)
> > >
> > >
> > > Hi,
> > >
> > > I am using Xerces-C 2.5.0 in my MS Visual C++ application. When
> > > validating XML files against a specified schema, the parser
> > > sometimes crashes with an "unexpected exception". I found out that
> > > the reason for the crashes is the following restriction of the
> > > schema (see "Body" element):
> > >
> > > <xsd:complexType name="InputFileType">
> > >   <xsd:sequence>
> > >           <xsd:element name="Head" type="HeadType"/>
> > >           <xsd:element name="Body">
> > >                   <xsd:simpleType>
> > >                           <xsd:restriction base="xsd:string">
> > >                                   <xsd:pattern
> > > value="(\n*[0-9]*,[0-9]*,(\-*[0-9]*\.*[0-9]*,)*\-*[0-9]+\.*[0-9]*;\n*)*"/>
> > >                           </xsd:restriction>
> > >                   </xsd:simpleType>
> > >           </xsd:element>
> > >   </xsd:sequence>
> > > </xsd:complexType>
> > >
> > > The restriction is defined to validate <Body> tags like the following:
> > >
> > > <Body>
> > > 0,10,0.199,10.199,0.008;
> > > 1,20,0.389,20.389,0.059;
> > > 2,30,0.565,30.565,0.180;
> > > 3,40,0.717,40.717,0.369;
> > > 4,50,0.841,50.841,0.596;
> > > 5,60,0.932,60.932,0.810;
> > > ....
> > > </Body>
> > >
> > > The strange thing is: when the <Body> tag contains a large amount
> > > of data, the validation of the restriction leads to the unexpected
> > > exception. But with a small amount of data, everything works fine.
> > > (Also, when I delete the restriction from the schema, everything
> > > works fine.)
> > >
> > > To me this looks like a Xerces bug. Am I wrong? Any suggestions
> > > or comments?
> > >
> > > Best regards,
> > > Michael
> > >
> > >
> > > P.S.: I know that the way we use this <Body> tag is not the best
> > > way to handle CSV-like data, but I had to do this because of an
> > > existing file format.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> >
> >
> 
> -- 
> Gareth Reakes, Managing Director      Parthenon Computing
> +44-1865-811184                  http://www.parthcomp.com
> 