Hi Gareth,

I am not very knowledgeable about the xerces code base; I'm just a consumer of xerces, but I monitor this list. If xerces uses GNU regex (i.e., regcomp(3)), the problem may be in the call to regexec() itself.

We had a similar problem trying to use a regexp to parse the body of emails, allowing all the <CR><LF> terminated lines that were not followed by the .<CR><LF> termination sequence. It worked with short emails, but not with long ones.

As nearly as I can tell (having followed the regex code in gdb as a regex non-expert), every time the regex engine matches a substring subject to the final "*", it has to push a jumpback point onto its internal stack. Eventually, the stack becomes too big and regexec() gets unpredictable. I can't remember whether it was a stack overflow, a hardcoded limit in the lib, or something else. I know this is not a very satisfactory observation from the standpoint of a fix.

We ultimately refactored the regex in the email case to just find each <CR><LF> terminated line; that is, we got rid of the final "*". Your bug reporter could implement a similar workaround by making each of his ";" terminated lines a node.

Hope this helps avoid a wild goose chase. Thanks for the great product!

Regards,
Mark

On Mon, 17 May 2004 01:01:27 -0700 (PDT) Gareth Reakes <[EMAIL PROTECTED]> wrote:

> Hey,
>       this is in my court. I have a minimal sample that reproduces the
> problem. I have had a quick look at the code and saw nothing obvious.
> I have some time scheduled for xerces work today and tomorrow. This is
> after the "element from the wrong document being returned" bug. If I
> can't fix it in that time then I will commit a bug with the minimal
> sample to see if anyone else wants a go.
>
> Gareth
>
>
> On Mon, 17 May 2004, Heeg, Michael wrote:
>
> > Hi everybody,
> >
> > has anyone found a solution for my "pattern" problem?
> >
> > Regards,
> > Michael
> >
> > > -----Original Message-----
> > > From: Heeg, Michael
> > > Sent: Thursday, 15 April 2004 09:08
> > > To: '[EMAIL PROTECTED]'
> > > Subject: Strange problem with pattern (Xerces 2.5.0 crashes)
> > >
> > >
> > > Hi,
> > >
> > > I am using Xerces-C 2.5.0 in my MS Visual C++ application. When
> > > validating XML files against a specified schema, the parser
> > > sometimes crashes with an "unexpected exception". I found out that
> > > the reason for the crashes is the following restriction of the
> > > schema (see "Body" element):
> > >
> > > <xsd:complexType name="InputFileType">
> > >   <xsd:sequence>
> > >     <xsd:element name="Head" type="HeadType"/>
> > >     <xsd:element name="Body">
> > >       <xsd:simpleType>
> > >         <xsd:restriction base="xsd:string">
> > >           <xsd:pattern
> > >             value="(\n*[0-9]*,[0-9]*,(\-*[0-9]*\.*[0-9]*,)*\-*[0-9]+\.*[0-9]*;\n*)*"/>
> > >         </xsd:restriction>
> > >       </xsd:simpleType>
> > >     </xsd:element>
> > >   </xsd:sequence>
> > > </xsd:complexType>
> > >
> > > The restriction is defined to validate <Body> tags like the
> > > following:
> > >
> > > <Body>
> > > 0,10,0.199,10.199,0.008;
> > > 1,20,0.389,20.389,0.059;
> > > 2,30,0.565,30.565,0.180;
> > > 3,40,0.717,40.717,0.369;
> > > 4,50,0.841,50.841,0.596;
> > > 5,60,0.932,60.932,0.810;
> > > ....
> > > </Body>
> > >
> > > The strange thing is: when the <Body> tag contains a large amount
> > > of data, the validation of the restriction leads to the unexpected
> > > exception. But with a small amount of data, everything works fine.
> > > (Also, when I delete the restriction from the schema, everything
> > > works fine.)
> > >
> > > To me this looks like a Xerces bug. Am I wrong? Any suggestions
> > > or comments?
> > >
> > > Best regards,
> > > Michael
> > >
> > >
> > > P.S.: I know that the way we use this <Body> tag is not the best
> > > way to handle csv-like data, but I had to do this because of an
> > > existing file format.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> >
>
> --
> Gareth Reakes, Managing Director      Parthenon Computing
> +44-1865-811184                  http://www.parthcomp.com
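
The workaround suggested in this message (check each ";" terminated record on its own instead of wrapping the whole body in one pattern with a final "*") could look roughly like the sketch below. It makes several assumptions: the <Body> text has already been extracted into a std::string, records are separated only by newlines as in the original pattern, and std::regex stands in for whatever regex engine is actually in use. isValidRecord and isValidBody are hypothetical helper names, not Xerces APIs.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Xerces API): checks the text of one record,
// i.e. everything up to but not including its ';' terminator, against the
// inner part of the schema pattern. Leading newlines are allowed.
bool isValidRecord(const std::string& record) {
    static const std::regex recordPattern(
        R"(\n*[0-9]*,[0-9]*,(-*[0-9]*\.*[0-9]*,)*-*[0-9]+\.*[0-9]*)");
    return std::regex_match(record, recordPattern);
}

// Hypothetical helper: validates the body one record at a time, so that no
// single regex match has to span the whole body.
bool isValidBody(const std::string& body) {
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = body.find(';', start);
        if (end == std::string::npos) {
            break;  // no further ';' terminator
        }
        if (!isValidRecord(body.substr(start, end - start))) {
            return false;
        }
        start = end + 1;
    }
    if (start == 0) {
        // No record at all: like the original pattern, accept only "".
        return body.empty();
    }
    // After the last ';' only newlines may follow.
    return body.find_first_not_of('\n', start) == std::string::npos;
}
```

Because each call to std::regex_match only sees one short record, the amount of backtracking state the engine needs no longer grows with the size of the body. Whether this maps onto a schema-level fix depends on how the data can be restructured, as noted above.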