The following comment has been added to this issue:
Author: Michael Windsor
Created: Wed, 14 Jul 2004 5:35 AM
Body:
Having done some further investigation (and limited testing), I believe I have located
the cause of this problem.
The function normalizeWhiteSpace() within SchemaValidator.cpp processes the input
stream in "chunks" (the stream is split into these chunks elsewhere). Its last step
records whether the chunk ends in whitespace by setting a boolean (fTrailing) to true if
it does. Any subsequent call to the function then uses this flag to decide how
whitespace at the head of the next chunk should be processed.
The problem is that this flag is set when there is trailing whitespace but is not
cleared when there is not. It is cleared only when reset() or certain other functions
within this class are invoked. This matters only in certain circumstances, because in
most situations all the text between a pair of tags is processed as a single chunk and
the flag is reset between tags. One reason a chunk may end before the start of a new
tag is that an entity is used within the element, and this was the case when I noticed
the error.
This error should be quite rare, because the data between two tags must be split into
at least three chunks, and there must be whitespace at the end of one chunk but not at
the end of some later chunk (one that is not the last).
The fix is to add an "else" branch to the "if" statement at the end of the
normalizeWhiteSpace() function:

    if (fCurReader->isWhitespace(*(srcPtr-1)))
        fTrailing = true;
    else
        fTrailing = false;
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESC-1239?page=comments#action_36655
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESC-1239
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESC-1239
Summary: Schema length validation error in unions
Type: Bug
Status: Unassigned
Priority: Major
Project: Xerces-C++
Components:
Validating Parser (Schema) (Xerces 1.5 or up only)
Versions:
2.4.0
2.5.0
Assignee:
Reporter: Michael Windsor
Created: Fri, 2 Jul 2004 5:24 AM
Updated: Wed, 14 Jul 2004 5:35 AM
Environment: Tested on the released Win32 executables and on a new 2.5.0 executable
built with VC++ 7 on WinNT SP6a and Win XP SP1
Description:
In certain circumstances, schema validation fails to calculate correctly the length of
a string containing &amp; (and possibly other entity references). The following schema
and XML produce the error in the Sax2Print example, although I first noticed the error
when using Xerces as a validator from within Xalan-C, so it is unlikely to be a problem
with this example only.
Test.xml:
=========
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Test.xsd">
  <flibble>curiouser &amp; curiouser&amp;curiouser</flibble>
</root>
Test.xsd:
=========
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="flibble">
          <xs:simpleType>
            <xs:union memberTypes="TextString Null"/>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="TextString">
    <xs:restriction base="xs:string">
      <xs:minLength value="1" />
      <xs:maxLength value="31" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Null">
    <xs:restriction base="xs:string">
      <xs:length value="0" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
Error message:
==============
Error at file F:\My Documents\Mike\Visual Studio Projects\xerces-c-src_2_5_0\Build\Win32\VC7\Debug/Test.xml, line 3, char 60
Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value
'curiouser & curiouser &curiouser' does not match any member types (of the union) .
There are a few things to note:
+ Counting the characters shows that the input string (31 characters) should fit the
first member of the union (maxLength 31), but an extra space has been inserted before
the second ampersand, giving 32 characters.
+ I have not determined the exact pattern within the string that causes this, but it
seems to require two ampersands, with no space before the second one.
+ I do not know whether this is restricted to &amp; or applies to other escape
sequences (or combinations of them), since more than one appears to be necessary.
+ This only happens for a union. If the schema simply provides a straight restriction
on the length of the string, there is no complaint from validation.
+ Running Sax2Print with -s (i.e. no validation) prints the input document with the
string processed correctly (i.e. the correct number of characters). The extra space is
produced only when the validator is switched on. The same holds for XSLT operations
within Xalan: the validator complains, but if it is switched off, the string is output
with the correct length.
I have spent some time trying to figure out what is going on in order to produce a
patch. I will continue to do so, but at the moment, I am not having much luck. If
anyone else with a better understanding of the code wants to jump in and steal my
thunder, I won't be at all offended.