I'm not sure I see any problems. The generated DTD, though syntactically a
little different, should have the same effects. There is no such thing as a
single '#PCDATA'. If you give #PCDATA as the content model, then all that's
saying is that it can have all the text it wants. There shouldn't be any
difference between (#PCDATA) and (#PCDATA)*, or between (Foo*) and (Foo)*
really. Are you saying it won't go back through the parser as a legal file?
If you give (Foo) as the content model, then that should come back out as
(Foo) again, which would be a single instance of the Foo element.
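
As a rough illustration of why (Foo*) and (Foo)* accept the same children,
here is a small Python sketch (not Xerces code) that treats an element-only
content model as a regular expression over child element names. The helper
names model_to_regex and accepts are made up for this example, and mixed
(#PCDATA) models are deliberately left out.

```python
import re


def model_to_regex(model):
    """Translate a simple element-only DTD content model into a regex.

    Assumption: the model uses only names, ',', '|', '?', '*', '+' and
    parentheses. Each child name is matched with a trailing space so that
    names cannot run into each other.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def accepts(model, children):
    """Return True if the list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None


if __name__ == "__main__":
    # Compare the two spellings on zero to three Foo children.
    for n in range(4):
        kids = ["Foo"] * n
        print(n, accepts("(Foo*)", kids), accepts("(Foo)*", kids))
```

Under these assumptions both spellings agree for every count of Foo
children, while a plain (Foo) accepts exactly one.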
----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]
Serge Krepak <[EMAIL PROTECTED]> on 04/02/2000 06:17:03 PM
Please respond to [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc:
Subject: May be an element content declaration bug
G'day all,
I recently downloaded the xerces 1.1.0 library to play with and to find a
way to embed it into an existing project. Great lib really, thanks to all
who put their efforts into development.
I have been trying to regenerate the DTD from the parser using the validator
that is part of the library. This is necessary for me to get control over
the DTD header creation process. The little problem I have run into is that
the SAX parser does not correctly process some elements' content
declarations. Have a look at the following XML DTD code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FORM
[
<!ELEMENT FORM (ORIGIN, FIELD_LIST)>
<!ATTLIST FORM TYPE CDATA #REQUIRED>
<!ATTLIST FORM CONTENT CDATA #REQUIRED>
<!ELEMENT ORIGIN (#PCDATA)>
<!ELEMENT FIELD_LIST (FIELD*)>
<!ELEMENT FIELD (VALUE?, EDIT_STATE?)>
<!ATTLIST FIELD ID CDATA #REQUIRED>
<!ELEMENT VALUE (#PCDATA)>
<!ELEMENT EDIT_STATE (#PCDATA)>
]>
<FORM TYPE="QN1-9.1999" CONTENT="data">
<ORIGIN>Office</ORIGIN>
<FIELD_LIST>
<FIELD ID="A1">
<VALUE>123456</VALUE>
<EDIT_STATE>readonly</EDIT_STATE>
</FIELD>
</FIELD_LIST>
</FORM>
This code is compliant with the current XML standard, isn't it? After I fed
it to the SAX parser and regenerated it from the validator I got the
following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FORM [
<!ELEMENT FORM (ORIGIN,FIELD_LIST)>
<!ATTLIST FORM CONTENT CDATA #REQUIERED>
<!ATTLIST FORM TYPE CDATA #REQUIERED>
<!ELEMENT ORIGIN (#PCDATA)*>
<!ELEMENT FIELD_LIST (FIELD)*>
<!ELEMENT FIELD (VALUE?,EDIT_STATE?)>
<!ATTLIST FIELD ID CDATA #REQUIERED>
<!ELEMENT VALUE (#PCDATA)*>
<!ELEMENT EDIT_STATE (#PCDATA)*>
]>
<FORM TYPE="QN1-9.1999" CONTENT="data">
<ORIGIN>Office</ORIGIN>
<FIELD_LIST>
<FIELD ID="A1">
<VALUE>123456</VALUE>
<EDIT_STATE>readonly</EDIT_STATE>
</FIELD>
</FIELD_LIST>
</FORM>
You can see the difference in the #PCDATA and FIELD* declarations before and
after. The DTD validator that comes with the library does not process the
asterisk as part of the name in a content declaration (that's correct from a
lexical point of view) and always uses the mixed model because of #PCDATA.
But I can't recreate the DTD in its original state. Can I generate a content
declaration for an element that may appear only once without changing the
parser code? Can you give me a tip on how to get past this problem?
I'm using the following environment:
Version number of Xerces-C: 1.1.0
OS platform/version: NT4+SP5
Compiler/version: MSVC5
Sample XML file that causes the bug: not required
Sample Schema file: not required
Sample DTD file: see above
Kind regards,
---------------------------------------------------------------
Serge V. Krepak
Software Engineer ...currently for
Quicken Australia
Unit 4, 7-26 Bridge Road
Stanmore NSW 2048
PH: +61 2 9519 3216
FX: +61 2 9519 3459
It's Quicken.Easy at <http://www.quicken.com.au>
---------------------------------------------------------------