Alain Pierrot wrote:
> I'm working on an implementation of the ONIX standard
> http://www.editeur.org/
> (international standard for representing and communicating book industry
> product information, in XML ).
> The ONIX DTD
> http://www.editeur.org/onix/2.1/reference/onix-international.dtd
> is ambiguous; this feature results in an awkward situation when trying
> to use it in XXE:
>
> valid instances of ONIX files parse as valid and load;
>
> invalid instances are duly recognised but the error could (ideally)
> be more precisely spotted,
> and editing has to be done from the whole set of elements in the DTD;
Note that as soon as the document becomes valid, XXE automatically
switches back from lenient to strict mode.
> inserting elements from an ambiguous sub-tree (into a valid node) is
> simply impossible inside a valid file.
> These elements are ignored in the edition elements list.
> You can for instance try and build a second <Product> with the same
> structure as the first one in the attached file: impossible to append an
> <ISBN> element after <NotificationType>.
>
> Is there a (best practice) workaround?
>
> It would anyway be 'cool', for instance, to
> 1. be warned that the DTD is non-deterministic;
> 2. be offered the valid choices even in ambiguous nodes;
>
> or to be able to turn off validation in such situations? :-(
I don't see any problem with the DTD.
By the way, XXE does not care whether the DTD is deterministic or
non-deterministic. (XXE is implemented using a non-deterministic
automaton which backtracks when needed.)
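To illustrate what "backtracks when needed" means, here is a minimal
sketch in Python. It is not XXE's actual code; the helper names (elem,
seq, choice, opt, star, plus, is_valid) are invented for this example,
and demo_model is a heavily simplified stand-in for the Set and title
groups of Product. Each matcher yields every position where it could
stop, so a wrong guess about which group a Title belongs to is simply
retried.

```python
from typing import Callable, Iterator, Sequence

# A matcher takes the child element names and a start index, and yields
# every index at which it could stop. Yielding several candidates is what
# makes backtracking possible.
Matcher = Callable[[Sequence[str], int], Iterator[int]]

def elem(name: str) -> Matcher:
    def m(items, i):
        if i < len(items) and items[i] == name:
            yield i + 1
    return m

def seq(*parts: Matcher) -> Matcher:
    def m(items, i):
        def go(k, pos):
            if k == len(parts):
                yield pos
                return
            for nxt in parts[k](items, pos):   # try each way part k can end
                yield from go(k + 1, nxt)
        yield from go(0, i)
    return m

def choice(*alts: Matcher) -> Matcher:
    def m(items, i):
        for alt in alts:
            yield from alt(items, i)
    return m

def opt(p: Matcher) -> Matcher:
    def m(items, i):
        yield from p(items, i)
        yield i                                # or match nothing
    return m

def star(p: Matcher) -> Matcher:
    def m(items, i):
        for nxt in p(items, i):
            if nxt > i:                        # avoid looping on empty matches
                yield from m(items, nxt)
        yield i
    return m

def plus(p: Matcher) -> Matcher:
    return seq(p, star(p))

def is_valid(model: Matcher, items: Sequence[str]) -> bool:
    # Valid if at least one way of matching consumes every child element.
    return any(end == len(items) for end in model(items, 0))

# Simplified, hypothetical model: ((Title+ , SetPartNumber?)? , Title+)
title = elem("Title")
demo_model = seq(opt(seq(plus(title), opt(elem("SetPartNumber")))), plus(title))
```

With this model, is_valid(demo_model, ["Title"]) holds only because the
matcher first assigns the Title to the optional set group, fails to find
the mandatory Title+ after it, and then backtracks to skip the set group.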
I see a user interface problem. The XXE GUI is too simple to cope with
content models such as the one listed below.
Workaround #1 (ugly):
~~~~~~~~~~~~~~~~~~~~~
[1] In Product, use Force deletion to delete mandatory child
NotificationType. Your document is now temporarily invalid.
[2] Insert RecordSourceIdentifierType followed by
RecordSourceIdentifier (like the ISBN problem but simpler).
[3] Insert a NotificationType before these two elements. Your document
is now valid again.
Workaround #2 (nice):
~~~~~~~~~~~~~~~~~~~~~
[1] Create an XXE configuration for ONIX.
[2] Create several named element templates for commonly used Product
elements:
---
PS: Help|Show Content Model, which is really great to have, showed me
this for element Product:
---
Element Product
~~~~~~~~~~~~~~~
Content model
~~~~~~~~~~~~~
(RecordReference ,
NotificationType ,
DeletionCode? ,
DeletionText? ,
RecordSourceType? ,
(RecordSourceIdentifierType , RecordSourceIdentifier)? ,
RecordSourceName? ,
(((ISBN , EAN13? , UPC? , PublisherProductNo? , ISMN? ,
DOI? , ProductIdentifier*) |
(EAN13 , UPC? , PublisherProductNo? , ISMN? , DOI? ,
ProductIdentifier*) |
(UPC , PublisherProductNo? , ISMN? , DOI? , ProductIdentifier*) |
(PublisherProductNo , ISMN? , DOI? , ProductIdentifier*) |
(ISMN , DOI? , ProductIdentifier*) |
(DOI , ProductIdentifier*) |
ProductIdentifier+) ,
Barcode* ,
ReplacesISBN? ,
ReplacesEAN13? ,
ProductForm ,
ProductFormDetail* ,
ProductFormFeature* ,
BookFormDetail* ,
ProductPackaging? ,
ProductFormDescription? ,
NumberOfPieces? ,
TradeCategory? ,
ProductContentType* ,
ContainedItem* ,
ProductClassification* ,
(EpubType ,
EpubTypeVersion? ,
EpubTypeDescription? ,
(EpubFormat , EpubFormatVersion?)? ,
EpubFormatDescription? ,
(EpubSource , EpubSourceVersion?)? ,
EpubSourceDescription? ,
EpubTypeNote?)? ,
((SeriesISSN? ,
PublisherSeriesCode? ,
SeriesIdentifier* ,
((TitleOfSeries , Title*) |
Title+) ,
Contributor* ,
NumberWithinSeries? ,
YearOfAnnual?) |
Series+ |
NoSeries)? ,
((ISBNOfSet? ,
EAN13OfSet? ,
ProductIdentifier* ,
((TitleOfSet , Title*) |
Title+) ,
SetPartNumber? ,
SetPartTitle? ,
ItemNumberWithinSet? ,
LevelSequenceNumber? ,
SetItemTitle?) |
Set+)? ,
TextCaseFlag? ,
((((DistinctiveTitle ,
(TitlePrefix , TitleWithoutPrefix)?) |
(TitlePrefix , TitleWithoutPrefix)) ,
Subtitle? ,
TranslationOfTitle? ,
FormerTitle* ,
Title*) |
Title+) ,
WorkIdentifier* ,
Website* ,
(ThesisType , ThesisPresentedTo? , ThesisYear?)? ,
((Contributor+ , ContributorStatement?) |
NoContributor)? ,
(ConferenceDescription |
(ConferenceRole? , ConferenceName , ConferenceNumber? ,
ConferenceDate? , ConferencePlace?) |
Conference+)? ,
((EditionTypeCode* , EditionNumber? , EditionVersionNumber? ,
EditionStatement?) |
NoEdition) ,
ReligiousText? ,
LanguageOfText* ,
OriginalLanguage? ,
Language* ,
NumberOfPages? ,
PagesRoman? ,
PagesArabic? ,
Extent* ,
NumberOfIllustrations? ,
IllustrationsNote? ,
Illustrations* ,
MapScale* ,
(BASICMainSubject , BASICVersion?)? ,
(BICMainSubject , BICVersion?)? ,
MainSubject* ,
Subject* ,
PersonAsSubject* ,
CorporateBodyAsSubject* ,
PlaceAsSubject* ,
AudienceCode* ,
Audience* ,
USSchoolGrade? ,
InterestAge? ,
AudienceRange* ,
AudienceDescription? ,
Complexity* ,
Annotation? ,
MainDescription? ,
OtherText* ,
ReviewQuote* ,
(CoverImageFormatCode , CoverImageLinkTypeCode , CoverImageLink)? ,
MediaFile* ,
ProductWebsite* ,
(PrizesDescription | Prize+)? ,
ContentItem* ,
((ImprintName , Imprint* , PublisherName? , Publisher*) |
(Imprint+ , PublisherName? , Publisher*) |
(PublisherName , Publisher*) |
Publisher+) ,
CityOfPublication* ,
CountryOfPublication? ,
CopublisherName* ,
SponsorName* ,
OriginalPublisher? ,
(PublishingStatus , PublishingStatusNote?)? ,
AnnouncementDate? ,
TradeAnnouncementDate? ,
PublicationDate? ,
(CopyrightStatement+ | CopyrightYear)? ,
YearFirstPublished* ,
(SalesRights ,
(SalesRights , SalesRights?)?)? ,
NotForSale* ,
SalesRestriction*)? ,
(((((Height , Width? , Thickness? , Weight?) |
Weight |
Measure+) ,
Dimensions?) |
Dimensions)?)? ,
(ReplacedByISBN? , ReplacedByEAN13? , AlternativeFormatISBN? ,
AlternativeFormatEAN13? , AlternativeProductISBN? ,
AlternativeProductEAN13? , RelatedProduct* , OutOfPrintDate?)? ,
(SupplyDetail*)? ,
(PromotionCampaign? , PromotionContact? , InitialPrintRun? ,
CopiesSold? , BookClubAdoption?)?)
---
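For anyone who wants to locate where a content model like this one
stops being deterministic, here is a rough Python sketch of the usual
position-based check: number every element occurrence, compute which
occurrences can come first and which can follow each other, and report
any element name that appears twice among the candidates at one point.
This is an illustration only, not something XXE does; the tuple
encoding and the names analyse and conflicts are invented here, and the
two test models are simplified fragments, not the full Product model.

```python
from itertools import count

# Content models as nested tuples (an encoding made up for this sketch):
#   ("elem", name), ("seq", a, b, ...), ("alt", a, b, ...),
#   ("opt", a), ("star", a), ("plus", a)

def analyse(model):
    """Return (symbol, first, follow) over numbered element occurrences."""
    ids = count()
    symbol = {}   # position -> element name
    follow = {}   # position -> positions that may come right after it

    def walk(node):
        """Return (nullable, first positions, last positions) of node."""
        kind = node[0]
        if kind == "elem":
            p = next(ids)
            symbol[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind in ("opt", "star", "plus"):
            nullable, first, last = walk(node[1])
            if kind != "opt":                  # repetition loops back
                for p in last:
                    follow[p] |= first
            return nullable or kind != "plus", first, last
        parts = [walk(child) for child in node[1:]]
        if kind == "alt":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        # kind == "seq": fold the parts from left to right
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            for p in last:
                follow[p] |= f
            if nullable:
                first |= f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last

    _, first, _ = walk(model)
    return symbol, first, follow

def conflicts(model):
    """Element names for which two different occurrences compete."""
    symbol, first, follow = analyse(model)
    found = set()
    for candidates in [first, *follow.values()]:
        seen = set()
        for p in candidates:
            if symbol[p] in seen:
                found.add(symbol[p])
            seen.add(symbol[p])
    return found

T = ("elem", "Title")
# Simplified set group followed by the title group: ambiguous on Title.
set_then_title = ("seq",
                  ("opt", ("seq", ("plus", T), ("opt", ("elem", "SetPartNumber")))),
                  ("plus", T))
# Simplified identifier group: (ISBN , EAN13?) | EAN13
identifiers = ("alt",
               ("seq", ("elem", "ISBN"), ("opt", ("elem", "EAN13"))),
               ("elem", "EAN13"))
```

Here conflicts(set_then_title) reports Title, while
conflicts(identifiers) is empty: each alternative of the identifier
group starts with a different element, so that part by itself is
deterministic.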