Re: [xml] Potential wrong usage of xmlIsID() in tree.c

Kasimier Buchcik Thu, 23 Feb 2006 04:05:26 -0800

Hi

On Wed, 2006-02-22 at 19:39 -0500, Rob Richards wrote:
> Kasimier Buchcik wrote:
> > Hi,
> >
> > On Tue, 2006-02-21 at 16:06 +0100, Kasimier Buchcik wrote:
> >   
> >> Hi,
> >>
> >> On Tue, 2006-02-21 at 08:56 -0500, Rob Richards wrote:
> >>     
> >>> Kasimier Buchcik wrote:
> >>>       
> >>>>> As far as copy/import/adopt goes. I have found a couple of conflicting 
> >>>>> opinions for this. One is that the importing doc should determine if an 
> >>>>> attribute is added as an ID. The other (which is also how it is 
> >>>>> implemented in Xerces) is that an attribute should be copied/imported 
> >>>>> with the same ID-ness as in the source document. The latter would be in 
> >>>>>     
> >>>>>           
> >>>> I agree with the latter mechanism. This looks like the only
> >>>> sane/efficient mechanism to me currently (this is also the reason why
> >>>> I'm not confident with automatic IDness detection).
> >>>>   
> >>>>         
> >>> I agree on the latter method as well. I do see some potential issues 
> >>> with the automatic detection after thinking about it more. From a purely 
> >>> DTD view, if the document is parsed without validation or attribute 
> >>> handling enabled, the current implementation will still create IDs 
> >>> modiying the tree. There is no indication in the document that
> >>> this was not desired. So the question comes to mind should IDs be 
> >>> created or not?
> >>>       
> >> Ah, good point. I think it should not automatically create IDs in this
> >> case, since we don't know if the attribute's element is valid, and thus
> >> if the element/attribute is valid at that position of the tree. So we
> >> could run into creating IDness for attributes which wouldn't become IDs
> >> if processed by the validator.
> >>     
> >
> > I noticed that this is actually not an answer for your question :-) but
> > writing this down had the nice side-effect that I realized that the
> > current detection of IDness could mark attributes as IDs, even if they
> > are not IDs, but invalid attributes according to a DTD; i.e., to query
> > the DTD for an element/attribute combination is not enough to evaluate
> > if an attribute is an ID. So one could argue that the existing detection
> > is a bug.
> >   
> How do you figure (or are you referring to the case of a document not 
> parsed in validating mode)? A DTD doesn't allow a redefinition of an 
> element and an element can have only a single ID defined, so the query 
> for an element/attr combo should be enough. XML Schemas would be a whole 
> different story though.


The problem occurs regardless of a validation being performed or not.

Example:

<!DOCTYPE foo [
  <!ELEMENT foo (bar)>
  <!ELEMENT bar>
  <!ATTLIST bar myId ID #IMPLIED>
]>

Assume we parse and validate the following XML:
<foo>
  <bar myID="1"/>
</foo>

The API wouldn't prevent us to add a <bar> to the <bar>:
<foo>
  <bar myID="1">
    <bar/>
  </bar>
</foo>

If we now add @myID to the newly created <bar>, then the current
ID-autodetection would mark it as of type ID:
<foo>
  <bar myID="1">
    <bar myID="2"/>
  </bar>
</foo>

The problem I see here, is that the second <bar> is not valid according
to the DTD, and thus its @myID shouldn't become an ID; that's why
querying the DTD for <bar>/@myID is not enough to evaluate if @myID is
of type ID.
This in mind, I think the DOM people had a good reason not to address
any schema/DTD based automatic ID-detection.

> > To your question: Ah, good point :-) I think this should either be made
> > settable or be avoided. The latter being the solution I would prefer.
> >   
> I haven't scoured the XInclude code, but when a copy is being performed 
> from there, has the attribute being copied already been determined to be 

I tried to test this with the following scenario:

"xinclude-test.xml": the including doc
--------------------------------------
<?xml version="1.0"?>
<foo xmlns:xi="http://www.w3.org/2001/XInclude";>        
        <xi:include href="xinc.xml" xpointer="id-01/1"/>        
        <xi:include href="xinc.xml" xpointer="id-02/1"/>
</foo>

"xinc.xml": the included doc
----------------------------
<?xml version="1.0"?>
<!DOCTYPE items SYSTEM "xinc.dtd">
<items xml:lang="en-us">
        <item id="id-01">
                <value>first</value>
        </item>
        <item id="id-02">
                <value>second</value>
        </item>
</items>

"xinc.dtd": the DTD
-------------------
<!ELEMENT item (value)>
<!ATTLIST item id ID #IMPLIED>

Note that this DTD does only reflect the <item>/@id
relationship.

xmllint produces the following result:

>xmllint --xinclude xinclude-test.xml
<?xml version="1.0"?>
<!DOCTYPE foo>
<foo xmlns:xi="http://www.w3.org/2001/XInclude";>
        <value>first</value>
        <value>second</value>
</foo>

So the IDs have been detected, otherwise the XPointer expression
wouldn't work. But the XIncluded-doc does not seem to have been
validated, since the "xinc.xml" is clearly invalid wrt to the
DTD "xinc.dtd".

I looked at Libxml2's XInclude code and found that
in xmlXIncludeParseFile(), XML_PARSE_DTDLOAD and XML_DETECT_IDS
are hard-coded to be set. So the reasons for the observed behaviour
are visible here. XML_PARSE_DTDVALID (switch on validation) is expected
to be set by the user.

Hmm, is this the correct behaviour? Can we query for IDness without
knowing if the doc is valid?

> an ID? If so, I would not be adverse to having the copied attribute an 
> ID as well in all cases. I assume that the ID behavior when copying an 
> attribute is not going to be removed due to the use from XInclude, but 
> if the atype could be tested rather than xmlIsID, then the negative 
> impact would at least be less than it currently is.

I 100% agree.

> When creating a new property is it possible that an ID check is only 
> performed if the attribute is being created in the scope of an element 
> and NULL is passed as the parent element to xmlIsID? This would at least 
> only create an ID if it is a proper xml:id.

I agree. The automatic IDness detection I'm trying to get rid of
is the one based on DTDs/schemata. xml:id should be detected.

+ 1) If creating a new attribute, I see the need to evaluate if the
  element of the attribute is inside the doc's tree and only then to
  create an ID.
+ 2) Hmm, nasty, this would mean: a branch created outside the doc's
  tree needs to be looked up for xml:id attributes if such a branch
  is attached to the doc's tree. Is this correct? Plus, the other
  way round: remove all IDs if a branch is detached from the doc's
  tree. But I guess you have that already in mind.

What would become with attributes previously being IDs based on a
DTD, if we detach them from the doc's tree and attach them back again?
Would such attrs loose their IDness? I think yes; this sounds saner than
trying to preserve an atype == XML_ATTRIBUTE_ID on detached attributes,
and then add them blindly as IDs at places where they potentially are
no IDs according to a DTD.

So maybe:
1) if an ID-attr is detached, the ID is removed from the doc's list of
   IDs and the attr looses any trace of IDness.
2) if an attr is added to the doc's tree, then it can become an ID, if:
  a) it is an xml:id
  b) it is make an ID explicitely via the API (this means adjust
    attr->atype and call xmlAddID())
3) attrs which were IDs based on a DTD, can become IDs again
   if the doc is re-validated. By the way, this would reflect DOM's way.

> The setprop function should be fine. I have a scaled down patch almost 
> ready that addresses that function correctly.
> 
> Other than completely removing automatic detection after the fact, I 
> think this may be the best bet for now. Anything more looks like will 
> require a bit of refactoring and probably a good bit of discussion.
> 
> Rob

Regards,

Kasimier
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] Potential wrong usage of xmlIsID() in tree.c

Reply via email to