Re: [xml] Potential wrong usage of xmlIsID() in tree.c

Kasimier Buchcik Thu, 23 Feb 2006 06:06:05 -0800

Hi,

On Thu, 2006-02-23 at 08:08 -0500, Rob Richards wrote:
> Kasimier Buchcik wrote:
> > Hi
> >
> > On Wed, 2006-02-22 at 19:39 -0500, Rob Richards wrote:
> >   
> >>
> >> How do you figure (or are you referring to the case of a document not 
> >> parsed in validating mode)? A DTD doesn't allow a redefinition of an 
> >> element and an element can have only a single ID defined, so the query 
> >> for an element/attr combo should be enough. XML Schemas would be a whole 
> >> different story though.
> >>     
> >
> > The problem occurs regardless of a validation being performed or not.
> >
> > Example:


[ ... cut an invalid example ... ]

According to Daniel, DTD-based IDness really can be queried by an
element/attribute combination alone, regardless of whether the attribute
sits at a position that is valid overall.

So maybe we could advance by looking at the second option: making the
DTD based detection optional.

> Thanks for the clarification. After your follow up message I thought you 
> meant that detection in general was broken.
> 
> I just saw your latest message and I was about to propose something 
> exactly along those lines. This way if it defaulted to enabled then 
> libxml2 could operate as intended. In the case of using the lib to 
> implement DOM, it could be disabled and ID detection would work as we 
> have kind of laid out in these messages. Is it possible to extend the 
> document node to include flags? This way it might also serve any future 
> need to provide some instructions or indications on the state of the 
> document?

Sounds great.
Daniel, could we have that flag field on xmlDoc?
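
A minimal sketch of how such a flag might gate the DTD-based detection. This is a toy model, not libxml2 code: ToyDoc, TOY_DOC_NO_DTD_IDS and toy_attr_is_id() are hypothetical names, and the dtd_declares_id parameter stands in for the real element/attribute lookup in the DTD.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bit; libxml2 does not define this name. */
#define TOY_DOC_NO_DTD_IDS (1u << 0)

/* Toy stand-in for a document node carrying the proposed flag field. */
typedef struct {
    unsigned int flags;
} ToyDoc;

/*
 * Returns 1 if the attribute should be treated as an ID, 0 otherwise.
 * dtd_declares_id is what a DTD lookup by element/attribute name would
 * have answered; the toy takes it as an input instead of parsing a DTD.
 */
int toy_attr_is_id(const ToyDoc *doc, const char *prefix,
                   const char *name, int dtd_declares_id)
{
    assert(doc != NULL && name != NULL);

    /* xml:id is always detected (the toy compares the prefix string
     * only; real code would check the XML namespace binding). */
    if (prefix != NULL && strcmp(prefix, "xml") == 0 &&
        strcmp(name, "id") == 0)
        return 1;

    /* DTD-based detection is skipped when the flag disables it. */
    if (doc->flags & TOY_DOC_NO_DTD_IDS)
        return 0;

    return dtd_declares_id != 0;
}
```

With flags left at 0 the toy behaves like the current default (DTD-declared IDs are detected); a DOM implementation would set the bit and keep only xml:id.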

> >>> To your question: Ah, good point :-) I think this should either be made
> >>> settable or be avoided. The latter being the solution I would prefer.
> >>>   
> >>>       
> >> I haven't scoured the XInclude code, but when a copy is being performed 
> >> from there, has the attribute being copied already been determined to be 
> >>     
> >
> > I tried to test this with the following scenario:

[ ... cut a scenario for XInclude with a DTD's IDs ... ]

> >
> > So the IDs have been detected, otherwise the XPointer expression
> > wouldn't work. But the XIncluded-doc does not seem to have been
> > validated, since the "xinc.xml" is clearly invalid with respect to the
> > DTD "xinc.dtd".
> >
> > I looked at Libxml2's XInclude code and found that
> > in xmlXIncludeParseFile(), XML_PARSE_DTDLOAD and XML_DETECT_IDS
> > are hard-coded to be set. So the reasons for the observed behaviour
> > are visible here. XML_PARSE_DTDVALID (switch on validation) is expected
> > to be set by the user.
> >
> > Hmm, is this the correct behaviour? Can we query for IDness without
> > knowing if the doc is valid?
> >   
> Not sure. I need to do more in-depth reading on XInclude to even come 
> close to answering this.

Daniel informed us that this is a valid behaviour.

> >   
> >> an ID? If so, I would not be averse to having the copied attribute be an 
> >> ID as well in all cases. I assume that the ID behavior when copying an 
> >> attribute is not going to be removed due to the use from XInclude, but 
> >> if the atype could be tested rather than xmlIsID, then the negative 
> >> impact would at least be less than it currently is.
> >>     
> >
> > I 100% agree.
> >
> >   
> >> When creating a new property is it possible that an ID check is only 
> >> performed if the attribute is being created in the scope of an element 
> >> and NULL is passed as the parent element to xmlIsID? This would at least 
> >> only create an ID if it is a proper xml:id.
> >>     
> >
> > I agree. The automatic IDness detection I'm trying to get rid of
> > is the one based on DTDs/schemata. xml:id should be detected.
> >
> > + 1) If creating a new attribute, I see the need to evaluate if the
> >   element of the attribute is inside the doc's tree and only then to
> >   create an ID.
> > + 2) Hmm, nasty, this would mean: a branch created outside the doc's
> >   tree needs to be looked up for xml:id attributes if such a branch
> >   is attached to the doc's tree. Is this correct? Plus, the other
> >   way round: remove all IDs if a branch is detached from the doc's
> >   tree. But I guess you have that already in mind.
> >   
> I would say a branch created outside of a doc's tree would not deal with 
> IDs. If there is no document, there is no ID (In reality I don't think 
> branches should be created at all without a document - maybe a node). 

I don't mean branches with xmlNode->doc == NULL. I mean branches which
are not inside the node-tree of the doc, i.e., which cannot be reached
if traversing the whole tree, starting with the document-node.
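
To make the distinction concrete, here is a small sketch of the "is this node inside the doc's tree" test. It is a toy model under assumptions: TreeNode and tree_node_is_in_doc_tree() are hypothetical names, the owner field plays the role of xmlNode->doc, and parent links are assumed to be consistent with the child links.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: only the two links needed for the check. */
typedef struct TreeNode {
    struct TreeNode *parent;  /* NULL for the document node or a detached root */
    struct TreeNode *owner;   /* owning document node, like xmlNode->doc */
} TreeNode;

/*
 * Returns 1 if the node is reachable from its owner document, i.e. if
 * walking the parent links upward ends at the document node itself.
 * A detached branch keeps its owner but ends at some other root.
 */
int tree_node_is_in_doc_tree(const TreeNode *node)
{
    const TreeNode *cur;

    assert(node != NULL);
    if (node->owner == NULL)
        return 0;

    cur = node;
    while (cur->parent != NULL)
        cur = cur->parent;

    return cur == node->owner;
}
```

A node in a detached branch still has a non-NULL owner, so checking the owner alone is not enough; the walk up to the root is what tells the two cases apart.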

> SetTreeDoc or a node adoption function would need to handle setting the 
> IDness. This would also fix the ID table (remove them) for a doc whose 
> branching is being adopted as well. Yes, nasty. Removing a branch would 
> not necessarily remove IDness. Unless the branch is being removed from 
> the tree and its document set to NULL, I don't think it is necessary to 
> remove IDness (unless the doc was normalized).

I'm not sure here: if we don't remove the IDs of a detached branch, then
Document.getElementById() would return elements which are not in the
doc's tree. I cannot give an argument based on the spec, since I can't
find it stated there that this method works on the doc's tree only, but
nodes which are removed from the doc's tree are a bit dangerous:
Example (pseudo code):

function myfunc() {
  Node node;
  Element elem;

  node = doc.documentElement().firstChild();
  doc.documentElement().removeChild(node);
  // is "node" still existent?
  elem = doc.getElementById('1');
}

Depending on the programming language we use, the "node" might get
destroyed before or after the call to doc.getElementById(), so the
call could return different results. So I don't think this method was
designed to handle nodes outside of the doc's tree.

> > What would become of attributes previously being IDs based on a
> > DTD, if we detach them from the doc's tree and attach them back again?
> > Would such attrs lose their IDness? I think yes; this sounds saner than
> > trying to preserve an atype == XML_ATTRIBUTE_ID on detached attributes,
> > and then add them blindly as IDs at places where they potentially are
> > no IDs according to a DTD.
> >   
> Yes they should lose the IDness. The only case where they probably 
> should not lose it is if the attribute is replacing another existing 
> matching attribute within the same function call. (i.e. xmlSetProp). 

I found a DOM thread related to this:
http://lists.w3.org/Archives/Public/www-dom/2002OctDec/0111.html

Le Hegaret indicates that in this case the IDness is lost as well.
The thread also indicates that multiple IDs can exist on one element
in DOM, plus if we rename the attribute, then it still keeps on being an
ID. Sigh.

> Adding one back in for other cases should probably only create IDness in 
> the case of xml:id.
> > So maybe:
> > 1) if an ID-attr is detached, the ID is removed from the doc's list of
> >    IDs and the attr loses any trace of IDness.
> > 2) if an attr is added to the doc's tree, then it can become an ID, if:
> >   a) it is an xml:id
> >   b) it is made an ID explicitly via the API (this means adjust
> >     attr->atype and call xmlAddID())
> > 3) attrs which were IDs based on a DTD, can become IDs again
> >    if the doc is re-validated. By the way, this would reflect DOM's way.
> >   
> This is exactly how I was thinking it should probably work assuming for 
> 1 you mean an attribute is directly detached from its parent element and 
> not a branch containing the attribute.
> Rob
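
The rules quoted above could be modelled roughly as follows. This is only a toy sketch under assumptions, not libxml2 code: IdAttr, IdDoc and the id_*() functions are hypothetical, the is_id field plays the role of attr->atype == XML_ATTRIBUTE_ID, and id_register() plays the role of xmlAddID(). Rule 3 (re-validation against a DTD) is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID_TABLE_MAX 16

typedef struct {
    const char *prefix;  /* e.g. "xml", or NULL */
    const char *name;
    const char *value;
    int is_id;           /* stands in for atype == XML_ATTRIBUTE_ID */
    int attached;        /* 1 while the attr is inside the doc's tree */
} IdAttr;

typedef struct {
    IdAttr *ids[ID_TABLE_MAX];  /* the doc's list of IDs */
    size_t count;
} IdDoc;

/* Adds attr to the doc's ID list; returns 1 on success, 0 if full. */
static int id_register(IdDoc *doc, IdAttr *attr)
{
    if (attr->is_id)
        return 1;
    if (doc->count >= ID_TABLE_MAX)
        return 0;
    doc->ids[doc->count++] = attr;
    attr->is_id = 1;
    return 1;
}

/* Rule 2a: attaching an attr makes it an ID only if it is xml:id. */
void id_attach(IdDoc *doc, IdAttr *attr)
{
    assert(doc != NULL && attr != NULL);
    attr->attached = 1;
    if (attr->prefix != NULL && strcmp(attr->prefix, "xml") == 0 &&
        strcmp(attr->name, "id") == 0)
        id_register(doc, attr);
}

/* Rule 2b: an attached attr can be made an ID explicitly. */
int id_make_explicit(IdDoc *doc, IdAttr *attr)
{
    assert(doc != NULL && attr != NULL);
    if (!attr->attached)
        return 0;
    return id_register(doc, attr);
}

/* Rule 1: detaching removes the ID and any trace of IDness. */
void id_detach(IdDoc *doc, IdAttr *attr)
{
    size_t i;

    assert(doc != NULL && attr != NULL);
    for (i = 0; i < doc->count; i++) {
        if (doc->ids[i] == attr) {
            /* Fill the hole with the last entry. */
            doc->ids[i] = doc->ids[--doc->count];
            break;
        }
    }
    attr->is_id = 0;
    attr->attached = 0;
}

/* Toy getElementById: only registered (attached) attrs are found. */
IdAttr *id_lookup(const IdDoc *doc, const char *value)
{
    size_t i;

    for (i = 0; i < doc->count; i++)
        if (strcmp(doc->ids[i]->value, value) == 0)
            return doc->ids[i];
    return NULL;
}
```

In this model an attribute that was an ID only via the explicit call (or, by extension, via a DTD) is not found by id_lookup() after a detach and re-attach, while an xml:id attribute is registered again on attach.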

[See the example for doc.getElementById() above]

Regards,

Kasimier