Re: [xml] redicting parts of trees

Daniel Veillard Mon, 16 May 2005 07:34:24 -0700

On Mon, May 16, 2005 at 03:47:20PM +0200, [EMAIL PROTECTED] wrote:
> Hi,
> 
> Obviously it's not enough to say that it is a sketch when Daniel is around
> ;-)


  don't get me wrong, I just wanted to provide some comments :-)

> > Von: Daniel Veillard <[EMAIL PROTECTED]>
> > Datum: Mon, 16 May 2005 08:28:27 -0400
> > 
> > On Mon, May 16, 2005 at 02:04:44PM +0200, [EMAIL PROTECTED] wrote:
> [...]
> >   Quick comments on it:
> > > [...]
> > > static int
> > > xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr destDoc,
> > >           xmlNodePtr node, xmlNodePtr parent, int unlink)
> > > {
> > 
> >   sourceDoc is redundant, can be extracted from node->doc
> 
> OK. node->doc can be NULL if created with xmlNewNode.

  Hum, but if plugged in the doc it should have been updated, the underlying
question is how permissive the function should be with respoect to the input
document, I tend to agree with you it should try to cope even with weird
input.

> >   parent should be optional NULL would be similar to the real DOM function
> 
> @parent is not needed; it's a left-over from xmlStaticCopyNode(), which I
> used as the starting point.

  I would keep it (as optional) to allow better ns integration if possible.

> >   Error handling should be designed. A simple -1 error code back is not
> > really
> > suitable for the kind of complex operation that is being designed here.
> 
> OK.
> 
> > >   case XML_DOCUMENT_NODE:
> > >         case XML_HTML_DOCUMENT_NODE:
> > 
> >   XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an
> > error...
> > I could think of a semantic for this, need to be checked against DOM.
> 
> Document nodes cannot be adopted as per DOM spec.

  okay, that clear :-)

> > >     sameDict = ((sourceDoc->dict == destDoc->dict) &&
> > >   (destDoc->dict != NULL)) ? 1 : 0;
> > >     cur = node;
> > 
> >    if parent != NULL collect existing inscope namespaces
> 
> @parent will not be used.

  I would really keep it if possible.

> > >           if (! sameDict) {
> > 
> > 
> >    Wrong you need to check xmlDictOwns(sourceDoc->dict, cur->name)
> > too or you are gonna leak cur->name if the node was added manually
> 
> So we always need a xmlDictOwns check? Can you see a constellation where we
> can avoid this?

  not really, but it's a really fast call, usually it's a call with 2
just 2 pointer comparison.

> > >               if (destDoc->dict)
> > >                   cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
> > >               else if (sourceDoc->dict)
> > >                   cur->name = BAD_CAST xmlStrdup(cur->name); 
> > >               /*
> > >               * TODO: Are namespace declarations ever in a dict?
> > >               */
> 
> Are dicts ever used for namespace declarations?

  of course, SAX2 always pass namespaces related strings from a the document
parser dictionary. It's the norm rather than the exception.

> > >           }
> > >           /*
> > >           * Adopt out-of-scope namespace declarations.
> > >           */
> > >           if (cur->ns != NULL) {
> > >               int i, j;
> > 
> >   I would rather use a hash table than comparing all namespaces string
> 
> Namespace strings are not compared. Only the pointers to xmlNs.

  But at some point you will need to compare strings, for example when
updating that list. There is 2 aspects, speed and code maintainance. I
can't tell about the speed but from a code cleanness POV it seems a more
abstract based view using hashes would make the code less confusing. But
it's all implementations details nothing critical.

> > >               } else {
> > >                   /*
> > >                   * User-defined behaviour.
> > >                   */
> > 
> >    you can't do that. ctxt need to be refined to be actually useful, a
> > void * won't work. And adding 2 args might be just a bit too much, this
> > need
> > more thinking
> 
> Ctxt will not be void* in the end. We have to design a nice struct for it.

  okay :-)

> > 
> >    I would really rather use a dictionnary for nsList it would be way
> > cleaner.
> > the only problem is that it would require a trick like a function
> > recursion
> > when encountering a namespace deactivation like xmlns="" or xmlns:foo=""
> > or namespace redefinition to a diferent value but that quite unfrequent.
> 
> xmlns:foo="" is not an allowed namespace declaration as far as I know.

  yes in namespace-1.1

> xmlns="" should not be referenced by an node->ns entry, since it is just a
> machanism to disable the default namespace.

  yes but it's on the nsDef, that's the only way to implement this.

> Can you clarify why you want a hash here? The mechanism just assures that

  see before.

> references (node->ns) to the xmlNs entries will be valid, thus picked from
> or created in "oldNs" if the original xmlNs entries are out-of-scope. It is
> not a namespace reconciliation mechanism; it just unlinks the branch and
> keeps namespace references alive; nsDef entries are not touched.
> 
> > >       case XML_ENTITY_REF_NODE:
> > >           /*
> > >           * TODO: Remove entity child nodes.
> > >           */
> > >           goto internal_error;
> > >           break;
> > 
> >   forces a recursion see other examples of recursive tree walk with 
> > entities references. Potentially a lookup of the entity being ref'ed
> > >from the target document. XInclude has a semantic for such entities 
> > remapping might use the same.
> 
> OK. The spec wants the referenced entity to be discarded; an entity of the
> destination document will be assigned if available.

  if the spec has a clear behaviour for this then let's follow it :-)

> > 
> > 
> >   Hum, I seems to have missed handling XML_ELEMENT_NODE especially the
> > part handling nsDef on those.
> 
> It's the first switch-case :-) Do we have to touch nsDef entires here?

  :-)
  touch no, read yes, I probably missed it. Currently the list based code
is a bit frightening, using dict might improve that aspect, if only in the
design/review phase.

> > 
> >   Obviously lot of thinking and testing need to be carried on. I would
> > really
> > like to get something we can finally rely on and not half of solutions.
> 
> Well, who doesn't? No offense: Daniel, if you can conjure a working solution
> out of the box, then just do it. I'd love to see this issues being solved by

  Hum, there is a misunderstanding :-)
  By half solutions I meant the couple of existing non-complete function
which are already in tree.c and which I pointed to in earlier parts of this
thread. I was not complaining about your attempt !

> others, really. You could add the entity refernce stuff and fix the dict
> parts, for I don't have the big picture in mind regarding the string dicts.
> We'll surely need some thinking on the context struct, so it won't be
> finished and maby should not be finished that fast.

  There is no hurry, I just want to end up with a complete solution this time
even if it takes a month before everybody agree the resulting code is usable
for them.

> >   Thanks a lot for starting the effort though there is obviously some work
> > left :-)
> > 
> > Daniel
> 
> Proposal for an initial function head and return values:
> 
> /**
>  * xmlDOMWrapAdoptNode:
>  * @ctxt: the optional wrapper context
>  * @destDoc: the adopting document
>  * @node: the node to be adopted
>  *
>  * Unlinks and adopts a node.
>  *
>  * Returns: 0 in case of success
>  *          1 if @node cannot be adopted
>  *          -1 in case of an API or internal error.
>  */
> static int
> xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
>                     xmlDocPtr destDoc,
>                     xmlNodePtr node)
> 

Still open at the API level :
  - following your comment sourceDoc might be added to cope with
    node->doc == NULL, but this opens the door to more problem like
    node->doc != NULL and node->doc != sourceDoc :-\
  - I still think an optional parent node to be able to make namespace 
    integration, it's incredible how much people have been complaining
    for example in XSLT when cut and pasting trees and extra ns definition
    were generated. Sometimes DTD are defined to allow the namespaces
    only on the root element for example.
  - more complex error handling. Potentially this could be options passed
    to the function on how to process default edge cases.

 thanks again,

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] redicting parts of trees

Reply via email to