RE: cherry picking nodes [LONG and NOT urgent]

Swanson, Brion Fri, 20 Jul 2001 06:30:30 -0700
I think I heard (a rumor?) that Xerces2 will use a lighter-weight DOM (based
on, or using DTM?) versus the larger, clunkier DOM that Xerces1 uses.  And
then again, I might just be talking out of wishful thinking.

Could someone confirm or refute my statement?

Brion Swanson

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 20, 2001 3:41 AM
To: [EMAIL PROTECTED]
Subject: RE: cherry picking nodes [LONG and NOT urgent]



Thanks for your reply, it always feels good to know there is life out
there.

Anyway, reading your solution, and re-reading my question again, I think I
am chasing a wild goose.

Your solution is similar to mine, though you use getElementsByTagName() where
I use XPath (I do have a good reason for that).

In fact I think what I want is to be able to apply XPath to something
lighter (in bytes) than a DOM, if there is such a thing. But I guess the
complexity of XPath queries requires something like the DOM. Another
hypothetical solution would be the ability to pre-serialize the Nodes I am
happy to ignore, assuming that a serialized Node takes up less memory than
the Node itself.

Maybe I ought to look into the code itself to find out what exactly a DOM
document is (but I am too new to Java and doubt I could make sense of it).

Is it a heavy-duty list of heavy-weight Nodes, each holding its attributes
and values and heavy-duty lists of child and sibling Nodes?
Or is it a light-weight linked list of light-weight Nodes that only point
to a single common binary representation of the XML?
Or something else?

Does anyone know of a good resource about these deep, meaningful questions?

Thanks for your comments.




"Swanson, Brion" <[EMAIL PROTECTED]> on 19/07/2001 20:53:02

Please respond to [EMAIL PROTECTED]

To:   "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject:  RE: cherry picking nodes [LONG and NOT urgent]


I'm not sure my solution is exactly what you're looking for, but at least
it SOUNDS different from what you're currently doing.

First of all, you still need to parse the XML document into a DOM (if you're
getting the NodeSet from a file, otherwise you can probably just pass in an
already-built DOM tree).

Second, use org.w3c.dom.Document.getElementsByTagName(String name) to
get a NodeList of all of your 'target' nodes.  In this way, you've
selected exactly those nodes that you wanted (and each of them knows about
its parent and children).  If you need sibling information, simply get
the node's parent and traverse its children.
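
A minimal sketch of that sibling lookup, assuming a current JDK with the
bundled JAXP parser. The class and method names (SiblingSketch,
siblingElementNames, siblingsOfFirstTarget) are hypothetical and exist only
for illustration; the DOM calls themselves are standard org.w3c.dom methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class SiblingSketch {

    // Collect the names of a node's element siblings by walking its parent's children.
    static List<String> siblingElementNames(Node node) {
        List<String> names = new ArrayList<>();
        Node parent = node.getParentNode();
        if (parent == null) {
            return names; // a detached node has no siblings
        }
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip the node itself and non-element children such as whitespace text.
            if (child != node && child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    // Parse an XML string and list the siblings of the first <target> element.
    // Assumes the input contains at least one <target>.
    static List<String> siblingsOfFirstTarget(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node target = doc.getElementsByTagName("target").item(0);
        return siblingElementNames(target);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><B><C/><target id=\"one\"/><D><E/><target id=\"two\"/></D></B></A>";
        System.out.println(siblingsOfFirstTarget(xml)); // prints [C, D]
    }
}
```

The element-type check matters in practice: a document parsed from an
indented file usually has whitespace text nodes between elements, and those
show up in getChildNodes() as well.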

This way, it saves you having to hardcode (or to know in any manner) the
XPath of the target node beyond its name.  It also returns you a nice neat
list of nodes that you can convert into a hashtable if you want for fast
lookup.

The changes you make to those nodes are 'live' changes, meaning you are
changing the actual DOM tree, since the NodeList and the hashtable hold
references to the nodes in the tree rather than copies of them.

Finally, when you're ready to write it all out, you simply have to get the
document element (if you haven't already) and serialize it!  Voila!

A code snippet might look similar to the following:

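  import java.util.Enumeration;
  import java.util.Hashtable;
  import org.w3c.dom.Element;
  import org.w3c.dom.Node;
  import org.w3c.dom.NodeList;

  // Collect every <target> element in the document.
  NodeList targetNodes = myDocument.getElementsByTagName("target");

  // Index the targets by their "id" attribute for fast lookup later.
  Hashtable nodeTable = new Hashtable();
  for (int i = 0; i < targetNodes.getLength(); i++) {
    Node target = targetNodes.item(i);
    String id = ((Element) target).getAttribute("id");

    nodeTable.put(id, target);
  }

  // Visit each stored node; these are the nodes in the tree, not copies.
  for (Enumeration keys = nodeTable.keys(); keys.hasMoreElements();) {
    Node currentNode = (Node) nodeTable.get(keys.nextElement());
    // ... do something with this target node ...
  }

  // ... now we're done, serialize the Document ...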
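
For the final "serialize the Document" step, one possible sketch is below.
It assumes a current JDK and uses the JAXP identity Transformer, which is
only one of several ways to write a DOM back out; the thread itself does not
say which serializer to use. The class name LiveEditSketch and the method
fillTarget are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class LiveEditSketch {

    // Index <target> elements by id, fill one in, and return the serialized document.
    static String fillTarget(String xml, String id, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Map<String, Element> table = new HashMap<>();
        NodeList targets = doc.getElementsByTagName("target");
        for (int i = 0; i < targets.getLength(); i++) {
            Element target = (Element) targets.item(i);
            table.put(target.getAttribute("id"), target);
        }

        // The map holds references to the nodes inside the tree, not copies,
        // so this edit changes the document itself.
        Element chosen = table.get(id);
        if (chosen != null) {
            chosen.setTextContent(text);
        }

        // An identity transform copies the DOM to a character stream.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><target id=\"one\"/><E/><target id=\"two\"/></A>";
        System.out.println(fillTarget(xml, "two", "filled in"));
    }
}
```

The edit is made through the map, yet it appears in the serialized output,
which is the 'live' behaviour described above.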

Hope this helps!

Brion Swanson

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 18, 2001 5:30 AM
To: [EMAIL PROTECTED]
Subject: cherry picking nodes [LONG and NOT urgent]



Hiya,

Can anyone think of a nice alternative to my newbie solution to the
following problem? All comments will be greatly appreciated.

Starting from something like:

<A>
  <B>
   <C/>
   <target id="one"/>
   <D>
    <E/>
    <target id="two"/>
   </D>
  </B>
  <target id="three"/>
  <E>
    <F>
      <G>...</G>
   </F>
  </E>
</A>

I want to build some sort of hash table {"one": node1, "two": node2,
"three": node3} as soon as possible. I really don't care about other nodes,
but I'd like the resulting object to be as light weight as possible. The
aim is to later come back and replace the <target> nodes with specific
data, without having to traverse the DOM (or whatever memory representation
of the parsed xml) again.

Right now, I have taken the heavy weight approach:

1) Parse the document as a DOM.
2) Find my <target> nodes (using XPath) and add them to my map.
3) Use the map to set up the content of the target nodes.
4) Keep the (DOM+map) object around for writing to file, or further use of
the map...
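
Steps 1 and 2 might be sketched as follows. This is an assumption-laden
illustration: the post does not say which XPath implementation is in use, so
the sketch uses the javax.xml.xpath API bundled with current JDKs, and the
names XPathIndexSketch and indexTargets are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class XPathIndexSketch {

    // Selects only elements, so the cast to Element below is safe.
    private static final String TARGETS = "//target[@id]";

    // Step 1: parse the text source into a DOM.
    // Step 2: select the <target> elements with XPath and index them by id.
    static Map<String, Element> indexTargets(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(TARGETS, doc, XPathConstants.NODESET);

        Map<String, Element> byId = new LinkedHashMap<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            byId.put(hit.getAttribute("id"), hit);
        }
        return byId;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><B><C/><target id=\"one\"/><D><E/><target id=\"two\"/></D></B>"
                + "<target id=\"three\"/><E/></A>";
        // The map values are the live DOM elements, ready for step 3.
        System.out.println(indexTargets(xml).keySet());
    }
}
```

Because each map value is a node of the parsed tree, the whole DOM stays
reachable (and in memory) for as long as the map does, which is the weight
problem described here.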

The serialized object (DOM+hashmap) is about 14 KB for a 4 KB xml source.
Considering that most of this is information about nodes I don't care about
(though I can't discard them because I need them to create the final xml
document), I am looking for an alternative approach, using a lighter
representation of the DOM.

A good example is the <E> node. Once I know that it doesn't contain any
targets, I don't need to know about its children or siblings. In fact, to
make later serialization (to a string) faster, I'd like to keep it as a
string.
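
Turning a target-free subtree such as <E> into a string could look roughly
like this. It is a sketch under assumptions: a current JDK, the JAXP identity
Transformer as the serializer, and hypothetical names (SubtreeStringSketch,
toXmlString, firstElementAsString).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class SubtreeStringSketch {

    // Serialize a single node and everything below it to an XML string.
    static String toXmlString(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header, since this is a fragment.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Parse an XML string and serialize the first element with the given name.
    // Assumes the input contains at least one such element.
    static String firstElementAsString(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return toXmlString(doc.getElementsByTagName(tagName).item(0));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A><target id=\"three\"/><E><F><G>text</G></F></E></A>";
        System.out.println(firstElementAsString(xml, "E"));
    }
}
```

This only produces the string. Keeping that string in place of the subtree
inside a standard DOM is a separate problem that the sketch does not solve.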

But I do care about the children or siblings of the <target> nodes,
because I do some processing on them (checking out attributes and cloning
nodes for example) before putting them in my map.

Finally I do not want (not that I have the ability to anyway) to reinvent
the wheel, therefore I do not really fancy using SAX to build my own
personal DOM.

Conclusion, here is my wish list:

1) Need to start from a text source.
2) Need an XPath-like way to find my nodes.
3) Need something very similar to org.w3c.dom.Node for my target nodes.
4) Need something very light, String-like, for all other nodes.
5) Need to be able to serialize the result back to text.

Any thoughts?


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
