Hi Lenny,
If possible, I would prefer to avoid implicit conversion operators as they
can lead to unexpected conversions and hard-to-find errors. If people
really feel the implicit conversion of a smart-pointer to bool is
necessary, then operator bool() is a better choice than operator int.
operator! is also a possibility and is even less likely to cause problems.
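
A minimal sketch of the hazard, assuming two hypothetical handle types (IntHandle and BoolHandle are illustrative names, not Xerces-C++ classes). It also assumes a C++11 compiler, since the explicit conversion operator shown here was not available when this thread was written:

```cpp
#include <cassert>

// Hypothetical handle with an implicit conversion to int (illustration only).
struct IntHandle {
    void* body;
    operator int() const { return body != nullptr; }
};

// Hypothetical handle with a C++11 explicit conversion to bool.
struct BoolHandle {
    void* body;
    explicit operator bool() const { return body != nullptr; }
};

// This compiles even though adding two handles is meaningless:
// both operands are silently converted to int.
inline int accidentalSum(const IntHandle& a, const IntHandle& b) {
    return a + b;
}

// A condition is still allowed with the explicit operator, because an
// if statement converts its operand to bool contextually.
inline bool isSet(const BoolHandle& h) {
    if (h)
        return true;
    return false;
}
```

With BoolHandle, an expression such as `a + b` is rejected by the compiler, while `if (h)` keeps working.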
If DOMString stays around, I'd also prefer an explicit "operator" to const
XMLCh*, rather than an implicit conversion. I don't find calling rawBuffer()
or something similar to be such a burden.
I'm curious as to how the smart-pointer implementation will work without
reference counts, and how those reference counts will be maintained in a
thread-safe manner without explicit synchronization. Does your
implementation use reference-counting? If so, are you doing simple integer
increments? If the implementation is not reference-counted, how does it
work?
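
For context, one common way to keep a count consistent across threads without a mutex is an atomic counter. The sketch below assumes C++11 std::atomic and a hypothetical CountedBody class; the thread does not say whether the implementation under discussion works this way:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted body; not the Xerces-C++ implementation.
class CountedBody {
public:
    // Atomic increment: safe when several threads copy handles concurrently.
    void addRef() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Atomic decrement. Returns true when the caller dropped the last
    // reference and is therefore responsible for cleanup.
    bool release() {
        const int previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        return previous == 1;
    }

    int useCount() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> count_{0};
};
```

A plain `++count` on an ordinary int is a read-modify-write sequence and can lose updates when two threads run it at the same time, which is the concern behind the question above.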
Thanks!
Dave
"Lenny Hoffman"
<lennyhoffman@earthlink.net>
04/29/2002 05:37 PM
Please respond to xerces-c-dev
To: <[EMAIL PROTECTED]>
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface
Hi Markus,
Thank you very much for the insight.
Note that simply accessing the IDOM implementation via handles does not
affect its thread safety, so your application is safe.
> if (pm_Element)
>     pm_Element->getAttribute(...);
> How can I do this with references?
You do it with the current handles like this:
    if (!pm_Element.isNull())
        pm_Element.getAttribute(...);
Adding an int operator to DOM_Node would allow even friendlier syntax;
e.g.
    if (pm_Element)
        pm_Element.getAttribute(...);
This could be easily added.
In fact, an -> operator could be added to the DOM_Node classes to get
this:
    if (pm_Element)
        pm_Element->getAttribute(...);
This is now exactly what you started out with, thus is completely backward
compatible with your current use of the IDOM.
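
A minimal sketch of such a handle, assuming hypothetical ElementBody and ElementHandle classes (simplified stand-ins, not the real DOM_Node or IDOM_Element). It uses a C++11 explicit bool conversion in place of the int operator mentioned above, which is an assumption of this sketch rather than the proposed design:

```cpp
#include <cassert>
#include <string>

// Hypothetical body class standing in for an IDOM element.
struct ElementBody {
    std::string id;
    const std::string& getAttribute() const { return id; }
};

// Hypothetical handle: forwards calls through operator-> and reports
// emptiness through isNull() and an explicit bool conversion.
class ElementHandle {
public:
    ElementHandle() : body_(nullptr) {}
    explicit ElementHandle(ElementBody* body) : body_(body) {}

    bool isNull() const { return body_ == nullptr; }
    explicit operator bool() const { return body_ != nullptr; }

    // Pointer-style member access on a handle object.
    ElementBody* operator->() const {
        assert(body_ != nullptr);
        return body_;
    }

private:
    ElementBody* body_;
};

// Same shape as the pointer-based IDOM code: test, then dereference.
inline std::string readId(const ElementHandle& pm_Element) {
    if (pm_Element)
        return pm_Element->getAttribute();
    return std::string();
}
```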
> XMLCh* are easier to handle than DOMString objects in ATL: CComBSTR cBstr
> = pm_Element->getAttribute(...);
Good point, the current DOMString class does not have an XMLCh* operator,
which if it did would solve your problem. I pretty much gutted the
original DOMString class to make it a simple wrapper around an XMLCh*
returned from IDOM implementations, in lieu of suffering the costs of the
cross-document string management of the original DOM. As far as I can tell
the only reason the original DOMString did not have an XMLCh* operator was
because there was no guarantee that its internal XMLCh* was null
terminated; well, that guarantee does now exist and the operator can be
added -- I will do that. So your example remains:
CComBSTR cBstr = pm_Element->getAttribute(...);
Note that string classes are a convenient way to perform various operations
on a string without using the static (read functional) methods provided by
XMLString. I even implemented COW (copy on write) behavior in the new
DOMString class, so that you can feel free to modify a string returned from
a node without having to manually make a copy.
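
To illustrate the copy-on-write idea, here is a minimal single-threaded sketch. CowString is a hypothetical class built on std::shared_ptr and std::u16string (assuming a 16-bit character type comparable to XMLCh); it is not the actual DOMString code:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write string; not the actual DOMString class.
class CowString {
public:
    explicit CowString(const std::u16string& text)
        : data_(std::make_shared<std::u16string>(text)) {}

    // Null-terminated read access; never triggers a copy.
    const char16_t* rawBuffer() const { return data_->c_str(); }

    // Mutation: clone the shared buffer first if another copy still uses it.
    void append(const std::u16string& more) {
        assert(data_ != nullptr);
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::u16string>(*data_);
        data_->append(more);
    }

    bool sharesBufferWith(const CowString& other) const {
        return data_ == other.data_;
    }

    const std::u16string& str() const { return *data_; }

private:
    std::shared_ptr<std::u16string> data_;
};
```

Copying a CowString only copies the shared pointer, so the cost of duplicating the characters is paid only by the copy that is actually modified.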
If folks don't find the DOMString wrapper to be that important, that frees
me up to simplify the handle classes and address one of Tinny's concerns.
Tinny pointed out that while the new design hides dual interfaces (DOM and
IDOM) from users, it does not hide them from DOM developers; as DOM 3
support is added, each interface change would have to be made to both DOM
and IDOM classes. The only reason I went with complete interface
replication instead of simple smart pointers for the handle classes was to
be able to translate XMLCh pointers returned from IDOM nodes into
DOMStrings. If I am allowed to get rid of DOMString altogether I can make
the handle classes simple smart pointers that do not replicate IDOM
interfaces, and thus the duplication of effort is eliminated.
Lenny
-----Original Message-----
From: Markus Fellner [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 6:17 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be the Xerces-C++ public supported
W3C DOM interface
OK, the main reason for my IDOM flirtation is...
I've chosen IDOM because of its thread safety. And now I have several
thousand lines of code using the IDOM interface.
Some other reasons are...
I have many IDOM_Element* members (pm_Elem) in my classes. After
parsing, they are assigned once, then checked many times to see whether
they are really assigned, and used for reading and writing attributes.
    if (pm_Element)
        pm_Element->getAttribute(...);
How can I do this with references?
XMLCh* are easier to handle than DOMString objects in ATL: CComBSTR
cBstr = pm_Element->getAttribute(...);
...
Sorry for my short answer. I go on holiday tomorrow and I have to
pack!
I'm back in 2 weeks and looking forward to seeing the results of this
vote.
It's a pity to leave during a hot discussion on this list.
Markus
-----Original Message-----
From: Lenny Hoffman [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 11:54 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: RE: Call for Vote: which one to be the Xerces-C++
public supported W3C DOM interface
Hi Markus,
To be clear, the fix I created for the IDOM was to recycle
memory once a node or string is no longer needed. To know
when a node is no longer needed I used the original DOM
interface classes as handles, and have them wrap the IDOM as the
implementation. IDOM performance is maintained, but ease of
use is greatly improved. Without using the DOM handles to know
when an IDOM node is in use or not, application code will be
drawn into explicitly stating when a node is no longer needed
and can be recycled, which is yet another thing to be
documented and for application developers to get wrong and
suffer the consequences of.
If you love and use the IDOM for its performance, and you want the
memory problem really fixed rather than worked around in a way
that only works if your application does everything
right, then you will love what I have done by combining DOM
classes as handles and IDOM classes as bodies.
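
As a rough illustration of the handle/body recycling idea, the sketch below uses hypothetical Body, Pool and Handle classes; none of these names come from the actual patch. It deliberately ignores that a real DOM node may still be in use through its parent even when no handle refers to it:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical body with a handle count; stands in for an IDOM node.
struct Body {
    int handles = 0;
};

// Hypothetical per-document pool that reuses bodies no handle refers to.
class Pool {
public:
    ~Pool() { for (Body* b : all_) delete b; }

    // Prefer a recycled body; allocate a new one only if none is free.
    Body* acquire() {
        if (!free_.empty()) {
            Body* b = free_.back();
            free_.pop_back();
            return b;
        }
        all_.push_back(new Body());
        return all_.back();
    }

    void recycle(Body* b) { free_.push_back(b); }
    std::size_t allocated() const { return all_.size(); }

private:
    std::vector<Body*> all_;
    std::vector<Body*> free_;
};

// Handle: counts itself on the body and recycles the body when the
// last handle is destroyed, so application code never says "done".
class Handle {
public:
    Handle(Pool& pool, Body* body) : pool_(&pool), body_(body) { ++body_->handles; }
    Handle(const Handle& other) : pool_(other.pool_), body_(other.body_) { ++body_->handles; }
    Handle& operator=(const Handle&) = delete;  // omitted to keep the sketch short
    ~Handle() {
        assert(body_->handles > 0);
        if (--body_->handles == 0)
            pool_->recycle(body_);
    }

private:
    Pool* pool_;
    Body* body_;
};
```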
If what you love is working with pointers instead of with
objects, please let me know why.
One thing I have found harder with objects vs. pointers is
downcasting from a node to a derived object such as an element. The
syntax is a bit cleaner with pointers; e.g.:
    DOM_Node node = ...
    DOM_Element elem = (const DOM_Element&)node;
vs:
    IDOM_Node* node = ...
    IDOM_Element* elem = (IDOM_Element*)node;
It is easy to forget to add the const in the first case, and it is
somewhat non-intuitive because slicing can happen, though it is
not a problem in this case.
To solve this problem I have thought of adding overloaded
constructors and assignment operators that take a DOM_Node to
DOM_Node derived classes like DOM_Element. Thus the first
example becomes:
    DOM_Node node = ...
    DOM_Element elem = node;
Not only is this code more succinct, but it is safer, as the
overloaded constructor and assignment operator can check for
node compatibility via the getNodeType call.
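
A minimal sketch of such a checked conversion, using simplified hypothetical Node and Element classes in place of DOM_Node and DOM_Element. Throwing on a mismatch is an assumption of this sketch; the thread does not say how an incompatible node would be reported:

```cpp
#include <cassert>
#include <stdexcept>

// Node type codes as defined by the W3C DOM.
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

// Hypothetical body; stands in for the IDOM implementation object.
struct NodeBody { NodeType type; };

class Node {
public:
    explicit Node(NodeBody* body = nullptr) : body_(body) {}
    bool isNull() const { return body_ == nullptr; }
    NodeType getNodeType() const { assert(body_ != nullptr); return body_->type; }
protected:
    NodeBody* body_;
};

class Element : public Node {
public:
    Element() {}

    // Converting constructor: accepts a Node only if it really is an element.
    Element(const Node& node) : Node(checked(node)) {}

    // Assignment from a Node performs the same check.
    Element& operator=(const Node& node) {
        Node::operator=(checked(node));
        return *this;
    }

private:
    static const Node& checked(const Node& node) {
        if (!node.isNull() && node.getNodeType() != ELEMENT_NODE)
            throw std::invalid_argument("node is not an element");
        return node;
    }
};
```

Unlike the C-style cast, a mistaken conversion from a text node is caught at the point of conversion instead of surfacing later as undefined behavior.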
Again, please let me know what other aspects of pointers make
things easier for you.
> Hope your fix has no effects on thread-safe-ness!
No effect whatsoever.
Lenny
-----Original Message-----
From: Markus Fellner [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 4:15 PM
To: [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be the
Xerces-C++ public supported W3C DOM interface
Hi Lenny,
I hope your fix of the IDOM memory problem goes into the
next official release. But I use and love the IDOM
interface.
It's really easier for an old C++ programmer like me! And
I use IDOM because of its thread-safe properties. Hope your
fix has no effects on thread-safe-ness!
Markus
-----Original Message-----
From: Lenny Hoffman
[mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 5:57 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: Call for Vote: which one to be the
Xerces-C++ public supported W3C DOM interface
Hi Markus,
The memory management problem is solved by recycling
nodes and strings that are no longer used. The only clean
way I know of to tell when nodes and strings are being
used is to use the handle/body pattern, which is
what is used by the original DOM. What I have done
is use the original DOM handles and the IDOM
implementation, but fixed the IDOM memory problem.
Lenny
-----Original Message-----
From: Markus Fellner
[mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 10:54 AM
To: [EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be
the Xerces-C++ public supported W3C DOM
interface
If the memory management problem is solved, I
prefer IDOM!!!
-----Original Message-----
From: Tinny Ng
[mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 5:08 PM
To: [EMAIL PROTECTED]
Subject: Call for Vote: which one to be
the Xerces-C++ public supported W3C DOM
interface
Hi everyone,
I've reviewed Andy's design objective
of IDOM, Lenny's view of old DOM and
his proposal of redesign, and some
users' feedback. Here is a "quick"
summary and I would like to call for a
VOTE about the fate of these two
interfaces.
1.0 Objective
==========
1. Define the strategy of Xerces-C++
public DOM interface. Decide which one
to keep, old DOM interface or new IDOM
interface
2.0 Motivation
===========
1. As a long term strategy, Xerces-C++
shouldn't define two W3C DOM interfaces,
which simply confuses users.
=> We've already got many users'
questions about what the difference is,
which one to use ... etc.
2. With limited resources, we should
focus our development on ONE stream, with no
more duplicated effort
=> New DOM Level 3 development
should be done on one interface, not
both.
=> No more dual maintenance: two
sets of samples (e.g. DOMPrint vs
IDOMPrint), two parsers (DOMParser vs
IDOMParser)
3. To better place Apache Xerces-C++ in
the market, we should have our Apache
Recommended DOM C++ Binding in
http://www.w3.org/DOM/Bindings
=> To encourage more users to
develop DOM application AND
implementation based on this binding.
=> Such binding should just define
a set of abstract base classes (similar
to JAVA interface) where no
implementation model is assumed
3.0 History
=========
'DOM' was the initial "W3C DOM
interface" developed by Xerces-C++.
However the performance of its
implementation is not quite
satisfactory.
Last year, Andy Heninger came up with a
new design with faster performance, and
such implementation came with a new set
of interface => 'IDOM'.
Currently both 'DOM' and 'IDOM' are
shipped with Xerces-C++. 'IDOM' is
claimed as experimental (like a
prototype) and is subject to change.
More information can be found in :
http://xml.apache.org/xerces-c/program.html
http://www.apache.org/~andyh/
http://marc.theaimsgroup.com/?t=101650188300002&r=1&w=2
http://marc.theaimsgroup.com/?w=2&r=1&s=Proposal%3A+C%2B%2B+Language+Binding+for+DOM+L&q=t
4.0 IDOM
=========
4.1 Interface
==========
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element*
IDOM_Document::createElement(const
XMLCh* tagName) = 0;
1. Define as abstract base classes
2. Use normal C++ pointers.
=> So that abstract base class is
possible.
=> Make it more C++ like. Less Java
like.
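
A minimal sketch of what such an abstract-base-class binding looks like in use. The two interfaces are cut down to one method each and modelled on the createElement declaration above; XMLCh is assumed to be a 16-bit character type, and SimpleElement and SimpleDocument are hypothetical implementation classes invented for this illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

typedef char16_t XMLCh;  // assumption: a 16-bit character type

// Simplified interfaces: pure virtual functions, no implementation assumed.
class IDOM_Element {
public:
    virtual ~IDOM_Element() {}
    virtual const XMLCh* getTagName() const = 0;
};

class IDOM_Document {
public:
    virtual ~IDOM_Document() {}
    virtual IDOM_Element* createElement(const XMLCh* tagName) = 0;
};

// One hypothetical implementation; callers only ever see the interfaces.
class SimpleElement : public IDOM_Element {
public:
    explicit SimpleElement(const XMLCh* name) : name_(name) {}
    const XMLCh* getTagName() const override { return name_.c_str(); }
private:
    std::u16string name_;
};

class SimpleDocument : public IDOM_Document {
public:
    // The document owns every element it creates and hands out plain pointers.
    IDOM_Element* createElement(const XMLCh* tagName) override {
        assert(tagName != nullptr);
        owned_.push_back(std::unique_ptr<IDOM_Element>(new SimpleElement(tagName)));
        return owned_.back().get();
    }
private:
    std::vector<std::unique_ptr<IDOM_Element>> owned_;
};
```

Because application code depends only on IDOM_Document and IDOM_Element, a different implementation can be substituted without changing callers, which is the property that makes the interface usable as a language binding.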
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that
correspond to the W3C DOM interfaces
=> Can be recommended as Apache DOM
C++ Binding
=> More standard like, no
implementation assumed as they are just
abstract interfaces using pure virtual
functions
2. (Depends on users' preference)
- some users prefer a C++-like style
Cons:
1. IDOM_XXX - weird prefix 'I'
Solution:
- Proposed to rename to DOMXXXX
which also matches the DOM Level 3
naming convention
2. (Depends on users' preference)
- some users do not like pointers,
and want a Java-like interface for ease
of use, ease of learning and ease of porting
(from Java).
3. As the old DOM interface has been
around for a long time, the majority of
current Xerces-C++ users still use the old
DOM interface, so the migration
impact is significant
Solution:
- Announce the deprecation of
old DOM interface for a couple of
releases before removal
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Use an independent storage allocator
per document. The advantage here is
that allocation would require no
synchronization
=> Fast, good scalability, reduced
memory footprint
2. Use plain, null-terminated (XMLCh *)
utf-16 strings.
=> No DOMString class overhead
which is another performance
contributor that makes IDOM faster
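
The per-document allocator idea can be sketched as a simple bump allocator. DocumentArena below is a hypothetical class written for illustration, with an arbitrary default block size; it is not the Xerces-C++ allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document bump allocator; not the Xerces-C++ code.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096)
        : blockSize_(blockSize), used_(blockSize) {}

    // Hands out memory from the current block. No lock is taken because
    // the arena belongs to exactly one document.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        if (bytes == 0)
            bytes = 1;
        bytes = (bytes + align - 1) / align * align;  // round up for alignment
        assert(bytes <= blockSize_);
        if (used_ + bytes > blockSize_) {
            // Current block is full (or none exists yet): start a new one.
            blocks_.push_back(std::unique_ptr<unsigned char[]>(new unsigned char[blockSize_]));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // There is no per-allocation free: every block is released together
    // when the arena (that is, the document) is destroyed.
private:
    std::size_t blockSize_;
    std::size_t used_;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

The same design explains the trade-off raised later in this message: allocation is cheap and needs no synchronization, but memory handed out for a value that is no longer used stays reserved until the whole document goes away.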
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
- If the document comes from the parser,
then the parser owns the document. If the
document comes from DOMImplementation,
then users are responsible for deleting
it.
Solution:
- Provide a means of
disassociating a document from the
parser
- Add a function "Node::release()",
similar to the idea of
"Range::detach", which allows users to
indicate the release of the Node.
- From the C++ Binding abstract
interface perspective, it's up to the
implementation how to handle this
"release()" function.
- With the Xerces-C++ IDOM
implementation, the release() function
will delete the 'this' pointer if it is
a document, else it is a no-op.
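
The proposed release() behaviour could look roughly like the sketch below. Node, Document and Element are simplified hypothetical classes, and the destroyed flag exists only so the effect can be observed:

```cpp
#include <cassert>

// Hypothetical simplified node hierarchy; names are illustrative only.
class Node {
public:
    virtual ~Node() {}
    // Default: ordinary nodes are owned by their document, so this is a no-op.
    virtual void release() {}
};

class Document : public Node {
public:
    explicit Document(bool* destroyedFlag) : destroyed_(destroyedFlag) {
        assert(destroyed_ != nullptr);
    }
    ~Document() override { *destroyed_ = true; }

    // A document deletes itself; the caller must not use it afterwards.
    void release() override { delete this; }

private:
    bool* destroyed_;
};

class Element : public Node {};
```

With this shape, a document released this way has to have been created with new, which matches the case where the user rather than the parser owns it.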
2. Memory retained until the document
is deleted.
- If you change the value of an
attribute or call removeNode many
times, the memory of the old value is
not deallocated for reuse and the
document grows and grows
Solution:
- This in fact is a tradeoff
for the fast performance offered by
independent storage allocator.
- There is no immediate good
solution in place
5.0 old DOM
==========
5.1 Interface
==========
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element
DOM_Document::createElement(const
DOMString tagName);
1. Use smart pointers - Java-like
5.1.2 Pros and Cons of old DOM
Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
- some users want a Java-like interface
for ease of use, ease of learning and ease
of porting (from Java).
3. Not that many users have migrated to
IDOM yet, so migration impact is
minimal.
Cons:
1. Not abstract base class
- Cannot be recommended as Apache
DOM C++ Binding
- Implementation (smart pointer
indirection) is assumed
Solution:
- This in fact is a tradeoff
for the ease of use of smart pointer
design
- No solution.
2. (Depends on users' preference)
- some users want a C++-like interface, as this is
a C++ interface
5.2 Implementation
===============
5.2.1 Features of old DOM
Implementation
----------------------------------------------------------------
1. Automatic memory management
- Memory is released when there are
no more handles pointing to it
- Uses a reference count to keep track
of handles
2. Use thread-safe DOMString class
5.2.2 Downside of old DOM
Implementation
--------------------------------------------------------------------
1. Performance is slow
- Memory management is the biggest
time consumer, and the memory
footprint is large.
- There are a whole lot of blocks
allocated when creating a document and
then freed when finished with it. Each
and every node requires at least one
and sometimes several separately
allocated blocks. A DOMString takes three.
It adds up.
Solution:
- Lenny suggests using the IDOM
interface internally in the DOM
implementation; see the patch in Bugzilla 5967
- Then the performance benefits
of IDOM are gained, but the memory
retention problem in the IDOM implementation
still remains to be addressed.
- And internally, we will have a
dual interface maintenance model, as
the IDOM interface is then used by DOM
internally.
Vote Question:
============
I would like to call for a vote:
==> Which INTERFACE should be the
Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===
Note:
1. The question is asking which
"interface" is to be officially supported.
Once the interface is chosen,
we can discuss how to solve the
downsides of its implementation as the next
topic.
2. The one voted for will become the
ONLY Xerces-C++ supported public W3C
DOM Interface, and is where DOM
Level 3 will be implemented.
3. The API of the other interface will
be deprecated, and its samples and
associated parser will eventually be
removed from the distribution