On 2 Aug 2005, at 13:19, Shuai Zhang wrote:
Thank you for your solution, it does work :)
The reason I use such an ugly format is that the original
XML is generated line by line manually and is not well formatted.
In that case, running an XSLT pass ahead of the XML mapping is a good
way of giving yourself a set of "cleanup rules", if you choose to do
so, that turn the incoming XML into a more normalised form. If you're
really gung-ho about making the manual->automatic process
bulletproof, consider:
Original XML Source -> JTidy to clean up broken/stupid tags -> XSL to
catch missing content and common usage issues -> Castor.
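
As a rough illustration of the middle (XSL) stage only, here is a
minimal sketch using the JDK's built-in javax.xml.transform API. The
element names (item, title), the "untitled" placeholder and the class
name XmlCleanup are all hypothetical; your own cleanup rules would go
in the stylesheet. The JTidy pass before it and the Castor
unmarshalling after it are left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlCleanup {
    // Hypothetical cleanup rules (assumptions, not from the original mail):
    //  1. copy everything through unchanged (identity template),
    //  2. drop whitespace-only text and collapse runs of whitespace,
    //  3. give any <item> that has no <title> a placeholder title.
    static final String CLEANUP_XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='text()' priority='1'>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "</xsl:template>"
        + "<xsl:template match='item[not(title)]'>"
        + "<xsl:copy><xsl:apply-templates select='@*'/>"
        + "<title>untitled</title>"
        + "<xsl:apply-templates select='node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the cleanup stylesheet and returns the normalised XML,
    // ready to be handed to the mapping layer.
    static String clean(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(CLEANUP_XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Unchecked so callers can treat cleanup as a plain function.
            throw new IllegalStateException("cleanup transform failed", e);
        }
    }
}
```

Calling XmlCleanup.clean(...) on the raw source gives you a string you
can wrap in a StringReader and pass on to the unmarshalling step.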
We're not doing this yet, but we're now looking at it for some of
our content; we're also looking at using JTidy on-the-fly at
Castor load time from the DB.
In a similar vein: we like to ensure that, in the database, content
is never entity-encoded; we always want to store the proper
UTF-8/UTF-16 representation. When the content is used, however, we
always want the entity-encoded representation. As we're using
manually created value objects, we use a nio translator to translate
the content being set on the object into UTF characters, and on
write, we normalise all hand-written entities into our preferred
format.
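
To make the entity handling above concrete, here is a minimal sketch
of the two directions. It is not our actual translator: it uses a
plain regex rather than nio, only handles numeric character
references (not named entities such as &eacute;), and assumes the
"preferred format" is decimal &#NNN; for every non-ASCII character.
The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityText {
    // Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // On set: replace numeric character references with the real
    // characters, so the stored value is never entity-encoded.
    static String decode(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group()); // leave invalid references untouched
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    // On use: emit every non-ASCII character as a decimal reference
    // (the assumed preferred format). ASCII, including markup, is
    // passed through unchanged.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

Because encode(decode(x)) always goes through real characters, mixed
hand-written forms such as &#xE9; and &#233; come out in one
consistent style.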
While I'm sure this could all be done with FieldHandlers (actually,
can it? I suppose it could in XML, but will it also do so on the JDO
side?), we prefer to do it on the objects so that clients don't need
to round-trip to the database to see the correct encoding in live
previews.
Cleaning up data at load time, and deliberately controlling whether
the representation in the database matches the representation of the
data during use, are useful tools that help you ensure that content
is stored in a normalised, predictable manner.
Running JTidy to catch bad XML formatting, then applying an XSLT to
catch (and fix) structural errors, lets you build a simpler mapped
representation of the content while still getting good performance
through the system. It also lets you handle content that contains a
variety of errors but from which the content is still perfectly
recoverable.
Such a system, in reverse, can even write the content back in its
fixed form, so that the next round of changes starts from clean
input, ensuring a cleaner content cycle.
Anyways, happy to see my advice was useful, and I wish you lots of
luck. :)