On 2 Aug 2005, at 13:19, Shuai Zhang wrote:
Thank you for your solution, it does work :)
The reason I use such an ugly format is that the original XML is generated manually, line by line, and is not well formatted.

In that case, running an XSLT pass ahead of the XML load is a good way of giving yourself a set of "cleanup rules", if you choose to use them, that bring the incoming XML into a more normalised form. If you're really gung-ho about making the manual->automatic process bulletproof, consider:

Original XML Source -> JTidy to clean up broken/stupid tags -> XSL to catch missing content and common usage issues -> Castor.
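As a rough illustration of the XSL stage only, here is a minimal sketch using the JDK's built-in javax.xml.transform API. The element names (items, item, title, body), the "untitled" default, and the XmlCleanup class are all made up for the example; the JTidy pass before it and the Castor unmarshalling after it are left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cleanup stage: runs one XSLT over the incoming XML before mapping. */
final class XmlCleanup {

    // Assumed cleanup rules: copy everything through unchanged, but give any
    // <item> that has no <title> child a default one.
    static final String CLEANUP_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='item[not(title)]'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<title>untitled</title>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the cleanup stylesheet and returns the normalised XML as a string. */
    static String normalise(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(CLEANUP_XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Input that is not well-formed ends up here; that is what the
            // JTidy pass ahead of this stage would be there to prevent.
            throw new IllegalStateException("cleanup transform failed", e);
        }
    }
}
```

The string returned by normalise() is what you would then hand to the Castor unmarshaller, so the mapping only ever sees the normalised form.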

We're not doing this yet, but we're now looking at it for some of our content; we're also looking at using JTidy on the fly at Castor load time from the DB.


In a similar vein: we like to ensure that, in the database, content is never entity-encoded; we always want to store the proper UTF-8/UTF-16 representation. When the content is used, however, we always want the entity-encoded representation. As we're using manually created value objects, we use an nio translator to translate the content being set on the object into UTF characters, and on write we normalise all hand-written entity references into our preferred format.

While I'm sure this could all be done with FieldHandlers (actually, can it? I suppose it could on the XML side, but would it also work on the JDO side?), we prefer to do it on the objects so that clients don't need to round-trip to the database to see the correct encoding in live previews.
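To make the idea concrete, here is a minimal sketch of a value object that decodes on set and encodes on use. It is not our actual code: it uses a plain regex instead of the nio translator, the class and method names (EntityCodec, ContentItem, getBodyForDisplay) are invented, decimal numeric references are assumed as the "preferred format", and only non-ASCII characters are touched so that escaped markup characters such as &#38; stay escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: converts between numeric character references and real characters. */
final class EntityCodec {

    // Matches &#233; and &#xE9; style references (digit counts capped so parseInt cannot overflow).
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    /** Replaces references to non-ASCII characters with the characters themselves. */
    static String decode(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            // Leave ASCII references (e.g. &#38;) and invalid code points as written.
            String replacement = (cp > 0x7F && Character.isValidCodePoint(cp))
                ? new String(Character.toChars(cp))
                : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Writes every non-ASCII character as a decimal numeric reference. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        });
        return out.toString();
    }
}

/** Hypothetical value object: holds decoded text, hands out encoded text for display. */
final class ContentItem {
    private String body = "";

    // Whatever the caller passes in, the stored form never contains non-ASCII references.
    void setBody(String raw) { this.body = EntityCodec.decode(raw); }

    // The form that would be persisted to the database.
    String getBody() { return body; }

    // The form used in output, available without a round-trip to the database.
    String getBodyForDisplay() { return EntityCodec.encode(body); }
}
```

Because the conversion lives on the object, a client can call setBody() with hand-written input and immediately read getBodyForDisplay() for a live preview.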


Cleaning up data at load time, and controlling whether the representation in the database matches the representation of the data during use, are useful tools that can help you ensure that content is stored in a normalised, predictable manner.

Running JTidy to catch bad XML formatting, then an XSL pass to catch (and fix) structural errors, lets you build a simpler mapped representation of the content while still getting good performance through the system. It also lets you handle content that has a variety of errors but from which the content is perfectly recoverable.

Such a system, in reverse, can even write the content back in its fixed form, ready for the next time such changes get made, ensuring a cleaner content cycle.


Anyway, I'm happy to see my advice was useful, and I wish you lots of luck. :)
