Hello Torquies,

  I have been trying to get Torque's XML dump/load tasks to work.  In the
process, I discovered several deficiencies:

1. XML output stores column values as attributes. Long data columns always
have to be escaped with XML entities (i.e. CDATA is not allowed for
attribute values).  When editing the XML data by hand, it's convenient to
have data inside elements (as body text) and the option to use CDATA.
2. The existing code does not escape illegal attribute chars, e.g. '<' or
embedded double quotes.
3. The "project-datasql" task loads the entire XML data file into memory as
a big honkin' DOM object, then passes it off to a bunch of Velocity scripts
for generating the SQL.  For any moderately sized data set, this DOM load
takes too much time and creates a "VM nightmare on earth" for my poor
desktop.
4. Most important, and most difficult: since the "project-datasql" task
generates actual SQL that must in turn be loaded with some kind of
DB-specific command-line util (e.g. mysql), all string-ish datatypes must be
escaped properly in whatever DB-specific syntax is required (e.g. using a
\ to escape embedded quotes, or doubling them, or whatever).
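
To make points 1 and 2 concrete, here is a minimal sketch of what a dump
would have to do to store a column value safely in an attribute.  The class
and method names are made up for illustration; this is not Torque's code.
Note that newlines and tabs also have to be written as character
references, because an XML parser otherwise normalizes them to spaces inside
attribute values.

```java
// Sketch only: XmlAttr and escape() are hypothetical names, not Torque APIs.
final class XmlAttr {

    /** Escapes a value for use inside a double-quoted XML attribute. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                // Literal whitespace other than ' ' would be normalized to a
                // space by the parser, so write it as a character reference.
                case '\n': out.append("&#10;");  break;
                case '\r': out.append("&#13;");  break;
                case '\t': out.append("&#9;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

A multi-line text column written this way ends up as one long line full of
entities, which is exactly what makes hand-editing painful.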

  So, I rolled my own Ant tasks that use JDBC->XML (DatabaseMetaData with
generic SELECT queries) for the dump and SAX XML->JDBC _directly_ (no
intermediate SQL file, generic INSERTS).  This solution is nice and simple,
flexible and not so coupled with Torque's "quirks" errr, functionality.

1.  All columns become first-class elements (along with tables, of course).
2.  Escaping for all column values is handled.  If a column value's length
exceeds a certain threshold and it contains chars that must be escaped, the
value is simply wrapped in a CDATA section.
3.  Uses the event-driven SAX API, which does not require the whole XML file
to be loaded into some data structure... much faster on my poor little
desktop.
4.  Loads data directly through JDBC so that the driver handles DB-specific
escaping issues.  Strings are just set as column values on the INSERT
statement with JDBC PreparedStatements.
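
The four points above can be sketched roughly as follows.  This is not the
actual task code, just a small illustration under some assumptions: the XML
layout (a root element, one child element per row named after its table, one
grandchild element per column), the class, interface, and method names, and
the threshold parameter are all hypothetical.  It also binds every value as
a string, ignores NULLs, and trusts the table and column names in the file;
a real task would need to handle those.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: all names and the assumed XML layout are hypothetical.
final class XmlRowLoader extends DefaultHandler {

    /** Receives each completed row; lets the parser run without a database. */
    interface RowSink {
        void insert(String table, Map<String, String> columns) throws Exception;
    }

    private final RowSink sink;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> row;
    private String table;
    private int depth;

    XmlRowLoader(RowSink sink) { this.sink = sink; }

    /** Dump side: a column becomes an element; long values needing escapes use CDATA. */
    static String columnElement(String name, String value, int cdataThreshold) {
        boolean needsEscape = value.indexOf('<') >= 0 || value.indexOf('&') >= 0
                || value.indexOf('>') >= 0;
        String body;
        if (needsEscape && value.length() > cdataThreshold) {
            // "]]>" cannot appear inside CDATA, so split it across two sections.
            body = "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>";
        } else {
            body = value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
        return "<" + name + ">" + body + "</" + name + ">";
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {           // row element, named after its table
            table = qName;
            row = new LinkedHashMap<>();
        } else if (depth == 3) {    // column element
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth == 3) text.append(ch, start, length);  // plain text and CDATA alike
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth == 3) {
            row.put(qName, text.toString());
        } else if (depth == 2) {
            try {
                sink.insert(table, row);   // one row at a time, no DOM in memory
            } catch (Exception e) {
                throw new SAXException(e);
            }
        }
        depth--;
    }

    /** Generic INSERT with one '?' placeholder per column. */
    static String insertSql(String table, Map<String, String> columns) {
        String names = String.join(", ", columns.keySet());
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    /** JDBC sink: the driver does the DB-specific escaping of bound values. */
    static RowSink jdbcSink(Connection conn) {
        return (table, columns) -> {
            try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
                int i = 1;
                for (String value : columns.values()) ps.setString(i++, value);
                ps.executeUpdate();
            }
        };
    }

    /** Streams the XML through SAX; parser errors are rethrown unchecked. */
    static void load(String xml, RowSink sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new XmlRowLoader(sink));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a live connection, loading would be something like
XmlRowLoader.load(xml, XmlRowLoader.jdbcSink(conn)).  Preparing a statement
per row keeps the sketch short; caching one statement per table and using
batches would be the obvious next step.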

  My offering is this: I will gladly donate this code to the
Apache/Jakarta/Torque project as a replacement for the
"project-datadump/project-datasql" tasks if it seems sensible to the
community.  Lemme know what y'all think.

russ

