On Fri, 2004-10-08 at 03:53, Giovanni A. Cignoni wrote:
> The gridded text is useful for readability for those who like to
> look in the XML code using a text editor. Readability that, for
> all the rest of DAVE-ML items, is anyway disturbed by the verbosity
> of the XML tags.
>
> If you open the file with a general XML viewer the gridded text
> is usually put on a single line. Tables are unreadable.
>
> A tagged solution is terrible looking in the code. Viewed
> with a standard XML viewer is a little better, but still not very
> readable (columns will be listed one by one). Try for instance to
> read the following with MS IE:
>
>
>
> 00 01 02 03
> 04 05 06 07
>
>
> 00 01 02 03
> 04 05 06 07
>
>
>
> The tagged format can be useful if we want to process the XML
> to convert the format (for instance in a HTML readable form).
>
> Another thing that, for the XMLers, may result odd is the
> different handling of values: for variables they are tagged
> inside a specific attribute, for table defined functions
> they are untagged.
>
> XML is for exchanging data. The goal of XML is to identify
> the relevant items of a set of data and give them a semantics.
> We can't consider the same a table of n x m values and
> a textual description. I'm afraid the "right" thing to do for
> an XML format is to eventually go tagged (and, of course,
> provide a viewer).
>
It seems to me that the most important aspect is not how the XML looks
in either a text viewer or an XML viewer (although in passing I'd say
that the current format looks quite reasonable in Mlview, and not too
bad in XMLSpy). We can use XSLT and stylesheets to make it look any way
we like in IE or Mozilla, if necessary.

The significant aspect is how parsers (or just our programs if we're
reading it directly and doing our own parsing) deal with the XML data.
I've been using the Xerces-C parser to load a DOM, and this results in a
tree structure whose node attributes and content are all XMLStrings. A
string-based structure will result no matter which XML format we use,
but the current DAVE-ML format will have many fewer nodes than one in which
each gridded data point is individually tagged, although the DAVE-ML table
nodes will have longer content strings.

Individually tagging the points means that any subsequent program access to
the DOM contents, which involves traversing the tree structure, will have
many more nodes to process and is therefore likely to take longer. Tagging
doesn't provide any great accessibility benefit either, since a
computational model is unlikely to want to access just one point somewhere
in a gridded data table.
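
As a rough illustration of the difference in DOM size, the sketch below
counts the nodes needed for one table under each layout. It assumes a
hypothetical tagged scheme with one element per data point, and it ignores
any whitespace-only text nodes a parser may keep between elements; the
function names are made up for this example.

```cpp
#include <cstddef>

// Gridded text: the table element plus its single text child.
constexpr std::size_t nodesGridded()
{
  return 2;
}

// Hypothetical tagged layout: the table element, plus one element node
// and one text child for each of the rows * cols data points.
constexpr std::size_t nodesTagged( std::size_t rows, std::size_t cols )
{
  return 1 + 2 * rows * cols;
}
```

For the 2 x 4 table quoted above, that is 2 nodes as gridded text against
17 nodes if every point were tagged, and the gap grows linearly with the
number of points.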

Also, since we can't perform flight mechanics computations using
strings, at some point we're going to have to transform all these DOM
table node contents to a numeric array. At present I do that with a
loop over the data table string contents, like:
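
    // digits and delimiters are character sets defined elsewhere.
    // transcode() allocates a new C string, so keep the pointer to free it.
    char* text = XMLString::transcode( dataTableItem->getNodeValue() );
    char* next = strpbrk( text, digits );
    while ( NULL != next ) {
      dataTable_[i][ia] = strtod( next, NULL );  // convert current value
      ia = ia + 1;
      next = strpbrk( next, delimiters );        // skip to end of value
      if ( NULL != next ) {
        next = strpbrk( next, digits );          // skip to next value
      }
    }
    XMLString::release( &text );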
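
For anyone who wants to try the idea without Xerces, here is a minimal
self-contained sketch of the same conversion. It assumes the table text is
already an ordinary string and that values are separated by whitespace
and/or commas; parseGriddedText is a hypothetical name, not part of DAVE-ML
or Xerces. It uses the end pointer returned by strtod instead of strpbrk, so
signs and exponents are picked up without needing a separate digits set.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert delimited table text to a flat array.
std::vector<double> parseGriddedText( const std::string& text )
{
  std::vector<double> values;
  const char* next = text.c_str();
  while ( *next != '\0' ) {
    char* end = nullptr;
    // strtod skips leading whitespace and reports where the number ended.
    const double value = std::strtod( next, &end );
    if ( end == next ) {
      // No number starts here (e.g. a comma or trailing space): skip it.
      ++next;
    } else {
      values.push_back( value );
      next = end;
    }
  }
  return values;
}
```

The flat result can then be reshaped into rows and columns using whatever
dimensions the table's breakpoints imply.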

If we tagged the gridded data points individually, the loop would instead
traverse the individual elements and convert each one separately, which
would add somewhat to the computational overhead of model initialisation.
The run-time computational load for tabular data would be unchanged.
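
To make the comparison concrete, the tagged case might look roughly like
the sketch below. PointNode is only a stand-in for a DOM element holding
one point's text, and convertTaggedPoints is a made-up name; with a real
DOM each iteration would also involve fetching the next child node and
transcoding its content, which is where the extra initialisation cost
would come from.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Stand-in for a DOM element that holds one data point's text content.
struct PointNode {
  std::string text;
};

// Convert one point per node, rather than many points per string.
std::vector<double> convertTaggedPoints( const std::vector<PointNode>& points )
{
  std::vector<double> values;
  values.reserve( points.size() );
  for ( const PointNode& point : points ) {
    values.push_back( std::strtod( point.text.c_str(), nullptr ) );
  }
  return values;
}
```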

Summarising, the current DAVE-ML gridded table structure is valid XML
but the gridded tables are not as fine-grained as they could be.
However, the current structure seems to me a reasonable balance between
size of the resulting DOM and accessibility of data points within it.
Division into individual data points will not significantly improve
accessibility for the calling program, and will add both memory and
initialisation time overhead.

I guess the point I'm really making here is that the appearance of the
data to the program accessing a DOM is probably more important than its
appearance to humans using various XML viewers.

Dan Newman