Developer Tip: Resistance is futile! You will be XML-ated!

search390.com Wed, 20 Jun 2001 10:05:19 -0700
Search390.com
Developer Tip
June 20, 2001

Resistance is futile! You will be XML-ated!
Jim Keohane ---with apologies to Locutus and Seven of Nine

You know things have gone too far when supermarket checkout lines are
festooned with tabloid headlines about XML (eXtensible Markup
Language). There are now proposals for legal contracts in XML. Pretty
soon the legalese on a YODLE's wrapper will require an XML viewer. 

There's a Geography Markup Language (GML).
Visit <http://www.opengis.net/gml/01-029/GML2.html>. 

There's a MathML.
Visit <http://www.w3.org/TR/MathML2/>.

There's a FAQ (Frequently Asked Questions) Markup Language, QAML.
Visit <http://www.ascc.net/xml/en/utf-8/qaml-index.html>. 

There's even an XML-variant called ComicsML for online comic strips! 
Visit <http://www.jmac.org/projects/comics_ml/about.html>.

Enough! Let me try to delve into XML from a mainframer's perspective.

How many remember the old OS/VS COBOL "EXHIBIT NAMED" facility? 

The following code: 

MOVE 25 TO HOURLY-RATE.
MOVE "JIM" TO FNAME.
MOVE "KEOHANE" TO LNAME.
EXHIBIT NAMED FNAME LNAME HOURLY-RATE.

Results in something like the following output: 

FNAME = 'JIM' LNAME = 'KEOHANE' HOURLY-RATE = 25.00

This "KEYWORD = VALUE" format was a simple way to produce debugging
output. PL/I had its PUT DATA(FNAME, LNAME, HOURLY_RATE) which gave
similar results. Importantly, PL/I also had the reverse GET
DATA(FNAME, LNAME, HOURLY_RATE) which could read back the results of
a prior PUT DATA. It was a simple way to transfer information from
one program to another without having to define record field layouts
for an intermediate file. 
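
For anyone who never used PL/I, the PUT DATA / GET DATA round trip
described above can be mimicked in a few lines of Python. This is only
a rough sketch: put_data and get_data are made-up names, the trailing
semicolon merely imitates PL/I data-directed output, and values that
contain blanks are not handled.

```python
import ast


def put_data(**fields):
    """Render fields as KEYWORD=VALUE pairs, loosely in the style of PUT DATA."""
    return " ".join(f"{name}={value!r}" for name, value in fields.items()) + ";"


def get_data(text):
    """Read back a line written by put_data into a dict.

    Assumption for this sketch: no value contains a blank, so a plain
    split on whitespace is enough to separate the pairs.
    """
    result = {}
    for pair in text.rstrip(";").split():
        name, _, literal = pair.partition("=")
        result[name] = ast.literal_eval(literal)
    return result


line = put_data(FNAME="JIM", LNAME="KEOHANE", HOURLY_RATE=25.0)
restored = get_data(line)
```

The receiving side needs no record layout, only the names it cares
about, which is the convenience (and the parsing cost) the rest of
this tip is about.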

PL/I also had the PUT DATA() and GET DATA() where the absence of
variable names in the DATA list meant ALL variables! This was very
convenient for debugging. Not only could you see all variable
contents at the time of a problem, you could also make code changes
and retest with identical state! Admit it, you PL/I aficionados. How
many think debugging consists entirely of the "ON ERROR PUT DATA"
statement? 

PUT/GET DATA was one of many PL/I facilities that were strongly
discouraged due to understandable performance concerns. 

XML sometimes engenders those same performance concerns. XML has the
same keyword=value notation (in attributes, with the value quoted) but
also has an equivalent <keyword>value</keyword> notation, which
involves more processing to parse. XML can also nest a hierarchy of
information like: 
<family>
   <father>
      <name>Jim_Keohane</name>
      <height>79</height>
   </father>
   <mother>
      <name>Rae_Keohane</name>
      <height>61</height>
   </mother>
   <daughter>
      <name>Jo_Keohane</name>
      <height>65</height>
   </daughter>
   <daughter>
      <name>Meg_Keohane</name>
      <height>65</height>
   </daughter>
</family>

PL/I PUT DATA will also output such structure/hierarchy information
if the variables are so organized. 

This is obviously just the tip of the iceberg as far as XML's
features. There are a zillion articles already extant for those
interested in such. I'm narrowly focusing on the mainframe
performance concerns. 

There are two mechanisms for parsing an XML document, SAX and DOM: 

SAX (Simple API for XML) parses the string of characters in an XML
document. You can have SAX grab only certain keyword/values and
ignore the rest. This can be relatively fast for extracting a small
number of items out of a large document. 

DOM (Document Object Model) parses the whole XML source into a
complex internal structure. DOM is the better choice if you require
access to a lot of fields and/or need to build/modify a complex XML
document. 

In both cases, if you might encounter invalid XML documents, you'll
need an extra level of syntax checking, with the additional CPU hit
it implies. 
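
To make the SAX/DOM contrast concrete, here is a small sketch using the
parsers that ship with Python (xml.sax and xml.dom.minidom). It is an
illustration on a workstation, not a mainframe benchmark. FAMILY_XML is
a trimmed copy of the "family" example, and NameCollector,
names_via_sax and heights_via_dom are names invented for this sketch.

```python
import xml.sax
from xml.dom import minidom

FAMILY_XML = """<family>
   <father><name>Jim_Keohane</name><height>79</height></father>
   <mother><name>Rae_Keohane</name><height>61</height></mother>
</family>"""


class NameCollector(xml.sax.ContentHandler):
    """SAX style: react only to <name> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._chunks = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_name:
            self._chunks.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._chunks))
            self._in_name = False


def names_via_sax(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


def heights_via_dom(text):
    """DOM style: build the whole tree first, then walk it at leisure."""
    doc = minidom.parseString(text)
    heights = {}
    for person in doc.documentElement.childNodes:
        if person.nodeType != person.ELEMENT_NODE:
            continue  # skip the whitespace text nodes between elements
        name = person.getElementsByTagName("name")[0].firstChild.data
        height = person.getElementsByTagName("height")[0].firstChild.data
        heights[name] = int(height)
    return heights
```

The SAX handler never holds more than one name in flight, while the
DOM version keeps the entire document in memory before the first
lookup. That is the trade-off described above.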

What is now being investigated as a performance boost is a binary XML
format. You give up some of the human-readable friendliness of the
XML document in return for faster parsing and simplified syntax
checks. Call it a compiled XML document. 

I recall a simple tweak done to text files on a Mac to store them in
memory as string resources. You throw away the line breaks (<CR> on
Mac) and replace them with a line length at start of each line.
Conceptually it is analogous to OS/390 RECFM V files. The Mac took it
further by reformatting lines into tokens preceded by length fields.
Something like that might have the XML "family" example above start
off thus: 

[1][6]family[4][6]father[2][4]name[1][11]Jim_Keohane[6]height[1][2]79[6]mother[2][4]name[1][11]Rae...

Keywords and values are preceded by a length byte (e.g. "[4]name" and
"[11]Jim_Keohane") and subdivisions are preceded by a repetition
factor (e.g. the "[2]" following "father" and preceding
"[4]name...[6]height"). It can be read as 1 division, "family", with
four subdivisions, "father", "mother", etc., each with 2
subsubdivisions, "name" and "height." 
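
One reading of that layout can be sketched in Python as follows. The
encoder writes the printable "[n]" notation used above rather than real
length bytes, treats a value as the single child of its keyword, and
puts each repetition factor in front of the group of siblings it
counts. FAMILY and encode are names made up for this sketch, and the
nested-tuple input format is an assumption, not part of any standard.

```python
# Each node is (keyword, body): body is either a string value
# or a list of child nodes.
FAMILY = [
    ("family", [
        ("father", [("name", "Jim_Keohane"), ("height", "79")]),
        ("mother", [("name", "Rae_Keohane"), ("height", "61")]),
        ("daughter", [("name", "Jo_Keohane"), ("height", "65")]),
        ("daughter", [("name", "Meg_Keohane"), ("height", "65")]),
    ]),
]


def encode(nodes):
    """Encode a group of siblings: a count, then each keyword and its body."""
    out = [f"[{len(nodes)}]"]
    for keyword, body in nodes:
        out.append(f"[{len(keyword)}]{keyword}")
        if isinstance(body, str):
            # A leaf value counts as one subdivision of its keyword.
            out.append(f"[1][{len(body)}]{body}")
        else:
            out.append(encode(body))
    return "".join(out)


compiled = encode(FAMILY)
```

A reader of this format can hop from token to token using the lengths
and never has to hunt for "<", ">" or "</". Note that nothing in the
stream says whether a token is a keyword or a value, so both ends would
have to agree on that out of band.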

The above is grossly oversimplified, since there are conversion issues
ranging from ASCII<->EBCDIC to the more involved Unicode. Likewise, a
byte is insufficient as a length field. The example serves only as a
suggestion for faster parsing/reading.

What if you wish to modify or create an XML document? There's another
old Mac trick I recall that may be useful. You tokenize as above into
buffer 1. You also have buffer 2, initially empty. A third buffer,
tiny at first, holds alternating length values for buffer 1 and
buffer 2 fields. 

Buffer 3 starts off as simply "[1][X]" where [1] says there is only
one subdivision and [X] is the length of that field in buffer 1. If
you change my name from Jim to James, buffer 1 is unchanged,
buffer 2 is "[13]James_Keohane" and buffer 3 is "[3][A][14][B]" where
[A] is the length of buffer 1 preceding "[11]Jim", [14] is the entire
length of buffer 2 and [B] is the length of the buffer 1 contents
following "Jim_Keohane".
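
The three-buffer trick is close to what is usually called a piece
table. Below is a minimal Python sketch of the idea under some stated
assumptions: the class name PieceTable is invented here, each entry in
buffer 3 records (buffer number, start, length) instead of the bare
alternating lengths described above, and lengths count characters of
the printable "[n]" notation rather than bytes.

```python
class PieceTable:
    """Buffer 1 holds the original text and is never changed.
    Buffer 2 only grows as new text is appended.
    Buffer 3 (self.pieces) lists which slices make up the document."""

    def __init__(self, original):
        self.buffers = {1: original, 2: ""}
        self.pieces = [(1, 0, len(original))]

    def replace(self, start, length, new_text):
        """Replace `length` characters at logical offset `start`."""
        added = (2, len(self.buffers[2]), len(new_text))
        self.buffers[2] += new_text
        result, pos, inserted = [], 0, False
        for buf, off, n in self.pieces:
            piece_end = pos + n
            # Portion of this piece that lies before the replaced range.
            keep_before = max(0, min(n, start - pos))
            if keep_before:
                result.append((buf, off, keep_before))
            if not inserted and piece_end >= start:
                result.append(added)
                inserted = True
            # Portion of this piece that lies after the replaced range.
            skip = max(0, min(n, start + length - pos))
            if skip < n:
                result.append((buf, off + skip, n - skip))
            pos = piece_end
        if not inserted:
            result.append(added)  # empty document or append at the end
        self.pieces = result

    def text(self):
        """Reconstitute the document by stitching the pieces together."""
        return "".join(self.buffers[b][o:o + n] for b, o, n in self.pieces)


original = "[4]name[1][11]Jim_Keohane[6]height[1][2]79"
table = PieceTable(original)
old = "[11]Jim_Keohane"
table.replace(original.index(old), len(old), "[13]James_Keohane")
```

After the edit, buffer 1 is byte-for-byte what it was, buffer 2 holds
only "[13]James_Keohane", and buffer 3 has three entries. No data was
shifted to make room for the longer name.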

The contrivance above serves to avoid possible massive and frequent
data shifts as portions of XML documents are removed, modified or
added. Periodic reconstitution of buffer 1 can avoid ever-worsening
CPU performance during heavy document mangling. The compiled XML will
often be significantly shorter. Syntax checking is simplified. You
may note that the S/390 EXECUTE instruction will look favorably upon
MVC's and CLC's of items preceded by one-byte length. You won't need
TRT's to do tokenizing either. Byte by byte scanning is out too. 

When a real binary XML standard is approved there'll be lessened CPU
anxiety amongst mainframers. Until then, such approaches are
proprietary and should only be considered when you know both ends of
the communication abide by the same rules. 

Of course, there may never be a binary standard. Many argue that the
intuitively obvious need for other than text parsing is often not
supported by the facts. They often cite tests performed on
workstations that uniformly are not CISC (complex instruction set
computer) like S/390. Unfortunately, I have often seen perfectly
performing C code developed on Windows or Unix bring a mainframe to
its knees. I have heard stories of poorly performing XML code on
S/390. You can't go by those stories. What is needed are actual tests
and studies and, if there is a performance concern, then
mainframers should contribute their two cents to the discussion.
Curiously, some non-mainframers cite the possible desirability of XML
compression which would appear to have many of the same drawbacks
regarding universal data interchange as does binary XML.
Visit <http://www.xml.com/pub/a/2001/04/18/binaryXML.html>.

Anyone using IBM's recently announced XML Toolkit for z/OS and
OS/390? Can you report back on any performance testing of the XML
parser?
Visit <http://www-1.ibm.com/servers/eserver/zseries/software/xml/>.

Ditto for anyone who has ported LibXML to OS/390 or developed their
own XML Parser. I may have some numbers soon. 
------------------------------------------------------
About the author: Jim Keohane ([EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]>) is president of New York
consulting company Multi-Platforms, Inc. His company specializes in
commercial software development/consulting with emphasis on
cross-platform and performance issues. 

==============================
TRIED IBM'S XML TOOLKIT YET?
==============================
Jim Keohane wants to know if anyone has used IBM's recently announced
XML Toolkit for z/OS and OS/390.  We'd like to know too.  If you
have, share your feedback with your peers in our Developer Forum. 
What better way to learn about a new product?  While you're there,
check out some of the hot threads, and see if you can help user
"askMikey" with the following:  "We want to start using the
OPTIMIZE(FULL) option of the COBOL compiler. One of the requirements
is that we remove a specific form of coding. Specifically, in
programs converted from OS/VS COBOL we would sometimes code large
working storage tables as: 

01 the-real-table.
   05 xxx occurs yyy times.
01 over-flow-table-1.
   05 zzz occurs yyy times.

The procedure division would only reference elements in
"the-real-table". But our subscripting/indexing would overflow into
the "over-flow-table-1". 

Does anyone know of a tool that will find these?  An alternative
method was 

01 the-real-table.
   05 xxx occurs yyy times.
01 filler pic x(120400).

Once again, the references to the-real-table would overflow into the
filler."

If you have an answer for askMikey, post it here
http://search390.discussions.techtarget.com/WebX?[EMAIL PROTECTED]^[email protected]
========================================================
