Search390.com Developer Tip
June 20, 2001

=======================================================
HURRY AND GET IN ON THE ACTION!
=======================================================
Our Tip of the Month contest for June is still going strong, but will end soon. The good news is, it's not too late to enter to win this month's prize -- A FABULOUS Palm Vx Ultra Slim Handheld!

To check out existing tips, this month's prize, or submit a tip of your own, go to:
http://search390.techtarget.com/tips/0,289484,sid10_tax1642_prz_cts,00.html
=======================================================

Resistance is futile! You will be XML-ated!
Jim Keohane
---with apologies to Locutus and Seven of Nine

You know things have gone too far when supermarket checkout lines are festooned with tabloid headlines about XML (eXtensible Markup Language). There are now proposals for legal contracts in XML. Pretty soon the legalese on a YODLE's wrapper will require an XML viewer.

There's a Geography Markup Language (GML). Visit <http://www.opengis.net/gml/01-029/GML2.html>.
There's a MathML. Visit <http://www.w3.org/TR/MathML2/>.
There's a FAQ (Frequently Asked Questions) Markup Language, QAML. Visit <http://www.ascc.net/xml/en/utf-8/qaml-index.html>.
There's even an XML variant called ComicsML for online comic strips! Visit <http://www.jmac.org/projects/comics_ml/about.html>.

Enough!
Let me try to delve into XML from a mainframer's perspective. How many remember the old OS/VS COBOL "EXHIBIT NAMED" facility? The following code:

    MOVE 25 TO HOURLY-RATE.
    MOVE "JIM" TO FNAME.
    MOVE "KEOHANE" TO LNAME.
    EXHIBIT NAMED FNAME LNAME HOURLY-RATE.

results in something like the following output:

    FNAME = 'JIM' LNAME = 'KEOHANE' HOURLY-RATE = 25.00

This "KEYWORD = VALUE" format was a simple way to produce debugging output. PL/I had its PUT DATA(FNAME, LNAME, HOURLY_RATE), which gave similar results. Importantly, PL/I also had the reverse, GET DATA(FNAME, LNAME, HOURLY_RATE), which could read back the results of a prior PUT DATA. It was a simple way to transfer information from one program to another without having to define record field layouts for an intermediate file.

PL/I also had PUT DATA() and GET DATA(), where the absence of variable names in the DATA list meant ALL variables! This was very convenient for debugging. Not only could you see all variable contents at the time of a problem, you could also make code changes and retest with identical state! Admit it, you PL/I aficionados. How many think debugging consists entirely of the "ON ERROR PUT DATA" statement?

PUT/GET DATA was one of many PL/I facilities that were strongly discouraged due to understandable performance concerns. XML sometimes engenders those same performance concerns. XML has the same keyword=value form (as attributes) but also has an equivalent <keyword>value</keyword> notation, which involves more processing to parse. XML can also nest a hierarchy of information like:

    <family>
      <father>
        <name>Jim_Keohane</name>
        <height>79</height>
      </father>
      <mother>
        <name>Rae_Keohane</name>
        <height>61</height>
      </mother>
      <daughter>
        <name>Jo_Keohane</name>
        <height>65</height>
      </daughter>
      <daughter>
        <name>Meg_Keohane</name>
        <height>65</height>
      </daughter>
    </family>

PL/I PUT DATA will also output such structure/hierarchy information if the variables are so organized. This is obviously just the tip of the iceberg as far as XML's features.
There are a zillion articles already extant for those interested in such. I'm narrowly focusing on the mainframe performance concerns.

There are two mechanisms for parsing an XML document, SAX and DOM:

SAX (Simple API for XML) parses the string of characters in an XML document. You can have SAX grab only certain keyword/values and ignore the rest. This can be relatively fast for extracting a small number of items out of a large document.

DOM (Document Object Model) parses the whole XML source into a complex internal structure. DOM is the better choice if you require access to a lot of fields and/or need to build/modify a complex XML document.

In both cases, if you may encounter invalid XML documents, you'll need that extra level of syntax checking, with the additional CPU hit it implies.

What is now being investigated as a performance boost is a binary XML format. You give up some of the human-readable friendliness of the XML document in return for faster parsing and simplified syntax checks. Call it a compiled XML document.

I recall a simple tweak done to text files on a Mac to store them in memory as string resources. You throw away the line breaks (<CR> on Mac) and replace them with a line length at the start of each line. Conceptually it is analogous to OS/390 RECFM V files. The Mac took it further by reformatting lines into tokens preceded by length fields. Something like that might have the XML "family" example above start off thusly:

    [1][6]family[4][6]father[2][4]name[1][11]Jim_Keohane[1][6]height[2]79[6]mother[2][4]name[1][11]Rae...

Keywords and values are preceded by a length byte (e.g. "[4]name" and "[11]Jim_Keohane") and subdivisions are preceded by a repetition factor (e.g. "[2]" following "father" and preceding "[4]name...[6]height"). It can be read as 1 division, "family", with four subdivisions, "father", "mother", etc., each with 2 subsubdivisions, "name" and "height."
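
The length-prefixed idea can be sketched in a few lines of Python. This is only an illustration, not the Mac format and not any proposed binary XML standard. The layout (one-byte name length, name, one-byte child count, then either the children or a one-byte text length and the text) and the names FAMILY_XML, encode and decode are assumptions made up for this example, and it handles only ASCII with lengths under 256.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the "family" example, written without whitespace.
FAMILY_XML = (
    "<family>"
    "<father><name>Jim_Keohane</name><height>79</height></father>"
    "<mother><name>Rae_Keohane</name><height>61</height></mother>"
    "<daughter><name>Jo_Keohane</name><height>65</height></daughter>"
    "<daughter><name>Meg_Keohane</name><height>65</height></daughter>"
    "</family>"
)


def encode(elem):
    """Encode an element as [name length][name][child count][children or text].

    Assumed layout for illustration only. bytes([n]) raises ValueError when a
    length or count does not fit in one byte.
    """
    name = elem.tag.encode("ascii")
    children = list(elem)
    out = bytes([len(name)]) + name + bytes([len(children)])
    if children:
        for child in children:
            out += encode(child)
    else:
        # Leaf: a child count of 0 is followed by a length-prefixed value.
        text = (elem.text or "").strip().encode("ascii")
        out += bytes([len(text)]) + text
    return out


def decode(buf, pos=0):
    """Decode one element starting at pos; return (tree, next position).

    Every field is located by its length byte, so no byte-by-byte scan for
    '<' or '>' delimiters is needed.
    """
    n = buf[pos]
    name = buf[pos + 1:pos + 1 + n].decode("ascii")
    pos += 1 + n
    count = buf[pos]
    pos += 1
    if count == 0:
        n = buf[pos]
        text = buf[pos + 1:pos + 1 + n].decode("ascii")
        return (name, text), pos + 1 + n
    kids = []
    for _ in range(count):
        kid, pos = decode(buf, pos)
        kids.append(kid)
    return (name, kids), pos


PACKED = encode(ET.fromstring(FAMILY_XML))
TREE, END = decode(PACKED)
print(len(FAMILY_XML), "bytes of text XML vs", len(PACKED), "bytes tokenized")
```

The decoder never compares a byte against a delimiter. It only reads a length and steps over that many bytes, which is the property that makes the approach attractive for length-driven instructions.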
The tokenized form above is grossly oversimplified, since there are conversion issues ranging from ASCII<->EBCDIC to the more involved Unicode. Likewise, a byte is insufficient as a length field. The example serves only as a suggestion for faster parsing/reading.

What if you wish to modify or create an XML document? There's another old Mac trick I recall that may be useful. You tokenize as above into buffer 1. You also have buffer 2, initially empty. Into a third buffer, tiny at first, you put alternating length values for buffer 1 and buffer 2 fields. Buffer 3 starts off as simply "[1][X]", where [1] says there is only one subdivision and [X] is the length of that field in buffer 1. If you change my name from Jim to James, buffer 1 is unchanged, buffer 2 is "[13]James_Keohane" and buffer 3 is "[3][A][14][B]", where [A] is the length of buffer 1 preceding "[11]Jim", [14] is the entire length of buffer 2 and [B] is the length of the buffer 1 contents following "Jim_Keohane".

The contrivance above serves to avoid possibly massive and frequent data shifts as portions of XML documents are removed, modified or added. Periodic reconstitution of buffer 1 can avoid ever-worsening CPU performance during heavy document mangling.

The compiled XML will often be significantly shorter. Syntax checking is simplified. You may note that the S/390 EXECUTE instruction will look favorably upon MVCs and CLCs of items preceded by a one-byte length. You won't need TRTs to do tokenizing either. Byte-by-byte scanning is out too.

When a real binary XML standard is approved there'll be lessened CPU anxiety amongst mainframers. Until then, such approaches are proprietary and should only be considered when you know both ends of the communication abide by the same rules.

Of course, there may never be a binary standard. Many argue that the intuitively obvious need for other than text parsing is often not supported by the facts. They often cite tests performed on workstations that uniformly are not CISC (complex instruction set computer) like S/390.
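
Returning for a moment to the three-buffer editing trick described earlier, it can be sketched as a small piece table. This is an illustrative assumption, not the Mac implementation. The tip's buffer 3 holds only lengths, whereas this sketch stores (source, offset, length) entries so that the replaced bytes in buffer 1 can be skipped. The class name PieceTable, its methods and the sample bytes are hypothetical.

```python
class PieceTable:
    """Edit a byte string without shifting it: a sketch of the three buffers."""

    def __init__(self, original):
        self.original = original    # buffer 1: tokenized document, never modified
        self.added = bytearray()    # buffer 2: new material, append-only
        # Buffer 3: ordered (source, offset, length) entries describing the
        # current document. The offsets are an addition made for this sketch.
        self.pieces = [("orig", 0, len(original))]

    def _split(self, at):
        """Ensure a piece boundary at logical offset `at`; return its index."""
        pos = 0
        for i, (src, off, n) in enumerate(self.pieces):
            if at == pos:
                return i
            if at < pos + n:
                head = at - pos
                self.pieces[i:i + 1] = [(src, off, head),
                                        (src, off + head, n - head)]
                return i + 1
            pos += n
        if at == pos:
            return len(self.pieces)
        raise IndexError("offset past end of document")

    def replace(self, start, length, new):
        """Replace `length` bytes at logical offset `start` with `new`."""
        add_off = len(self.added)
        self.added += new
        i = self._split(start)
        j = self._split(start + length)
        self.pieces[i:j] = [("add", add_off, len(new))] if new else []

    def to_bytes(self):
        """Reconstitute the document by concatenating the pieces in order."""
        bufs = {"orig": self.original, "add": self.added}
        return b"".join(bytes(bufs[s][o:o + n]) for s, o, n in self.pieces)


# Hypothetical tokenized fragment using one-byte length prefixes.
DOC = b"\x06father\x02\x04name\x00\x0bJim_Keohane\x06height\x00\x0279"
OLD = b"\x0bJim_Keohane"      # [11]Jim_Keohane
NEW = b"\x0dJames_Keohane"    # [13]James_Keohane

table = PieceTable(DOC)
table.replace(DOC.index(OLD), len(OLD), NEW)
print(table.pieces)
```

After the edit, buffer 1 is untouched, buffer 2 holds the 14 bytes of "[13]James_Keohane" and buffer 3 has three entries, which matches the "[3][A][14][B]" picture. Periodic reconstitution amounts to calling to_bytes() and starting a fresh table from the result.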
Unfortunately, I have often seen perfectly performing C code developed on Windows or Unix bring a mainframe to its knees. I have heard stories of poorly performing XML code on S/390. You can't go by those stories. What is needed are actual tests and studies and, if there is a performance concern, then mainframers should contribute their two cents to the discussion.

Curiously, some non-mainframers cite the possible desirability of XML compression, which would appear to have many of the same drawbacks regarding universal data interchange as does binary XML. Visit <http://www.xml.com/pub/a/2001/04/18/binaryXML.html>.

Anyone using IBM's recently announced XML Toolkit for z/OS and OS/390? Can you report back on any performance testing of the XML parser? Visit <http://www-1.ibm.com/servers/eserver/zseries/software/xml/>. Ditto for anyone who has ported LibXML to OS/390 or developed their own XML parser. I may have some numbers soon.

------------------------------------------------------
About the author: Jim Keohane ([EMAIL PROTECTED]) is president of New York consulting company Multi-Platforms, Inc. His company specializes in commercial software development/consulting with emphasis on cross-platform and performance issues.

Did you like this tip? Send us an email <mailto:[EMAIL PROTECTED]> to let us know your thoughts.

====================================
Related Book
====================================
XML: A Primer, 3rd Edition
http://www.digitalguru.com/dgstore/product.asp?sku=0764547771&dept%5Fid=278&ac%5Fid=54&accountnumber=&couponnumber=
Author: Simon St. Laurent
Publisher: M & T Books
ISBN/CODE: 0764547771
Cover Type: Soft Cover
Pages: 560
Published: May 2001
Summary: St. Laurent's popular primer offers Web developers a quick start to understanding and implementing XML. This third edition of XML: A Primer includes new developments in XML technology regarding XLink, XPointer, XPath and XSLT.
This guide for Web developers explains the differences and similarities between SGML, HTML, and XML, and provides you with a solid understanding of how to create custom tags and Document Type Definitions (DTDs). You'll also find discussion on the impact of XML Schemas and RELAX.

==============================
LIVE AUDIO EVENT
==============================
Speaking of mainframe performance... Join search390 next Tues, June 26, for a live audio Q&A entitled: "S/390 and zSeries performance management: the key to running your enterprise," with Christopher Roy, Corporate Software Consultant for BMC Software. Christopher will answer your performance management questions on availability, optimization and scalability. He'll also discuss enterprise management and the questions facing the industry today. This exciting event will take place from 2pm (EDT) to 3pm (EDT). For more information, go to:
http://search390.techtarget.com/onlineEvents/0,289675,sid10,00.html
See you there!
==============================

==============================
TRIED IBM'S XML TOOLKIT YET?
==============================
Jim Keohane wants to know if anyone has used IBM's recently announced XML Toolkit for z/OS and OS/390. We'd like to know too. If you have, share your feedback with your peers in our Developer Forum. What better way to learn about a new product?

While you're there, check out some of the hot threads, and see if you can help user "askMikey" with the following:

"We want to start using the OPTIMIZE(FULL) option of the COBOL compiler. One of the requirements is that we remove a specific form of coding. Specifically, in programs converted from OS/VS COBOL we would sometimes code large working storage tables as:

    01 the-real-table.
       05 xxx occurs yyy times.
    01 over-flow-table-1.
       05 zzz occurs yyy times.

The procedure division would only reference elements in "the-real-table". But our subscripting/indexing would overflow into the "over-flow-table-1".
Does anyone know of a tool that will find these? An alternative method was

    01 the-real-table.
       05 xxx occurs yyy times.
    01 filler pic x(120400).

Once again, the references to the-real-table would overflow into the filler."

If you have an answer for askMikey, post it here:
http://search390.discussions.techtarget.com/WebX?[EMAIL PROTECTED]^[email protected]
========================================================

========================================================
Disclaimer: Our tips exchange is a forum for you to share technical advice and expertise with your peers and to learn from other IT professionals. Techtarget.com provides the infrastructure to facilitate this sharing of information. However, we can't guarantee the accuracy and validity of the material submitted. You agree that your use of the ask the expert services and your reliance on any questions, answers, information or other materials received through the web site will be at your own risk.
========================================================
