Gladiator

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <axis-dev@xml.apache.org>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 02, 2001 4:33 PM
Subject: RE: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java
>
> Hi Sam!
>
> I agree with the spirit of everything you say here. For the benefit of
> myself as well as the others who may not be as in tune with SOAP, I'm going
> to quickly run down some bullet points about the environment we're in.
> These are in no particular order, but cover what I consider the important
> facets of the job we have to do. This begins to describe our requirements,
> I hope.
>
> * SOAP is XML. It's basically structured as follows:
>
> <SOAP-ENV:envelope xmlns:SOAP-ENV="insert-important-url-here">
>   <SOAP-ENV:header>
>     <header-entry />
>   </SOAP-ENV:header>
>   <SOAP-ENV:body>
>     <body-entry />
>   </SOAP-ENV:body>
> </SOAP-ENV:envelope>
>
> * Inside the header and body entries may be XML-encoded language objects,
> particularly ones which are encoded as specified in the SOAP spec [1]. The
> encoding (in section 5 of the spec) calls out the use of the XML Schema
> basic types, plus a few other rules about structures and arrays.
>
> * One feature of the SOAP section 5 encoding is "multi-ref accessors", which
> work like this:
>
> <SOAP-ENV:envelope xmlns:SOAP-ENV="insert-important-url-here"
>                    xmlns:foo="urn:foo"
>                    xmlns:xsi="schema-instance-uri"
>                    xmlns:xsd="schema-data-uri">
>   <SOAP-ENV:header>
>     <foo:header ref="#1" />
>   </SOAP-ENV:header>
>   <SOAP-ENV:body>
>     <foo:body ref="#1" />
>     <foo:actualElement id="1" xsi:type="xsd:int">5</foo:actualElement>
>   </SOAP-ENV:body>
> </SOAP-ENV:envelope>
>
> (both the foo:header and the foo:body are references to the same integer)
>
> * To deserialize multi-ref accessors, we may need to look arbitrarily far
> ahead in the document for the element with the correct id. This makes a
> straight-ahead "streaming" approach (process the XML in order as it comes
> in) somewhat challenging. Also, different pieces of code may desire to
> process particular headers in an order different from that in which they are
> serialized in the XML.
>
> * There is some concern that the XML, especially the body entries, may get
> to be really large (giant base-64-encoded documents, for instance), hence we
> are somewhat cautious about assuming we need to pull the whole document into
> memory before processing. I note that there is a school of thought here (to
> which I subscribe, btw) that says it's pilot error to try and send a huge
> chunk of data inside your XML; rather you should take such things and attach
> them per the SOAP with Attachments spec [2].
>
> * We need this stuff to be parsed into some usable form very quickly and
> efficiently.
>
> * Some developers will want direct access to the XML within a particular
> part of the envelope as DOM, or JDOM, or perhaps SAX events.
>
> * Graham Glass claims to parse XML into an internal object model (I suspect
> he parses the whole document before processing, btw) EXTREMELY quickly using
> his Electric XML parser [3]. This model is used for SOAP processing.
>
> * W3C XML Protocol [4] will be arriving on the scene at some point. We'd
> like to abstract out as much of the SOAPness as possible so that Axis can
> easily become XMLP-compatible as soon as possible.
>
> Is there other stuff I've left out, folks?
>
> OK, so as I said, I agree with Sam's points here. The first thing I'd like
> to do is some basic performance testing of various XML parsing models. I do
> not see a real streaming approach being all that viable for Axis v1.0 (I'm
> open to argument on that). If that is the case, we're talking about parsing
> the document into some object model. As I see it, we can either: 1) use a
> pre-existing model like DOM or JDOM, or 2) use SAX or a pull parser such as
> XPP to parse into our own SOAP-specific object model.
>
> Option 2 might be faster. Option 1 gains us a standard programming model
> (i.e. when developers ask us for JDOM/DOM we can just give it to them), plus
> perhaps a speedier development cycle.
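To make the multi-ref bullet and Option 1 a little more concrete, here is a minimal sketch that parses the whole envelope into a DOM through JAXP, indexes every element carrying an id, and then resolves references no matter where the target sits in the document. The class name `MultiRefSketch`, its helper methods, and the placeholder namespace URIs are invented for illustration. The attribute names `ref` and `id` follow the example in the message above (SOAP 1.1 section 5 itself spells the referencing attribute `href`). This is not Axis code, only one way the look-ahead problem disappears once the document is fully in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MultiRefSketch {
    // Attribute names as used in the message's example (assumption, see lead-in).
    static final String REF_ATTR = "ref";
    static final String ID_ATTR = "id";

    // The example envelope from the message, with placeholder namespace URIs.
    static final String SAMPLE =
        "<SOAP-ENV:envelope xmlns:SOAP-ENV='urn:example:soap-env'"
        + " xmlns:foo='urn:foo' xmlns:xsi='urn:example:xsi' xmlns:xsd='urn:example:xsd'>"
        + "<SOAP-ENV:header><foo:header ref='#1'/></SOAP-ENV:header>"
        + "<SOAP-ENV:body><foo:body ref='#1'/>"
        + "<foo:actualElement id='1' xsi:type='xsd:int'>5</foo:actualElement>"
        + "</SOAP-ENV:body></SOAP-ENV:envelope>";

    /** Parse the complete document into a DOM tree (Option 1 in the message). */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse envelope", e);
        }
    }

    /** One pass over the tree: remember every element that carries an id. */
    static Map<String, Element> indexById(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute(ID_ATTR)) {
                byId.put(e.getAttribute(ID_ATTR), e);
            }
        }
        return byId;
    }

    /** Follow a "#id" reference if present; otherwise the element is its own value. */
    static Element resolve(Element e, Map<String, Element> byId) {
        String ref = e.getAttribute(REF_ATTR); // empty string when absent
        if (!ref.startsWith("#")) {
            return e;
        }
        Element target = byId.get(ref.substring(1));
        if (target == null) {
            throw new IllegalStateException("dangling reference " + ref);
        }
        return target;
    }

    /** Text value of the first element with the given qualified name, after resolution. */
    static String valueOf(String xml, String qualifiedName) {
        Document doc = parse(xml);
        Element accessor = (Element) doc.getElementsByTagName(qualifiedName).item(0);
        if (accessor == null) {
            throw new IllegalArgumentException("no element named " + qualifiedName);
        }
        return resolve(accessor, indexById(doc)).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println("header -> " + valueOf(SAMPLE, "foo:header"));
        System.out.println("body   -> " + valueOf(SAMPLE, "foo:body"));
    }
}
```

Both accessors resolve to the same `foo:actualElement`, even though it appears after them in the document. The cost is exactly the one raised in the bullets above: the whole envelope has to be in memory before the first header can be resolved.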
>
> I'd like to do the simplest possible thing that gives us the desired
> results.
>
> Jason, do you have any numbers/stats as to whether parsing into JDOM using
> SAX is faster than a typical DOM parse in, say, Xerces?
>
> Over and out for now,
>
> --Glen
>
> [1] http://www.w3.org/TR/soap
> [2] http://www.w3.org/TR/SOAP-attachments
> [3] http://www.themindelectric.com/products/xml/xml.html
> [4] http://www.w3.org/2000/xp/
>
> > -----Original Message-----
> > From: Sam Ruby [mailto:[EMAIL PROTECTED]
> > Sent: Monday, April 02, 2001 4:00 PM
> > To: axis-dev@xml.apache.org
> > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: RE: cvs commit: xml-axis/java/src/org/apache/axis/utils XMLUtils.java
> >
> >
> > Glen Daniels wrote:
> > >
> > > OK, here's my suggestion. Take it with appropriate salt.
> > >
> > > DOM is pretty much a pain in the ass to work with, and we're Java
> > > developers, with access to JDOM. JDOM is screaming along in terms of
> > > functionality, and they now deal just fine with JAXP on the bottom end,
> > > so you can use whatever parser you want underneath there. JDOM is also
> > > going to be rolled into the Java standard fairly soon (JSR-102, I think).
> > >
> > > Until we figure out what we're "really" doing about XML parsing and
> > > modeling, I think we'd move much faster with JDOM, and that's where I
> > > think we should be. Besides, if we're going to end up using some other
> > > model like pull or whatever anyway, why should it matter if we use JDOM
> > > or DOM right now?
> > >
> > > Suggestion: put JDOM back for now, and feel free to use the JAXP
> > > interface to pick a parser.
> >
> > OK, here's my suggestion. Take it with appropriate salt.
> >
> > Warning: the message is a real downer. Parental discretion advised.
> >
> > The xml-soap implementation continues to be popular. It is getting ever
> > more interoperable with other implementations (thanks Glen!).
> >
> > The biggest gripe I hear is that it can't process as many messages per
> > second as some other implementations. Some say it is Java's fault, but I
> > see some boasting orders of magnitude improvements over Apache with their
> > Java implementations. Others have noticed perhaps a 20% improvement with
> > C/C++.
> >
> > Some measurements suggest that up to 75% of the time is in the parser.
> > Even if we accept that at face value, we have to conclude that 25% of the
> > time is not, and even if the parser were eliminated entirely we will never
> > see an order of magnitude improvement by just fixing the parser.
> >
> > I believe that some new thinking is required. It likely will require some
> > cooperation with the parser team (hence why I am copying the Xerces mailing
> > list, and for that matter Jason too in order to get a JDOM perspective) to
> > pull it off.
> >
> > Meanwhile, my response to "if we're going to end up with some other model
> > like pull or whatever anyway" is that I don't think it much matters what
> > you work with right now as it will likely be DOA.
> >
> > Let's start by setting some priorities, and expressing them with concrete
> > scenarios and test cases. Let's start with a trivial implementation which
> > simply reads from a socket and sends back a canned reply, and measure that.
> > No parser, no servlet engine, simply Java code. Then let's slowly introduce
> > more function, measuring the impact and determining if the impact is
> > reasonable and if not what is the alternative.
> >
> > Meanwhile, let's figure out a concrete way to express our requirements to
> > the parser team in a way that helps them understand what our needs are.
> >
> > Thoughts?
> >
> > - Sam Ruby
> >
> > Disclaimer: IMHO, a parser and a servlet engine is a requirement, don't
> > take any of the above as an indication to the contrary.
> >
>
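Sam's 75%/25% argument is the standard Amdahl's law bound, and it is easy to put numbers on. The sketch below takes the 75% parser share from his message as given (it is a reported measurement, not something verified here) and computes the overall speedup for a few hypothetical parser improvements. The class and method names are made up for illustration.

```java
public class ParserSpeedupBound {
    /**
     * Amdahl's law: overall speedup when a fraction p of total time
     * is made s times faster and the remaining (1 - p) is unchanged.
     */
    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    /** Limit as s grows without bound, i.e. the parser costs nothing at all. */
    static double upperBound(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        double parserShare = 0.75; // "up to 75% of the time is in the parser"
        System.out.println("parser 2x faster:  " + overallSpeedup(parserShare, 2.0));
        System.out.println("parser 10x faster: " + overallSpeedup(parserShare, 10.0));
        System.out.println("parser free:       " + upperBound(parserShare));
    }
}
```

With a 75% share, doubling parser speed gives about 1.6x overall, a tenfold parser speedup gives about 3.1x, and a parser that costs nothing caps out at 4x. That is well short of the order of magnitude others are reported to achieve, which is the point of the paragraph above.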