Gladiator
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <axis-dev@xml.apache.org>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, April 02, 2001 4:33 PM
Subject: RE: cvs commit: xml-axis/java/src/org/apache/axis/utils
XMLUtils.java


>
> Hi Sam!
>
> I agree with the spirit of everything you say here.  For the benefit of
> myself as well as the others who may not be as in tune with SOAP, I'm going
> to quickly run down some bullet points about the environment we're in.
> These are in no particular order, but cover what I consider the important
> facets of the job we have to do.  This begins to describe our requirements,
> I hope.
>
> * SOAP is XML.  It's basically structured as follows:
>
> <SOAP-ENV:Envelope xmlns:SOAP-ENV="insert-important-url-here">
>  <SOAP-ENV:Header>
>   <header-entry />
>  </SOAP-ENV:Header>
>  <SOAP-ENV:Body>
>   <body-entry />
>  </SOAP-ENV:Body>
> </SOAP-ENV:Envelope>
>
> * Inside the header and body entries may be XML-encoded language objects,
> particularly ones which are encoded as specified in the SOAP spec [1].  The
> encoding (in section 5 of the spec) calls out the use of the XML Schema
> basic types, plus a few other rules about structures and arrays.
>
> * One feature of the SOAP section 5 encoding is "multi-ref accessors", which
> work like this:
>
> <SOAP-ENV:Envelope xmlns:SOAP-ENV="insert-important-url-here"
>                    xmlns:foo="urn:foo"
>                    xmlns:xsi="schema-instance-uri"
>                    xmlns:xsd="schema-data-uri">
>  <SOAP-ENV:Header>
>   <foo:header href="#1" />
>  </SOAP-ENV:Header>
>  <SOAP-ENV:Body>
>   <foo:body href="#1" />
>   <foo:actualElement id="1" xsi:type="xsd:int">5</foo:actualElement>
>  </SOAP-ENV:Body>
> </SOAP-ENV:Envelope>
>
>   (both the foo:header and the foo:body are references to the same integer)
>
> * To deserialize multi-ref accessors, we may need to look arbitrarily far
> ahead in the document for the element with the correct id.  This makes a
> straight-ahead "streaming" approach (process the XML in order as it comes
> in) somewhat challenging.  Also, different pieces of code may desire to
> process particular headers in an order different from that in which they are
> serialized in the XML.
>
> * There is some concern that the XML, especially the body entries, may get
> to be really large (giant base-64-encoded documents, for instance), hence we
> are somewhat cautious about assuming we need to pull the whole document into
> memory before processing.  I note that there is a school of thought here (to
> which I subscribe, btw) that says it's pilot error to try and send a huge
> chunk of data inside your XML; rather you should take such things and attach
> them per the SOAP with Attachments spec [2].
>
> * We need this stuff to be parsed into some usable form very quickly and
> efficiently.
>
> * Some developers will want direct access to the XML within a particular
> part of the envelope as DOM, or JDOM, or perhaps SAX events.
>
> * Graham Glass claims to parse XML into an internal object model (I suspect
> he parses the whole document before processing, btw) EXTREMELY quickly using
> his Electric XML parser [3].  This model is used for SOAP processing.
>
> * W3C XML Protocol [4] will be arriving on the scene at some point.  We'd
> like to abstract out as much of the SOAPness as possible so that Axis can
> easily become XMLP-compatible as soon as possible.
>
> Is there other stuff I've left out, folks?
>
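To make the multi-ref point concrete, here is a minimal sketch of the non-streaming way to resolve such references: parse the whole document into a DOM, index every element that has an id, then follow each "#id" reference. The class and method names are invented for illustration and are not Axis code. SOAP 1.1 spells the reference attribute `href`; as an assumption of the sketch, `ref` is accepted as well.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Axis.
class MultiRefSketch {

    /** Pass 1: index every element that carries an "id" attribute. */
    static Map<String, Element> indexById(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                byId.put(e.getAttribute("id"), e);
            }
        }
        return byId;
    }

    /** Pass 2: follow a "#id" reference, or return the element itself if it has none. */
    static Element resolve(Element e, Map<String, Element> byId) {
        // Accepting both spellings is an assumption of this sketch.
        String ref = e.hasAttribute("href") ? e.getAttribute("href") : e.getAttribute("ref");
        if (!ref.startsWith("#")) {
            return e;
        }
        Element target = byId.get(ref.substring(1));
        if (target == null) {
            throw new IllegalArgumentException("dangling reference: " + ref);
        }
        return target;
    }

    /** Parses xml and returns the text of whatever the first tagName element refers to. */
    static String resolvedText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = (Element) doc.getElementsByTagName(tagName).item(0);
            return resolve(e, indexById(doc)).getTextContent();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not resolve " + tagName, ex);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"urn:example:env\" xmlns:foo=\"urn:foo\">"
              + "<SOAP-ENV:Header><foo:header href=\"#1\"/></SOAP-ENV:Header>"
              + "<SOAP-ENV:Body><foo:body href=\"#1\"/>"
              + "<foo:actualElement id=\"1\">5</foo:actualElement></SOAP-ENV:Body>"
              + "</SOAP-ENV:Envelope>";
        System.out.println(resolvedText(xml, "foo:header")); // prints 5
        System.out.println(resolvedText(xml, "foo:body"));   // prints 5
    }
}
```

The index is only complete once the last element has been read, which is the reason a pure streaming deserializer has to either buffer unresolved references or look ahead.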
> OK, so as I said, I agree with Sam's points here.  The first thing I'd like
> to do is some basic performance testing of various XML parsing models.  I do
> not see a real streaming approach being all that viable for Axis v1.0 (I'm
> open to argument on that).  If that is the case, we're talking about parsing
> the document into some object model.  As I see it, we can either: 1) use a
> pre-existing model like DOM or JDOM, or 2) use SAX or a pull parser such as
> XPP to parse into our own SOAP-specific object model.
>
> Option 2 might be faster.  Option 1 gains us a standard programming model
> (i.e. when developers ask us for JDOM/DOM we can just give it to them), plus
> perhaps a speedier development cycle.
>
> I'd like to do the simplest possible thing that gives us the desired
> results.
>
> Jason, do you have any numbers/stats as to whether parsing into JDOM using
> SAX is faster than a typical DOM parse in, say, Xerces?
>
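A rough sketch of the kind of comparison being asked for, using only the JAXP classes in the JDK. It times option 1 (a full DOM build) against option 2 (SAX events feeding our own model, here reduced to a counter). The document shape, class name, and entry count are all made up for illustration, and the numbers it prints depend on the parser and JVM in use, so treat it as a harness to adapt rather than a result.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness, not part of Axis.
class ParseTimingSketch {

    /** Builds a synthetic envelope-like document with n body entries. */
    static byte[] sampleDocument(int n) {
        StringBuilder sb = new StringBuilder("<envelope><body>");
        for (int i = 0; i < n; i++) {
            sb.append("<entry id=\"").append(i).append("\">").append(i).append("</entry>");
        }
        return sb.append("</body></envelope>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Option 1: build a full DOM tree, then count the entry elements. */
    static int countWithDom(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagName("entry").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Option 2: SAX events feeding our own (here trivial) model, a counter. */
    static int countWithSax(byte[] xml) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            if ("entry".equals(qName)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = sampleDocument(10_000);
        for (int round = 0; round < 5; round++) { // early rounds mostly warm up the JIT
            long t0 = System.nanoTime();
            int dom = countWithDom(xml);
            long t1 = System.nanoTime();
            int sax = countWithSax(xml);
            long t2 = System.nanoTime();
            System.out.printf("round %d: DOM %d entries in %d us, SAX %d entries in %d us%n",
                    round, dom, (t1 - t0) / 1_000, sax, (t2 - t1) / 1_000);
        }
    }
}
```

A fair comparison would also build a comparable object model on the SAX side, since counting elements understates the cost of option 2.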
> Over and out for now,
>
> --Glen
>
> [1] http://www.w3.org/TR/soap
> [2] http://www.w3.org/TR/SOAP-attachments
> [3] http://www.themindelectric.com/products/xml/xml.html
> [4] http://www.w3.org/2000/xp/
>
> > -----Original Message-----
> > From: Sam Ruby [mailto:[EMAIL PROTECTED]
> > Sent: Monday, April 02, 2001 4:00 PM
> > To: axis-dev@xml.apache.org
> > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: RE: cvs commit: xml-axis/java/src/org/apache/axis/utils
> > XMLUtils.java
> >
> >
> > Glen Daniels wrote:
> > >
> > > OK, here's my suggestion.  Take it with appropriate salt.
> > >
> > > DOM is pretty much a pain in the ass to work with, and we're Java
> > > developers, with access to JDOM.  JDOM is screaming along in terms of
> > > functionality, and they now deal just fine with JAXP on the bottom end,
> > > so you can use whatever parser you want underneath there.  JDOM is also
> > > going to be rolled into the Java standard fairly soon (JSR-102, I think).
> > >
> > > Until we figure out what we're "really" doing about XML parsing and
> > > modeling, I think we'd move much faster with JDOM, and that's where I
> > > think we should be.  Besides, if we're going to end up using some other
> > > model like pull or whatever anyway, why should it matter if we use JDOM
> > > or DOM right now?
> > >
> > > Suggestion : put JDOM back for now, and feel free to use the JAXP
> > > interface to pick a parser.
> >
> > OK, here's my suggestion.  Take it with appropriate salt.
> >
> > Warning: the message is a real downer.  Parental discretion advised.
> >
> > The xml-soap implementation continues to be popular.  It is getting ever
> > more interoperable with other implementations (thanks Glen!).
> >
> > The biggest gripe I hear is that it can't process as many messages per
> > second as some other implementations.  Some say it is Java's fault, but I
> > see some boasting orders of magnitude improvements over Apache with their
> > Java implementations.  Others have noticed perhaps a 20% improvement with
> > C/C++.
> >
> > Some measurements suggest that up to 75% of the time is in the parser.
> > Even if we accept that at face value, we have to conclude that 25% of the
> > time is not, and even if the parser were eliminated entirely we will never
> > see an order of magnitude improvement by just fixing the parser.
> >
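The arithmetic behind that conclusion, taking the 75% figure at face value: if a fraction p of the per-message time is spent in the parser and the parser cost drops to zero, the best possible overall speedup is bounded as below. The symbols p and S_max are introduced here for illustration only.

```latex
S_{\max} = \frac{1}{1 - p} = \frac{1}{1 - 0.75} = 4
```

A tenfold improvement from the parser alone would need p to be at least 0.9, since 1 / (1 - 0.9) = 10.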
> > I believe that some new thinking is required.  It likely will require some
> > cooperation with the parser team (which is why I am copying the Xerces
> > mailing list, and for that matter Jason too in order to get a JDOM
> > perspective) to pull it off.
> >
> > Meanwhile, my response to "if we're going to end up with some other model
> > like pull or whatever anyway" is that I don't think it much matters what
> > you work with right now as it will likely be DOA.
> >
> > Let's start by setting some priorities, and expressing them with concrete
> > scenarios and test cases.  Let's start with a trivial implementation which
> > simply reads from a socket and sends back a canned reply, and measure that.
> > No parser, no servlet engine, simply Java code.  Then let's slowly
> > introduce more function, measuring the impact and determining if the impact
> > is reasonable and, if not, what the alternative is.
> >
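For what it's worth, the trivial baseline described above might look something like the following sketch: plain java.net sockets, no parser, no servlet engine. Everything in it is an assumption made for illustration (the class name, the reply body, the HTTP/1.0 framing, and the choice to stop reading at the blank line that ends the request headers, ignoring any body).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical baseline, not part of Axis.
class CannedReplySketch {

    static final String BODY = "<reply>canned</reply>";
    static final byte[] CANNED_REPLY =
            ("HTTP/1.0 200 OK\r\n"
                    + "Content-Type: text/xml\r\n"
                    + "Content-Length: " + BODY.length() + "\r\n\r\n"
                    + BODY).getBytes(StandardCharsets.US_ASCII);

    /** Accepts one connection, skips the request headers, writes the canned reply. */
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket s = server.accept()) {
            InputStream in = new BufferedInputStream(s.getInputStream());
            int matched = 0; // bytes of the "\r\n\r\n" terminator seen in a row
            int b;
            while (matched < 4 && (b = in.read()) != -1) {
                char expected = (matched % 2 == 0) ? '\r' : '\n';
                matched = (b == expected) ? matched + 1 : (b == '\r' ? 1 : 0);
            }
            OutputStream out = s.getOutputStream();
            out.write(CANNED_REPLY);
            out.flush();
        }
    }

    /** Starts a one-shot server on a free loopback port, sends request, returns the reply. */
    static String roundTrip(String request) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread t = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException ignored) {
                    // the client side will notice a missing reply
                }
            });
            t.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));
                client.getOutputStream().flush();
                ByteArrayOutputStream reply = new ByteArrayOutputStream();
                InputStream in = client.getInputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    reply.write(buf, 0, n);
                }
                t.join();
                return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int trips = 200;
        long start = System.nanoTime();
        for (int i = 0; i < trips; i++) {
            roundTrip("POST /axis HTTP/1.0\r\n\r\n");
        }
        long micros = (System.nanoTime() - start) / 1_000 / trips;
        System.out.println("mean round trip: " + micros + " us (includes socket setup)");
    }
}
```

Because each round trip here also creates and tears down the listening socket, the figure is an upper bound on the no-parser cost; a longer-lived server loop would give a tighter baseline before layering in the parser and servlet engine.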
> > Meanwhile, let's figure out a concrete way to express our requirements to
> > the parser team in a way that helps them understand what our needs are.
> >
> > Thoughts?
> >
> > - Sam Ruby
> >
> > Disclaimer:  IMHO, a parser and a servlet engine are requirements; don't
> > take any of the above as an indication to the contrary.
> >
>
