RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]

2003-07-06 Thread David Bau
Adding a few links and other info -

Aleksander Slominski wrote:
 http://dev2dev.bea.com/articles/hitesh_seth.jsp that is
 good overview but has not enough technical details and
 other docs): as far as i can understand actual objects

Above you've linked to an XML Journal review reprint.

Here is a page that points to other information:

http://dev2dev.bea.com/technologies/xmlbeans/index.jsp

One of the links is a very brief summary of some
brutally transparent and upfront performance and
test compliance numbers:

http://workshop.bea.com/xmlbeans/schemaandperf.jsp

BTW, despite the fact that we posted the numbers on
pretty marketing pages on bea.com, the numbers above
are not marketing-varnished numbers - they are the
actual measurements that we developers track day-to-day.
Those are numbers we measure to help us focus on
use-cases that we're working on making faster.

The XML cursor access _without_ strong-type conversion is
between 10% and 58% faster than Xerces2 DOM access, going
to about 35% for large (1MB) XML documents.  Xerces2, btw,
is extremely speedy, so we're proud to be on par with it
in any scenario!

Adding strong-type conversion (for example parsing xs:int
to java int and dates to Calendars) adds enough cost that
reading the data out of a document is between 0% and 48%
slower than reading out using (untyped) Xerces2 DOM.
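
To make the extra cost of typed access concrete, here is a minimal sketch of the kind of lexical-to-Java conversion involved. It uses only JDK classes (javax.xml.datatype, which postdates this thread) and is not the XMLBeans implementation; the class and method names are hypothetical.

```java
import java.util.Calendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical helper, not part of XMLBeans: shows the conversion work that
// typed access performs on top of simply reading the raw text of a node.
final class TypedAccessSketch {

    // xs:int lexical form -> Java int; surrounding whitespace is dropped first.
    static int parseXsInt(String lexical) {
        return Integer.parseInt(lexical.trim());
    }

    // xs:date or xs:dateTime lexical form -> java.util.Calendar.
    static Calendar parseXsDate(String lexical) {
        try {
            return DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(lexical.trim())
                    .toGregorianCalendar();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no JAXP datatype factory available", e);
        }
    }
}
```

An untyped DOM read stops at the string; the typed read additionally pays for the parse and, for dates, a Calendar allocation.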

Apples-to-apples, we measure ourselves significantly
faster than JAXB RI and Castor (140% to 282% and 66% to
800%, respectively). Please don't sue me - those are our
real numbers, but if performance is important to your
application, you should measure it for yourself.
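
For anyone who wants to measure for themselves, a baseline harness might look like the sketch below. It times a full read of a document through the JDK's built-in DOM parser; the class and method names are hypothetical, and a serious benchmark would need a proper harness, realistic documents, and a comparable walk through the library being compared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical timing harness using only the JDK's DOM implementation.
final class DomReadTiming {

    // Parse an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Walk the whole tree and count element nodes, touching every node once.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // Average nanoseconds per run after some warm-up iterations (runs must be > 0).
    static long timeNanos(Runnable work, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            work.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            work.run();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

Usage would be along the lines of `DomReadTiming.timeNanos(() -> DomReadTiming.countElements(DomReadTiming.parse(xml)), 1000, 10000)`, repeated with the same document and an equivalent walk in the other library.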

We do fault-in object allocations when demanded, and
you can see in our memory test that when we fault-in
all the objects for a whole document, we take up more
memory than Xerces2 DOM.  One current project is to take
steps to reduce that number.  When we use XmlCursor and
don't fault-in all the objects, you will find the memory
number to be much slimmer. (I don't have a measurement
because our measurements focus on problem areas we're
actually working on.)
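
The fault-in idea can be illustrated with a toy model: values live in a compact store, and a wrapper object is allocated only the first time someone asks for it. This is a sketch under that assumption, with hypothetical names; it does not reflect the actual XMLBeans store.

```java
// Hypothetical model of on-demand ("fault-in") object allocation.
final class LazyDocumentSketch {

    // Stand-in for a strongly typed object bound to one position in the store.
    static final class Item {
        final String text;
        Item(String text) { this.text = text; }
    }

    private final String[] store;   // compact stand-in for the underlying XML store
    private final Item[] faulted;   // objects created so far, null until demanded
    private int allocated;

    LazyDocumentSketch(String... values) {
        this.store = values.clone();
        this.faulted = new Item[values.length];
    }

    // Cursor-style read: returns the text without allocating an Item.
    String textAt(int index) {
        return store[index];
    }

    // Object-style read: allocates the Item on first access and reuses it afterwards.
    Item itemAt(int index) {
        if (faulted[index] == null) {
            faulted[index] = new Item(store[index]);
            allocated++;
        }
        return faulted[index];
    }

    int allocatedCount() {
        return allocated;
    }
}
```

Reading everything through `textAt` leaves `allocatedCount()` at zero, while touching every position through `itemAt` allocates one object per position, which is the worst case described above.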

Eric Vasilik writes:
 The synchronization described refers to the fact
 that one may manipulate the XML via the XmlCursor
 or the strongly typed XMLBean classes generated from
 the schema

As Eric says, we don't want to confuse the two uses of
the word synchronize.  But since Aleksander brought it
up - here's some information on thread-synchronization
too.

We examined both with- and without-thread-synchronized
access, and found that without thread-sync, programmers
fall into traps like working with XML config files on
multiple threads in thread-unsafe ways without being
aware of it.  We found that it costs between 1%
(strongly-typed access) and 10% (XmlCursor access) to
synchronize. So we're currently synchronizing access
to the data, paying for more [app] stability with a
little bit of perf. We'd like to give single-threaded
(or savvy) users the option of not synchronizing
to get the 1-10% back. That's future work.
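
One way to picture the trade-off is a store with a plain implementation and a synchronized wrapper around it, so that the caller chooses which to pay for. The sketch below is an assumption about how such an option could look, with hypothetical names; it is not how XMLBeans implements its locking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the same store with and without thread synchronization.
final class SyncSketch {

    interface Store {
        void increment(String key);
        int get(String key);
    }

    // Fast path for single-threaded (or savvy) callers; not thread-safe.
    static final class PlainStore implements Store {
        private final Map<String, Integer> counts = new HashMap<>();
        public void increment(String key) { counts.merge(key, 1, Integer::sum); }
        public int get(String key) { return counts.getOrDefault(key, 0); }
    }

    // Safe default: every access to the wrapped store goes through one lock.
    static final class SynchronizedStore implements Store {
        private final Store delegate;
        SynchronizedStore(Store delegate) { this.delegate = delegate; }
        public synchronized void increment(String key) { delegate.increment(key); }
        public synchronized int get(String key) { return delegate.get(key); }
    }

    // Runs several threads against one store and returns the final count.
    static int hammer(Store store, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.increment("hits");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.get("hits");
    }
}
```

With the synchronized wrapper, four threads doing 10,000 increments each always end at 40,000; with the plain store shared across threads, updates can be lost or the map corrupted, which is the kind of trap described above.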

As Eric pointed out, the key, I think, is not what our
current numbers are, but the fact that we've isolated
our implementation from our interface so that we have
the flexibility of reducing allocations, deferring work,
and otherwise improving performance further in the future.
Abstracting the primary store behind a cursor rather
than a tree of objects with identity gives us some leeway
in shuffling our implementation strategy later
without restructuring the APIs.
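
As a rough illustration of why a cursor leaves room to change the store, the sketch below puts two different storage layouts behind one small cursor interface. The interface and both implementations are hypothetical and far simpler than the real XmlCursor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: client code depends on the cursor, not on the store layout.
final class CursorSketch {

    interface TokenCursor {
        boolean next();   // advance to the next token; false when exhausted
        String text();    // text of the current token
    }

    // Store layout A: one String object per token.
    static final class ArrayCursor implements TokenCursor {
        private final String[] tokens;
        private int position = -1;
        ArrayCursor(String... tokens) { this.tokens = tokens.clone(); }
        public boolean next() { return ++position < tokens.length; }
        public String text() { return tokens[position]; }
    }

    // Store layout B: one packed string; a token is materialized only on text().
    static final class PackedCursor implements TokenCursor {
        private final String packed;
        private int start = -1;
        private int end = -1;
        PackedCursor(String packed) { this.packed = packed; }
        public boolean next() {
            if (end >= packed.length()) {
                return false;
            }
            start = end + 1;
            int bar = packed.indexOf('|', start);
            end = bar < 0 ? packed.length() : bar;
            return true;
        }
        public String text() { return packed.substring(start, end); }
    }

    // Client code: works unchanged against either layout.
    static String joinAll(TokenCursor cursor) {
        List<String> out = new ArrayList<>();
        while (cursor.next()) {
            out.add(cursor.text());
        }
        return String.join(",", out);
    }
}
```

Because `joinAll` never holds a node object with identity, the store behind the cursor can be swapped from layout A to layout B without the client noticing.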

David Bau

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



XML Beans details

2003-07-03 Thread David Bau
On 7/3/03 4:22 PM Andrew C. Oliver [EMAIL PROTECTED] wrote:
 
 On 7/3/03 3:50 PM, Cliff Schmidt [EMAIL PROTECTED] wrote:
  
  this might be a good time for David Bau, the architect behind 
  XMLBeans, to jump in with his views.
 
 Okay.  It sounds like there are some issues which warrant 
 this over others.
 I could see this being useful in things like web services as 
 well...  Limit
 object creation/serialization and yada yada yada...  Though 
 from reading the
 10k foot view you could support JAXB if you wanted to.  Just 
 an element of

Yeah, limiting object creation is one of the particular benefits
of the approach.  Also, JAXB is of great interest to us!
We're particularly interested in JSR 222.

I started writing up some details on the question of "what is
XML Beans and how does it compare to X, Y and Z?" but it was
getting long, so I posted it on a wiki page:

http://nagoya.apache.org/wiki/apachewiki.cgi?XmlBeansExplanation

David Bau
XML Beans architect




RE: XML Beans details

2003-07-03 Thread David Bau
 On Thursday, July 03, 2003 5:23 PM, Neil Graham 
 [EMAIL PROTECTED] writes:
 
  How does this approach handle things like the schema unique particle
  attribution or element declarations consistent constraints?  
  Clearly you
  could parse schema documents simply using information from the
  schema-for-schemas, but using that alone, surely it's not 
 possible to
  handle all conditions imposed on valid schemas by that spec.
 
 Ah, Neil, you're a man in the trenches with Schema along with me!
 

My apologies Neil,

Just after hitting send, I realized that I had probably answered a
different question than the one that was asked.  Let me try again.

We have three layers the compiler handles.  Basically the compiler's
job is to go from (1) -> (2) -> (3).

(1) Schema syntax
(2) Schema type system in-memory
(3) Schema metadata binary persistence format

Layer (1) is handled by the schema-for-schemas (s4s), as
described in the previous mail.

Layer (2) is where we do type resolution and verify all the
semantic rules of schema that are not imposed by s4s, for
example particle valid (restriction), element consistency,
and actually many, many, many other rules.

I think layer (2) is the answer to your question.  Layer (2)
catches the valid-schema errors that are not syntactic-validation
issues.  Building layer (2) is the core of the compiler.

Once we pass layer (2) then we have a fully valid schema
type system and can persist it as layer (3).

Layer (3) can be loaded directly at runtime for high-speed
schema validation.  [basically it reconstructs (2) on-demand.]
Or, if you choose to build the schema types at runtime via
layers (1) and (2), you can do that too.
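
The three layers can be sketched as a toy pipeline. Everything here is hypothetical: declarations are reduced to "name:base" strings, the only semantic rule checked is that every base type resolves, and the binary format is invented for illustration. The real compiler checks far more and uses its own persistence format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the (1) -> (2) -> (3) compiler pipeline.
final class SchemaPipelineSketch {

    // (1) -> (2): turn syntactically valid declarations into an in-memory type
    // system, rejecting schemas that break a semantic rule (unresolved base type).
    static Map<String, String> resolve(List<String> declarations) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String declaration : declarations) {
            String[] parts = declaration.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed declaration: " + declaration);
            }
            if (types.put(parts[0], parts[1]) != null) {
                throw new IllegalArgumentException("duplicate type: " + parts[0]);
            }
        }
        for (String base : types.values()) {
            if (!base.equals("anyType") && !types.containsKey(base)) {
                throw new IllegalArgumentException("unresolved base type: " + base);
            }
        }
        return types;
    }

    // (2) -> (3): persist the validated type system in a compact binary form.
    static byte[] persist(Map<String, String> types) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(types.size());
            for (Map.Entry<String, String> entry : types.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // (3) -> (2) at runtime: rebuild the type system without re-running the checks.
    static Map<String, String> load(byte[] image) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
            int count = in.readInt();
            Map<String, String> types = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                types.put(name, in.readUTF());
            }
            return types;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the split is that the expensive checks happen once, in `resolve`; `load` can trust the persisted image because only a type system that passed layer (2) is ever written out.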

David Bau
XMLBeans Architect
