I have been having some recent correspondence with Ben Barringer on
the performance of SCA. Some of us have been talking about this
amongst ourselves, and in correspondence with Ben. I wanted to move the
conversation onto here so that anyone can share and we capture the
WHAT'S THE PROBLEM
The nub of it is that we spend a lot of time processing XML schema
files in order to build an SDO model. Sometimes it is schema that SCA
wants for itself - the schema for SOAP or the schema for WSDL, for
example - but it is just as likely to be schema for the
application's complex types, and these can be enormous: the schema for
eBay is one example.
Ben is not the only one who has pointed out that we are making
ourselves unusable for some applications with our performance:
Adam Trachtenberg and Rob Richards have both commented adversely on
SDO really falls down for me is performance. ")
(BTW this thread has some suggestions in it for things that we might
do to help the performance)
It's not just one problem. We know that SCA for PHP sometimes loads
the same schema more than once in a single request. We know that SDO
runs any schema it does load through a SAX parser twice when ideally
it would do it once. We know there are places in the parsing where we
can get some improvement. I don't expect any truly dramatic
improvements if we just chip away at those though.
DISCUSSION TO DATE
Given how infrequently a given WSDL or schema file changes, it makes
no sense to pound away on it building the SDO model from it on every
request. We ought to cache the result of doing that: caching either
the SDO model or the data factory that contains that model.
There are two approaches we could take:
1. We could try to keep the interface unchanged, so all PHP code
continues to use just SDO_DAS_XML::create() and addTypes() ...
2. We could put in some explicit caching that is visible at the PHP
level and is controlled by the SCA for PHP code or even the
application code somehow
There are, independently, a couple of possibilities for where and what
we cache. Two options seem to be:
A. we could serialise the SDO model out to a file and read it back
in when needed ...
B. we could hold on to the data factory within memory, within the
We examined option A: writing the XML DAS to a file. What we found is
that there is logic in the XML DAS to cache the model to a file
already, but it caches as schema, so reading it back in just gets us
back into loading schema again. So, we would need to come up with a
format - binary or human-readable - that is quicker to re-read. We
imagine by the way that anything cached in this way does not have to
last very long. We would not want to get into the situation of trying
to have file formats that were compatible across different releases of
SDO, or between different platforms, or anything fancy.
So, we have concluded that the simplest thing to do is probably to
cache in memory, option B.
Now look at options 1 and 2, i.e. the interface. The ideal is
probably to keep the interface unchanged, but in the meantime we might
want to do something quicker to implement as a stop-gap, even if it
puts a bit of responsibility into the SCA code.
The thing that worries me about option 1. comes about because we have
addTypes(). If you do create(), followed by a string of addTypes(), at
what point do you consider the data factory/model finished? And when
they come back issuing the same string of create() and addTypes()
calls (hence wanting the exact same model), how do you spot that and use
the cached one? It seems to me that that needs a solution. Perhaps allow
create() to take an array of types, and make that array the key to
the cached DAS?
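To make the array-as-key idea concrete, here is a rough sketch in plain PHP. Everything in it is hypothetical: schemaSetKey() and cachedFactory() are names I made up, the $build callable stands in for the real create()/addTypes() sequence, and the static array only lives for one request, so it shows the keying rather than a cross-request cache.

```php
<?php
// Hypothetical helper: derive one cache key from the ordered list of
// schema files that would be passed to create() and addTypes().
function schemaSetKey(array $schemaFiles)
{
    // Resolve each path where possible so two spellings of the same
    // existing file share one key; fall back to the name as given.
    $resolved = array_map(function ($file) {
        $real = realpath($file);
        return $real !== false ? $real : $file;
    }, $schemaFiles);

    // Order is kept on purpose: this sketch assumes that loading the
    // same files in a different order may give a different model.
    return sha1(implode("\n", $resolved));
}

// Hypothetical wrapper: build the data factory once per distinct list.
// $build is a stand-in for the create() + addTypes() calls.
function cachedFactory(array $schemaFiles, callable $build)
{
    static $cache = array();

    $key = schemaSetKey($schemaFiles);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $build($schemaFiles);
    }
    return $cache[$key];
}
```

Even this per-request version would stop the same schema being loaded twice in one request; holding the result across requests would need the extension to keep it somewhere longer-lived.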
You also need to consider how to detect when the files have changed,
of course. Would you inspect the file modification times to check they
had not changed? Would you want to do some quick hash of the contents
as a backup check?
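One way the two checks could fit together, sketched in plain PHP. The function names schemaFingerprint() and isStale() are invented for illustration, and MD5 is just an assumption for the "quick hash"; it is used here only to detect changes, not for security.

```php
<?php
// Hypothetical helper: record what a schema file looked like when the
// model was built. The content hash is optional because it means
// reading the whole file.
function schemaFingerprint($path, $withHash = false)
{
    clearstatcache(true, $path); // avoid PHP's cached stat results
    $fingerprint = array(
        'mtime' => filemtime($path),
        'size'  => filesize($path),
    );
    if ($withHash) {
        $fingerprint['hash'] = md5_file($path);
    }
    return $fingerprint;
}

// Hypothetical helper: true when the file no longer matches the
// fingerprint recorded at cache time. The hash is only recomputed
// if one was recorded in the first place.
function isStale(array $recorded, $path)
{
    $current = schemaFingerprint($path, isset($recorded['hash']));
    return $current !== $recorded;
}
```

The modification time and size come from a single stat of the file, so they are cheap enough to check on every request; the hash is the more expensive backup for the case where a file is rewritten without its timestamp or size changing.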
I now want to finish this posting and leave it up to others to
comment. I intend to close with an extract from Ben's most recent
note. We had got to the point where I was suggesting the combination
of cache in memory, under PHP control, and I had suggested explicit
caching calls like
I have a couple of suggestions. If you have a saveModuleUnderKey
function it seems like you would need a clearModuleUnderKey function
as well. Not sure if it would be good or not but to reduce the number
of function calls we could add another parameter to Create such as
$cache=true. Then if it wasn't cached it would cache it with the
file name and if it was cached it would reload it from cache. In most
real world scenarios that I can think of most everyone will want to
use the caching, so making it the default might be a good idea.
So the code could be
$xmldas = SDO_DAS_XML::create('cmssys.xsd', true) ;
With this approach we will still need a SDO_DAS_XML::ClearCache()
I realize this gets more complicated, because if we updated our xsd
file we would need to clear the cache, so that leaves me checking the
modified date on the xsd file on every page load in dev and QA. Not
trying to ask for too much here but it would be great if the extension
could check for date modified on the xsd file and reload automatically
if it has been updated. So what I am proposing is
1. Add a new cache parameter to Create() that defaults to true.
2. Use the file name as the key for caching. This would also avoid
accidentally caching the same file with different keys.
3. Check the xsd file modified time and refresh the cache if changed.
This could be controlled via a php.ini setting so it could be turned off
for peak performance. In our Dev/QA region we would have it
automatically refreshed; in production we would turn this off for
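For discussion, the three points in Ben's proposal could behave roughly like the sketch below. It is plain PHP, not the extension: the SchemaModelCache class, its get() and clear() methods, and the $checkMtime flag (standing in for the proposed ini setting) are all hypothetical, and $build stands in for the real schema load.

```php
<?php
// Hypothetical stand-in for the proposed cache: keyed by file name,
// optionally refreshed when the xsd file's modification time changes.
final class SchemaModelCache
{
    private static $entries = array();

    // $build stands in for the real schema load; $checkMtime stands in
    // for the proposed ini setting (off for peak performance).
    public static function get($xsdFile, callable $build, $checkMtime = true)
    {
        // Point 2: use the (resolved) file name as the key.
        $real = realpath($xsdFile);
        $key = $real !== false ? $real : $xsdFile;

        if (isset(self::$entries[$key])) {
            if (!$checkMtime) {
                return self::$entries[$key]['model'];
            }
            // Point 3: reuse the entry only if the file is unchanged.
            clearstatcache(true, $key);
            if (self::$entries[$key]['mtime'] === filemtime($key)) {
                return self::$entries[$key]['model'];
            }
        }

        // Not cached, or the file changed: rebuild and record the mtime.
        clearstatcache(true, $key);
        self::$entries[$key] = array(
            'model' => $build($key),
            'mtime' => filemtime($key),
        );
        return self::$entries[$key]['model'];
    }

    // Counterpart of the clear function Ben asks for.
    public static function clear()
    {
        self::$entries = array();
    }
}
```

With the check switched off, a changed xsd file is only picked up after an explicit clear(), which matches the production trade-off Ben describes.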
I hope I have captured enough of the conversation to date that we can
continue from here. Any comments, anyone?