On 27 Feb, 15:54, "Matthew Peters" <[EMAIL PROTECTED]> wrote:
> I have been having some recent correspondence with Ben Barringer on
> the performance of SCA. Some of us have been talking about this
> amongst ourselves, and in correspondence with Ben. I wanted to move the
> conversation onto here so that anyone can share and we capture the
> WHAT'S THE PROBLEM
> The nub of it is that we spend a lot of time processing XML schema
> files in order to build an SDO model. Sometimes it is schema that SCA
> wants for itself - the schema for SOAP or the schema for WSDL, for
> example - but it is just as likely to be schema for the
> application's complex types, and these can be enormous: the schema for
> eBay, for example, is enormous.
> Ben is not the only one who has pointed out that we are making
> ourselves unusable for some applications with our performance:
> Adam Trachtenberg and Rob Richards have both commented adversely on
> SDO really falls down for me is performance.
> (BTW this thread has some suggestions in it for things that we might
> do to help the performance)
> It's not just one problem. We know that SCA for PHP sometimes loads
> the same schema more than once in a single request. We know that SDO
> runs any schema it does load through a SAX parser twice when ideally
> it would do it once. We know there are places in the parsing that we
> can get some improvement. I don't expect any truly dramatic
> improvements if we just chip away at those though.
> DISCUSSION TO DATE
> Given how infrequently a given WSDL or schema file changes, it makes
> no sense to pound away on it building the SDO model from it on every
> request. We ought to cache the result of doing that: caching either
> the SDO model or the data factory that contains that model.
> There are two approaches we could take:
> 1. We could try to keep the interface unchanged, so all PHP code
> continues to use just SDO_DAS_XML::create() and addTypes() ...
> 2. We could put in some explicit caching that is visible at the PHP
> level and is controlled by the SCA for PHP code or even the
> application code somehow
> There are, independently, a couple of possibilities for where and what
> we cache. Two options seem to be:
> A. we could serialise the SDO model out to a file and read it back
> in when needed ...
> B. we could hold on to the data factory within memory, within the
> sdo_php extension.
> We examined option A, write the XML DAS to a file. What we found is
> that there is logic in the XML DAS to cache the model to a file
> already, but it caches as schema, so reading it back in just gets us
> back into loading schema again. So, we would need to come up with a
> format - binary or human-readable - that is quicker to re-read. We
> imagine by the way that anything cached in this way does not have to
> last very long. We would not want to get into the situation of trying
> to have file formats that were compatible across different releases of
> SDO, or between different platforms, or anything fancy.
> So, we have concluded that the simplest thing to do is probably to
> cache in memory, option B.
> Now look at the options 1. vs 2., i.e. the interface. The ideal is
> probably to keep the interface unchanged, but in the meantime we might
> want to do something quicker to implement as a stop-gap, even if it
> puts a bit of responsibility into the SCA code.
> The thing that worries me about option 1. comes about because we have
> addTypes(). If you do create(), followed by a string of addTypes(), at
> what point do you consider the data factory/model finished? And when
> they come back issuing the same string of create() and addTypes()
> (hence wanting the exact same model), how do you spot that and use the
> cached one? It seems to me that that needs a solution. Perhaps allow
> create() to take an array of types, and make that array the key to
> the cached DAS?
> You also need to consider what to do to catch when the files changed
> of course. Would you inspect the file modification times to check they
> had not changed? Would you want to do some quick hash of the contents
> as a backup check?
> I now want to finish this posting and leave it up to others to
> comment. I intend to close with an extract from Ben's most recent
> note. We had got to the point where I was suggesting the combination
> of cache in memory, under PHP control, and I had suggested explicit
> caching calls like
> Ben's reply:
> I have a couple of suggestions. If you have a saveModuleUnderKey
> function it seems like you would need a clearModuleUnderKey function
> as well. Not sure if it would be good or not but to reduce the number
> of function calls we could add another parameter to Create such as
> $cache = true. Then if it wasn't cached it would cache it with the
> file name and if it was cached it would reload it from cache. In most
> real world scenarios that I can think of most everyone will want to
> use the caching, so making it the default might be a good idea.
> So the code could be
> $xmldas = SDO_DAS_XML::create('cmssys.xsd', true) ;
> With this approach we will still need a SDO_DAS_XML::ClearCache()
> I realize this gets more complicated because if we updated our xsd
> file we would need to clear the cache so that leaves me checking the
> modified date on the xsd file every page load in dev and QA. Not
> trying to ask for too much here but it would be great if the extension
> could check for date modified on the xsd file and reload automatically
> if it has been updated. So what I am proposing is
> 1. Add a new cache parameter to Create() that defaults to true.
> 2. Use the file name as the key for caching. This would also avoid
> accidentally caching the same file with different keys.
> 3. Check the xsd file modified time and refresh the cache if changed.
> This could be controlled via PHP.ini setting so it could be turned off
> for peak performance. In our Dev/QA region we would have it
> automatically refreshed; in production we would turn this off for
> better performance.
> I hope I have captured enough of the conversation to date that we can
> continue from here. Any comments, anyone?
Hi Matthew, I agree that the in-memory cache is likely to lead to the
desired result more quickly than going down the route of defining a
new serialization format.
The API though is a little more tricky. You have raised a couple of
points:
- The type model may not be loaded in one go and hence the name of the
  originally loaded file may not uniquely identify the resulting type
  model.
  - Could optionally allow a key to be specified on create and
    default to the file name if a key is not provided.
- It may be the case that the initial schema file is retrieved
  remotely, via a URL, or that the local file includes remote files.
  - This is where caching can give us the biggest payback.
  - This makes automatic change testing, i.e. based on stat-ing a
    file, difficult because while the file itself may not have changed
    something that it includes may well have done.
  - Typically live XML versioning is done via adopting namespace
    naming conventions. Uploading changed xsd files automatically to a
    production system without due care, i.e. associated code changes, will
    likely lead to problems. We have this problem anyhow when we restart
    the server so maybe you would expect all schema files to be copied to
    the local file system but I don't know if you can guarantee this.
  - On this basis I would avoid automatic cache refreshes at the
    moment. At least if you are going to provide it to make the
    development environment more flexible make it switchable as Ben
    suggests. We could alternatively provide a cache clear script that
    does the job for us in a development environment without taking the
    service down. We would need the CacheClear interface Ben suggests to
    make this work.
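
To make the API points a bit more concrete, here is a rough sketch in
plain PHP of the behaviour discussed above: an optional key that
defaults to the file name(s), a switchable modification-time check, and
a clear call. Every name in it (ModelCache, get(), clear(), the $loader
callback) is made up for illustration and is not part of the sdo_php
extension. The static array only lives for a single request, so this
shows the keying and invalidation logic, not the cross-request cache
the extension itself would have to hold.

```php
<?php
// Hypothetical sketch only: none of these names exist in sdo_php.
final class ModelCache
{
    /** Cached entries: key => ['model' => ..., 'mtimes' => [file => mtime]] */
    private static array $entries = [];

    /**
     * Return the cached model for $files, building it with $loader on a miss.
     * $key defaults to the joined file names (order matters, as it would
     * for create() followed by addTypes()). $checkMtime stands in for the
     * php.ini switch suggested in the thread.
     */
    public static function get(array $files, callable $loader,
                               ?string $key = null, bool $checkMtime = true)
    {
        $key = $key ?? implode('|', $files);
        $entry = self::$entries[$key] ?? null;

        if ($entry !== null
            && (!$checkMtime || $entry['mtimes'] === self::mtimes($files))) {
            return $entry['model'];                 // cache hit
        }

        $model = $loader($files);                   // expensive schema load
        self::$entries[$key] = ['model' => $model,
                                'mtimes' => self::mtimes($files)];
        return $model;
    }

    /** Drop one entry, or everything when no key is given. */
    public static function clear(?string $key = null): void
    {
        if ($key === null) {
            self::$entries = [];
        } else {
            unset(self::$entries[$key]);
        }
    }

    /** Modification times of the listed files only; includes are not seen. */
    private static function mtimes(array $files): array
    {
        clearstatcache();                           // avoid stale stat results
        $out = [];
        foreach ($files as $f) {
            $out[$f] = is_file($f) ? filemtime($f) : 0;
        }
        return $out;
    }
}

// Usage with a stand-in loader that just counts how often it is called.
$loads = 0;
$loader = function (array $files) use (&$loads) {
    $loads++;
    return ['types_from' => $files];
};

$xsd = tempnam(sys_get_temp_dir(), 'xsd');
file_put_contents($xsd, '<schema/>');

ModelCache::get([$xsd], $loader);   // miss: loader runs
ModelCache::get([$xsd], $loader);   // hit: loader is not called again
```

Note that the time check only looks at the files that are listed, so a
change in an included or remote schema would go unnoticed. That is the
case where an explicit clear call (or a cache clear script built on it)
is still needed.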