Folks,

In light of recent comments about Slide running slowly when we have 100+
documents, I thought I'd take a look into it. I'd say there isn't a lot we
could do with indices in the database; rather, we're being very inefficient
in our SQL. I did some measurements and found that a PROPFIND on a scope
containing two text files caused Slide to issue 61 select statements
(approx 1000ms). With 32 files, a PROPFIND resulted in 438 selects
(4400ms). The startup cost of each query is significant, so we can expect
this to have a dramatic effect with any real number of documents.

There is scope for using JOINs to reduce the number of selects, but I
suspect that this is not the best solution on its own.

For example the sequence

  select * from objects where uri='/users/root/foo'
  select * from children where uri='/users/root/foo'
  select * from links where linkto='/users/root/foo'
  select * from permissions where object='/users/root/foo'

  select * from objects where uri='/users/root/bar'
  select * from children where uri='/users/root/bar'
  select * from links where linkto='/users/root/bar'
  select * from permissions where object='/users/root/bar'

[etc]

accounts for a lot of the work.

Surely a far better way to do it would be to perform a _single_ search on a
scope which returns a set of records corresponding to its children with
fields along the lines of:

                uri classname linkcount permission

There could be several of these per uri, corresponding to different
permissions (one per role, for example). Better yet, compute the target
before issuing the select, passing the role/person in question as part of
the "WHERE" clause.
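To make the shape of that single search concrete, here is a minimal sketch
using SQLite from Python. The table and column names (objects, children,
links, permissions) just mirror the per-uri selects above; the real Slide
schema may well differ, so treat this as an illustration only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy versions of the tables referenced by the per-uri selects above.
cur.executescript("""
CREATE TABLE objects     (uri TEXT PRIMARY KEY, classname TEXT);
CREATE TABLE children    (uri TEXT, childuri TEXT);
CREATE TABLE links       (link TEXT, linkto TEXT);
CREATE TABLE permissions (object TEXT, role TEXT, action TEXT);

INSERT INTO objects VALUES ('/users/root',     'SubjectNode');
INSERT INTO objects VALUES ('/users/root/foo', 'SubjectNode');
INSERT INTO objects VALUES ('/users/root/bar', 'SubjectNode');
INSERT INTO children VALUES ('/users/root', '/users/root/foo');
INSERT INTO children VALUES ('/users/root', '/users/root/bar');
INSERT INTO links VALUES ('/links/foo', '/users/root/foo');
INSERT INTO permissions VALUES ('/users/root/foo', 'root', 'read');
INSERT INTO permissions VALUES ('/users/root/bar', 'root', 'read');
""")

# One select for the whole scope: each child row carries its classname,
# link count, and the permission for the requesting role.
rows = cur.execute("""
    SELECT o.uri,
           o.classname,
           (SELECT COUNT(*) FROM links l WHERE l.linkto = o.uri) AS linkcount,
           p.action AS permission
    FROM children c
    JOIN objects o          ON o.uri = c.childuri
    LEFT JOIN permissions p ON p.object = o.uri AND p.role = ?
    WHERE c.uri = ?
    ORDER BY o.uri
""", ('root', '/users/root')).fetchall()

for row in rows:
    print(row)
# -> ('/users/root/bar', 'SubjectNode', 0, 'read')
#    ('/users/root/foo', 'SubjectNode', 1, 'read')
```

One round trip replaces the eight selects in the example above, and the
role is resolved in the WHERE/JOIN rather than by fetching every
permission row and filtering in Java.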

Now... all this is really nice, but it would involve a little more than
simply reworking the schema. What I propose is that we extend the store's
cache for the JDBC case and build it so that it gets populated with the
children of the current node using a single search.
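The cache behaviour I have in mind could be sketched like this (pure
Python, with a toy in-memory "store"; the class and function names are
mine, not Slide's - the point is only that a miss on a uri pulls the uri
*and* its children in one round trip):

```python
# Hypothetical sketch of the proposed cache: a miss on a uri fetches the
# uri and all of its children from the store in a single search, so the
# follow-up lookups a PROPFIND makes become cache hits.
class ChildPopulatingCache:
    def __init__(self, fetch_with_children):
        # fetch_with_children(uri) -> {uri: record, childuri: record, ...}
        # stands in for the single JDBC search described above.
        self._fetch = fetch_with_children
        self._cache = {}
        self.store_hits = 0   # number of round trips to the store

    def get(self, uri):
        if uri not in self._cache:
            self.store_hits += 1
            self._cache.update(self._fetch(uri))
        return self._cache[uri]

# Toy store: one parent with two children.
store = {
    '/users/root':     {'classname': 'SubjectNode'},
    '/users/root/foo': {'classname': 'SubjectNode'},
    '/users/root/bar': {'classname': 'SubjectNode'},
}
children = {'/users/root': ['/users/root/foo', '/users/root/bar']}

def fetch_with_children(uri):
    uris = [uri] + children.get(uri, [])
    return {u: store[u] for u in uris}

cache = ChildPopulatingCache(fetch_with_children)
cache.get('/users/root')       # one round trip...
cache.get('/users/root/foo')   # ...then the children are already cached
cache.get('/users/root/bar')
print(cache.store_hits)        # -> 1
```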

Thinking initially about a depth=1 PROPFIND: if we work on the statistical
likelihood that an initial request for a uri will be followed by requests
for its children, we can save time by dragging all of them from the store
at once. To stop the cache growing too large, there is plenty of scope for
reducing the record size by altering the schema - the classname could be a
pointer to a lookup table, and permissions could be expressed using lookup
tables as well. We might reasonably expect to get a typical record down to
around 32 bytes.
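Back-of-the-envelope, with the string fields interned through lookup
tables, a record is just the uri plus a few small integers. A rough sketch
(the lookup tables and field widths here are assumptions, purely to show
the arithmetic):

```python
import struct

# Hypothetical lookup tables: classname and permission become small ids.
classnames  = {'SubjectNode': 0, 'LinkNode': 1}
permissions = {'read': 0, 'write': 1}

def pack_record(uri, classname, linkcount, permission):
    # uri (variable length) + 1-byte classname id + 2-byte link count
    # + 1-byte permission id = uri length + 4 bytes.
    return uri.encode() + struct.pack('>BHB',
                                      classnames[classname],
                                      linkcount,
                                      permissions[permission])

rec = pack_record('/users/root/foo', 'SubjectNode', 1, 'read')
print(len(rec))   # 15-byte uri + 4 bytes of packed fields = 19
```

So for typical uri lengths the ~32-byte figure looks plausible.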

Now... for the situation where we're talking about recursive PROPFINDs I
can see a few ways to proceed, but let's say we pass the depth with the
initial request and have the store retrieve the entire relevant part of
the tree. This is probably OK size-wise and is perhaps the simplest way to
do it programmatically.
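Since uris encode the hierarchy, one way the store could fetch a
depth-limited subtree in a single select is a prefix match plus a check on
the number of extra path segments. A sketch, again in SQLite with an
assumed objects table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (uri TEXT PRIMARY KEY, classname TEXT);
INSERT INTO objects VALUES ('/users/root',        'SubjectNode');
INSERT INTO objects VALUES ('/users/root/a',      'SubjectNode');
INSERT INTO objects VALUES ('/users/root/a/deep', 'SubjectNode');
INSERT INTO objects VALUES ('/users/root/b',      'SubjectNode');
""")

def fetch_subtree(root, depth):
    """One select for the whole relevant part of the tree: everything
    under `root` whose extra path segments do not exceed `depth`."""
    base = root.count('/')
    rows = con.execute("""
        SELECT uri FROM objects
        WHERE (uri = ? OR uri LIKE ? || '/%')
          AND (LENGTH(uri) - LENGTH(REPLACE(uri, '/', ''))) - ? <= ?
        ORDER BY uri
    """, (root, root, base, depth)).fetchall()
    return [r[0] for r in rows]

print(fetch_subtree('/users/root', 1))
# depth=1 keeps /users/root, /users/root/a and /users/root/b,
# but excludes /users/root/a/deep
```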

There is no need to test for nodes being "clean" or "dirty" for now - that
could be a future extension. All we're talking about at this stage is
condensing all the searches pertaining to a *single* PROPFIND command
("snapshot") into one, and using a cache during its execution. We could
scrub the cache at the end of the PROPFIND for now and still ought to see
a significant performance boost.

Comments?

Regards,
Peter


