Hi Max,
I had a similar situation in the past involving a single table of
Content Items which evolved from simple strings to elaborate objects
with branching / staging, localization, history, and cached
processing. The solution I implemented involved three tables:
1. Content Items: a placeholder for the item and its primary data
(id, name, current version, access level, etc.)
2. Content Item Instances: the item value is kept here (id, item_id,
version, value, etc.)
3. Content Items Metadata: the caching lives here (id, item_id, key,
value, creation date)
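In rough MySQL terms those three tables might look something like this --
column types are approximate and the names are just my shorthand (I call
the metadata key column meta_key to stay clear of the reserved word):

<?php
// Assuming $pdo is an existing PDO connection to MySQL.
$pdo->exec("CREATE TABLE content_items (
    id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name            VARCHAR(255) NOT NULL,
    current_version INT UNSIGNED NOT NULL DEFAULT 1,
    access_level    TINYINT UNSIGNED NOT NULL DEFAULT 0
)");

$pdo->exec("CREATE TABLE content_item_instances (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item_id INT UNSIGNED NOT NULL,
    version INT UNSIGNED NOT NULL,
    value   TEXT NOT NULL,
    UNIQUE KEY item_version (item_id, version)
)");

$pdo->exec("CREATE TABLE content_items_metadata (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item_id    INT UNSIGNED NOT NULL,
    meta_key   VARCHAR(64) NOT NULL,   -- the processing key
    value      TEXT NOT NULL,          -- the cached, processed text
    created_at DATETIME NOT NULL,
    UNIQUE KEY item_key (item_id, meta_key)
)");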
When a Content Item is updated you'll increment its version number
and add a new instance with the new value and version number (be wary
of the potential race condition here). That gives you a full item
history in the Content Item Instances table.
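That update step, sketched with PDO -- a transaction plus a row lock is
one way to handle the race I mentioned (this assumes InnoDB and is
untested as written):

<?php
function update_content_item(PDO $pdo, $itemId, $newValue)
{
    $pdo->beginTransaction();
    try {
        // Lock the item row so concurrent editors serialize here.
        $stmt = $pdo->prepare(
            'SELECT current_version FROM content_items WHERE id = ? FOR UPDATE');
        $stmt->execute(array($itemId));
        $newVersion = (int) $stmt->fetchColumn() + 1;

        // Bump the version on the item...
        $pdo->prepare('UPDATE content_items SET current_version = ? WHERE id = ?')
            ->execute(array($newVersion, $itemId));

        // ...and record the new value as a new instance (full history kept).
        $pdo->prepare('INSERT INTO content_item_instances (item_id, version, value)
                       VALUES (?, ?, ?)')
            ->execute(array($itemId, $newVersion, $newValue));

        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    }
}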
When an item is accessed you'll check its Metadata to see if a cached
value for its particular processing key exists. If it does and the
creation date is later than the update date of those processors (I
use a single site-wide update date for simplicity), show the cached
value. Otherwise create the cache and store it in the Metadata table.
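The read path, roughly -- process_item() is just a stand-in for whatever
chain of processors you run for a given key, and $processorsUpdatedAt is
that single site-wide date:

<?php
function get_processed_value(PDO $pdo, $itemId, $processingKey, $processorsUpdatedAt)
{
    // Is there a cached value for this item + processing key?
    $stmt = $pdo->prepare(
        'SELECT value, created_at FROM content_items_metadata
          WHERE item_id = ? AND meta_key = ?');
    $stmt->execute(array($itemId, $processingKey));
    $cached = $stmt->fetch(PDO::FETCH_ASSOC);

    // Fresh enough? Use it.
    if ($cached && strtotime($cached['created_at']) > strtotime($processorsUpdatedAt)) {
        return $cached['value'];
    }

    // Otherwise pull the raw value for the item's current version...
    $stmt = $pdo->prepare(
        'SELECT i.value
           FROM content_item_instances i
           JOIN content_items c ON c.id = i.item_id AND c.current_version = i.version
          WHERE c.id = ?');
    $stmt->execute(array($itemId));
    $raw = $stmt->fetchColumn();

    // ...rebuild the cache and store it for next time (REPLACE INTO is the
    // lazy MySQL way to overwrite the old cache row via the unique key).
    $processed = process_item($raw, $processingKey);
    $pdo->prepare(
        'REPLACE INTO content_items_metadata (item_id, meta_key, value, created_at)
         VALUES (?, ?, ?, NOW())')
        ->execute(array($itemId, $processingKey, $processed));

    return $processed;
}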
Some notes:
You might be able to store the value directly in the Content Items
table if you don't have to deal with i18n, staging, branching and the
like. In my case I needed multiple instances per Content Item to
begin with, and the history was a simple bonus extension to that
implementation. You'd still need something like Table 2 above, but
you wouldn't have to check it every time you need the raw value for a
Content Item.
Likewise my metadata situation is a bit more complicated, as I
actually store the caches on a per-instance basis. I still have Table
3, but I also have a fourth table for Content Item Instance Metadata,
which is where the caches live.
You mentioned that you have many tables in your database that need
this treatment. You may need to experiment a bit to see if it is more
efficient to triple the number of tables in your database (ouch) or
to add something like a 'table_name' field to Tables 2 and 3 above
and use them for all your data. I suspect this latter route may lead
to some table size issues, however, as those tables tend to fill up
rather rapidly.
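The 'table_name' variant is just a discriminator column, e.g. something
along these lines ('forum_posts' is only an example name):

<?php
// Every lookup into the shared instances table filters on the source table.
$stmt = $pdo->prepare(
    'SELECT value FROM content_item_instances
      WHERE table_name = ? AND item_id = ? AND version = ?');
$stmt->execute(array('forum_posts', $itemId, $version));
$value = $stmt->fetchColumn();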
I've been meaning to write this up for a while; if the ideas seem
useful I can prepare a more rigorous treatment. Let me know what you
eventually decide to use; I'd be interested to hear how your
particular requirements shape the solution.
-- dann
Hello buds,
It's been a while since I've been active on the list but I figured I'd
give a holler and see if anyone had any suggestions for an application
design problem I've run into.
I have a large number of text fields across many tables in a fairly
large database, all of which can be manipulated in any number of ways.
Some common manipulations would be scrubbing strings for display on the
web (XHTML compliance and XSS avoidance), censoring of "bad" words,
rich-text, etc.
All in all, once you mix and match all of the various text
manipulations, you end up with a large number of versions of the same
chunk of text, and you need access to all of them based on a plethora
of variables such as user options, access interface, etc. On top of
that, some fields can be edited, and I'd like to keep copies of the
entire revision history, which adds another level of complexity.
Originally I thought of some sort of memory caching solution, but the
main goal of this is to come up with a scalable solution, and there are
currently a few gigabytes of text that this would apply to, so if
anything it would probably need to expire. It's possible that I could
have some mixture of short-term memory cache and long-term disk cache,
as disk/database space isn't a large concern.
Another issue is manipulation function versioning, e.g. when a new word
is added to the censor function, you want to purge the cache of all of
the censored text created by the last version.
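To illustrate what I mean, I'm imagining stamping each cached chunk with
the version of the function that produced it, so a version bump tells
you which entries to throw away -- purely hypothetical sketch:

<?php
// Bumped by hand whenever the censor word list changes.
define('CENSOR_VERSION', 7);

// The cache key records which censor version built the entry, so anything
// cached under an older version can be purged (or simply ignored).
$cacheKey = md5($rawText) . ':censor:v' . CENSOR_VERSION;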
Maybe I'm just over-complicating the entire thing, but doing this sort
of manipulation on a high-traffic site seems like a gigantic
duplication of CPU-intensive work that could (and should) be avoided.
I've come up with a lot of solutions, but none of them seem very
elegant. I'm trying to avoid a lot of excess DB queries and SQL joins.
I've done some searching around and it seems like anyone who has solved
this problem hasn't discussed it publicly. I thought maybe someone
dealing with locale on a large scale might have come up with a good
solution, but since locale is mostly static, it doesn't seem to apply
in most cases.
So has anyone dealt with something similar, or is there an obvious
solution that I'm missing? I'd be interested in hearing some of the
more seasoned NYPHPers' opinions.
Thanks in advance for any advice!
-Max