Hi Oli,

>> Speaking of that, I don't find too many instances where such joins are
>> actually used - possibly we could work around that. I need to think
>> about it for a while.
>
> I don't know if this is easy to implement and it will certainly need
> some rework, but what about having two contexts (or context tables),
> one for "scalar" data and one for blobs?
Well, that's actually what we did to address the problem in our installation here. We have a workflow_context table which uses an indexed VARCHAR2 for the workflow_context_value, and a workflow_context_bulk table which uses a CLOB for it. The persister has been modified to read both tables and fill the context from whichever table returns non-null content. For write operations it decides from a lookup table whether this particular workflow key is known to potentially contain large amounts of data. Workflow context entries that are considered bulk data are then written to the bulk table; everything else goes to the normal table. Joins are only done on the normal table.

Problem: you need to know whether a certain workflow_context_key (which is, after all, free-form and chosen in the workflow definition) may contain bulk data. This is currently done in a hardcoded lookup table, which is surely not what one wishes to see in a generic solution.

Other possible approaches to solving this:

1. Use the above solution of having two tables (workflow_context and workflow_context_bulk) and introduce a prefix or modifier for workflow context keys that indicates they may contain bulk (and non-indexed) data. That way the workflow author can decide which values should be stored in the bulk table, and a hardcoded lookup table is no longer needed. Downside: you need to modify all workflows and rename their workflow context entries to use the prefix if they shall include bulk data.

2. Use heuristics in the persister to decide into which table the data shall go: if the workflow context value exceeds a certain length, the entry is written to the bulk table, otherwise it goes to the normal table. Advantage: completely transparent. Disadvantage: it is not deterministic which workflow context values are available in the normal table for joins, possibly producing inconsistent behavior when querying the database.

I am a bit inclined towards solution 1.
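To make solution 1 concrete, here is a minimal sketch (in Python, not the actual Perl persister code) of the routing decision; the "bulk_" prefix and the function name are made-up examples, not anything that exists in OpenXPKI:

```python
# Sketch of solution 1: route a workflow context entry to a table
# based solely on a naming convention in the key. The prefix "bulk_"
# is a hypothetical choice for illustration.

BULK_PREFIX = "bulk_"

def target_table(context_key: str) -> str:
    """Return the table a context entry should be written to."""
    if context_key.startswith(BULK_PREFIX):
        # CLOB column, not indexed, excluded from joins
        return "workflow_context_bulk"
    # indexed VARCHAR2 column, available for joins
    return "workflow_context"
```

The point is that the decision depends only on the key name, so the same key always lands in the same table and joins on the normal table stay deterministic.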
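For comparison, solution 2 would look roughly like this; the 4000-character threshold is only a plausible example (indexed VARCHAR2 columns in Oracle are limited to 4000 bytes), not a value taken from any existing code:

```python
# Sketch of solution 2: route by value length instead of key name.
# MAX_SCALAR_LENGTH is an assumed threshold for illustration.

MAX_SCALAR_LENGTH = 4000

def target_table_by_length(context_value: str) -> str:
    """Return the table a context entry should be written to."""
    if len(context_value) > MAX_SCALAR_LENGTH:
        return "workflow_context_bulk"
    return "workflow_context"
```

This is transparent for workflow authors, but the same key can end up in different tables depending on the value written, which is exactly the non-determinism mentioned above.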
The code is almost there (unpublished), and as far as I can see there are really not too many places where bulk data is actually processed. Mainly this affects requests, certificates, P12 data and so on, as well as possibly data read from LDAP.

cu
Martin

_______________________________________________
OpenXPKI-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openxpki-devel
