Hi Oli,

>> Speaking of that, I don't find too many instances where such joins are
>> actually used - possibly we could work around that. I need to think
>> about it for a while.
>
> I don't know if this is easy to implement and it will certainly need
> some rework, but what about having two contexts (or context tables),
> one for "scalar" data and one for blobs?
Well, that's actually what we did to address the problem in our installation here. We have a workflow_context table which uses an indexed VARCHAR2 for the workflow_context_value, and a workflow_context_bulk table which uses a CLOB for it. The persister has been modified to read both tables and fill the context from whichever table returns non-null content. For write operations it decides from a lookup table whether this particular workflow key is known to potentially contain large amounts of data. Workflow context entries that are considered bulk data are then written to the bulk table; everything else goes to the normal table. Joins are only done on the normal table.

Problem: you need to know whether a certain workflow_context_key (which is, after all, free-form and chosen in the workflow definition) may contain bulk data. This is currently done in a hardcoded lookup table, which is surely not what one wishes to see in a generic solution.

Other possible approaches to solving this:

1. Use the above solution of having two tables (workflow_context and workflow_context_bulk) and introduce a prefix or modifier for workflow context keys that indicates they may contain bulk (and non-indexed) data. That way the workflow author can decide which values should be stored in the bulk table, and a hardcoded lookup table is no longer needed. Downside: you need to modify all workflows and rename their workflow context entries to use the prefix if they shall include bulk data.

2. Use heuristics in the persister to decide into which table the data shall go: if the workflow context value exceeds a certain length, the entry is written to the bulk table, otherwise it goes to the normal table. Advantage: completely transparent. Disadvantage: it is not deterministic which workflow context values are available in the normal table for joins, possibly producing inconsistent behavior when querying the database.

I am a bit inclined towards solution 1.
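To make solution 1 concrete, here is a minimal sketch (in Python, not the actual Perl persister code) of the routing decision; the "bulk_" prefix and the function name are made-up examples, not anything that exists in OpenXPKI:

```python
# Sketch of solution 1: route a workflow context entry to a table
# based solely on a naming convention in the key. The prefix "bulk_"
# is a hypothetical choice for illustration.

BULK_PREFIX = "bulk_"

def target_table(context_key: str) -> str:
    """Return the table a context entry should be written to."""
    if context_key.startswith(BULK_PREFIX):
        # CLOB column, not indexed, excluded from joins
        return "workflow_context_bulk"
    # indexed VARCHAR2 column, available for joins
    return "workflow_context"
```

The point is that the decision depends only on the key name, so the same key always lands in the same table and joins on the normal table stay deterministic.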
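For comparison, solution 2 would look roughly like this; the 4000-character threshold is only a plausible example (indexed VARCHAR2 columns in Oracle are limited to 4000 bytes), not a value taken from any existing code:

```python
# Sketch of solution 2: route by value length instead of key name.
# MAX_SCALAR_LENGTH is an assumed threshold for illustration.

MAX_SCALAR_LENGTH = 4000

def target_table_by_length(context_value: str) -> str:
    """Return the table a context entry should be written to."""
    if len(context_value) > MAX_SCALAR_LENGTH:
        return "workflow_context_bulk"
    return "workflow_context"
```

This is transparent for workflow authors, but the same key can end up in different tables depending on the value written, which is exactly the non-determinism mentioned above.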
The code is almost there (unpublished), and as far as I can see there are really not too many places where bulk data is actually processed. Mainly this affects requests, certificates, P12 data and so on, as well as possibly data read from LDAP.

cu
Martin

_______________________________________________
OpenXPKI-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openxpki-devel
