Hi,

> this is a bug. I checked the schema code and it looks like I mixed up
> INDEX and PRIMARY_KEY. I put it on my todo list.

the git transition and the introduction of the development branch give us a
very good opportunity to refine some core components, and I think we need to
address some issues with the database layer:

1. Indices
Index creation is currently not optimal: some required indices are not created
automatically.

2. Additional columns for workflow engine and cluster support
Oli and Andreas are working on a "supervisor" or watchdog mechanism that is
necessary for cluster support and for properly handling stalled workflows.
This will require at least one additional column in the workflow table.

3. Regarding the indices, we encountered a design deficiency in the database
abstraction and the Oracle database driver.

The problem: 
In our original design for OpenXPKI we specified two abstract data types: TEXT 
and TEXT_KEY. 
TEXT_KEY was supposed to store at least 256 characters, while TEXT should store 
at least 32k characters.

Now we have the following constraints:
- a workflow context value shall be of type TEXT (i.e. 32k minimum); otherwise
large data cannot be stored
- we need an index on the workflow context key AND the workflow context value;
otherwise certain joins will take forever (they force a full table scan, which
can take a long time even with a low number of workflow instances).

With Oracle we have a slight problem: Oracle's VARCHAR2 only stores a maximum
of 4000 characters. This forces us to choose CLOB as the database type for the
abstract TEXT type.
Now our shiny enterprise-grade Oracle database, it turns out, does NOT
(directly) support indices on CLOBs (there is a workaround that allows indexes
on CLOBs via the Oracle Text package, but this a) costs extra money and
b) only creates a static index that gets outdated after the first write
operation to the table).

This is a real problem, which we addressed with a kludge in our local Oracle
installation: we introduced a workflow_context_bulk table which is used for
context entries that are known to get huge. The code has severe drawbacks
(it reduces flexibility), hence it was not committed upstream.

To address this problem, a possible solution might be the following
modifications to the workflow persister and the database layer:

- extend the workflow_context table to store a truncated (and indexable)
version of workflow_context_value, e.g. named
"workflow_context_value_indexed"
- add an index on workflow_context_value_indexed
- introduce a flag in the driver that indicates whether the database supports
indexes on TEXT types
- modify the persister to use the following logic: on write operations, write 
the full workflow context value to workflow_context_value. In addition, write a 
truncated version (truncated to the max length of TEXT_KEY) to 
workflow_context_value_indexed.
- if the database driver does support indexes on TEXT values, use 
workflow_context_value in joins or search operations, otherwise use 
workflow_context_value_indexed
- for any record found in a search operation, return the value stored in 
workflow_context_value
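The proposed persister logic could be sketched roughly as follows. This is a
minimal illustration only, using SQLite as a stand-in database; the constant
TEXT_KEY_MAX, the helper names and the exact column layout are my assumptions
for the sake of the example, not actual OpenXPKI code:

```python
# Sketch of the proposed persister logic (illustrative only).
# SQLite stands in for the real database; TEXT_KEY_MAX, the function
# names and the column layout are assumptions, not OpenXPKI code.
import sqlite3

TEXT_KEY_MAX = 256  # assumed maximum length of the abstract TEXT_KEY type


def write_context(conn, workflow_id, key, value):
    # Write the full value plus a truncated, indexable copy.
    conn.execute(
        "INSERT INTO workflow_context "
        "(workflow_id, workflow_context_key, workflow_context_value, "
        " workflow_context_value_indexed) VALUES (?, ?, ?, ?)",
        (workflow_id, key, value, value[:TEXT_KEY_MAX]),
    )


def search_context(conn, key, value, supports_text_index):
    # If the driver can index TEXT, match against the full column;
    # otherwise filter on the truncated column. Either way, return
    # the full value stored in workflow_context_value.
    column = ("workflow_context_value" if supports_text_index
              else "workflow_context_value_indexed")
    match = value if supports_text_index else value[:TEXT_KEY_MAX]
    rows = conn.execute(
        "SELECT workflow_id, workflow_context_value FROM workflow_context "
        "WHERE workflow_context_key = ? AND " + column + " = ?",
        (key, match),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_context ("
    " workflow_id INTEGER, workflow_context_key TEXT,"
    " workflow_context_value TEXT, workflow_context_value_indexed TEXT)"
)
conn.execute(
    "CREATE INDEX ctx_idx ON workflow_context "
    "(workflow_context_key, workflow_context_value_indexed)"
)
write_context(conn, 1, "cert_subject", "CN=example")
print(search_context(conn, "cert_subject", "CN=example", False))
# → [(1, 'CN=example')]
```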

The drawback is that this might result in unpredictable behavior for workflow
entries exceeding the max length of TEXT_KEY: two distinct values that share
the same truncated prefix become indistinguishable in a join over
workflow_context_value_indexed.
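To make the ambiguity concrete (a toy illustration; TEXT_KEY_MAX is an assumed
length for the abstract TEXT_KEY type):

```python
# Two long values that share their first TEXT_KEY_MAX characters
# become identical once truncated, so a search on the truncated
# column would match both rows even though the values differ.
TEXT_KEY_MAX = 256  # assumed TEXT_KEY length

a = "x" * 300               # one long value
b = "x" * 256 + "y" * 44    # a different value with the same 256-char prefix

assert a != b                                   # distinct full values ...
assert a[:TEXT_KEY_MAX] == b[:TEXT_KEY_MAX]     # ... but truncated copies collide
```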

Any better ideas?

cheers

Martin


_______________________________________________
OpenXPKI-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openxpki-devel
