Hi Martin,

I don't think that it is a good idea to create indexes on LOBs. Is it
possible to create a sha1 hash of the LOB and use it as an index key?
Additionally we can generally forbid TEXT columns as index columns.

Best regards

Michael


Am 03.04.2012 16:13, schrieb Martin Bartosch:
> Hi,
> 
>> this is a bug. I checked the schema code and it looks like I mixed up
>> INDEX and PRIMARY_KEY. I put it on my todo list.
> 
> the git transition and introduction of the development branch gives us a very 
> good opportunity to refine some core components, and I think we need to 
> address some issues with the database layer:
> 
> 1. Indices
> Index creation is currently not optimal, some required indices are not 
> created automatically.
> 
> 2. Additional columns for workflow engine and cluster support
> Oli and Andreas are working on a "supervisor" or watchdog mechanism that is 
> necessary for the cluster support and for properly handling stalled 
> workflows. This will require at least an additional column in the workflow 
> table.
> 
> 3. Regarding the indices we encountered a design deficiency in relation to 
> the database abstraction and the Oracle database driver.
> 
> The problem: 
> In our original design for OpenXPKI we specified two abstract data types: 
> TEXT and TEXT_KEY. 
> TEXT_KEY was supposed to store at least 256 characters, while TEXT should 
> store at least 32k characters.
> 
> Now we have the following constraints:
> - a workflow context value shall be of type TEXT (i. e. 32k minimum), 
> otherwise large data cannot be stored
> - we need an index on the workflow context key AND the workflow context 
> value, otherwise certain joins will take forever (forces full table scan, 
> which can take a long time with even a low number of workflow instances). 
> 
> With Oracle we have a slight problem: Oracle VARCHAR2 only stores a maximum 
> of 4000 characters. This forces us to choose CLOB as the database type for 
> the abstract TEXT type. 
> Now our shiny enterprise grade Oracle database, it turns out, does NOT 
> (directly) support indices on CLOBs (there is a workaround that allows 
> indexes on CLOBs with the Oracle Text package or whatever, but this a) costs 
> extra money and b) only creates a static index that gets outdated after the 
> first write operation to the table).
> 
> This is a real problem which we addressed with a kludge (not available 
> upstream) with our Oracle installation here: we introduced a 
> workflow_context_bulk table which gets used for context entries which are 
> known to get huge. The code has severe drawbacks (it reduces flexibility), 
> hence it was not committed upstream.
> 
> To address this problem a possible solutions might be the following 
> modifications to the workflow persister and the database layer:
> 
> - extend the workflow_context table to store a truncated (and indexable) 
> version of the workflow_context_value, e. g. named 
> "workflow_context_value_indexed"
> - add an index on workflow_context_value_indexed
> - introduce a flag in the driver that indicates if the database support 
> indexes on TEXT types
> - modify the persister to use the following logic: on write operations, write 
> the full workflow context value to workflow_context_value. In addition, write 
> a truncated version (truncated to the max length of TEXT_KEY) to 
> workflow_context_value_indexed.
> - if the database driver does support indexes on TEXT values, use 
> workflow_context_value in joins or search operations, otherwise use 
> workflow_context_value_indexed
> - for any record found in a search operation, return the value stored in 
> workflow_context_value
> 
> The drawback is that it might result in unpredictable behavior for workflow 
> entries exceeding the max length of TEXT_KEY due to ambiguities in the join 
> over workflow_context_value_indexed.
> 
> Any better ideas?
> 
> cheers
> 
> Martin
> 
> 
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second 
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> OpenXPKI-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/openxpki-devel


-- 
___________________________________________________________________

Michael Bell                        Humboldt-Universitaet zu Berlin

Tel.: +49 (0)30-2093 70143          ZE Computer- und Medienservice
Fax:  +49 (0)30-2093 70135          Unter den Linden 6
[email protected]       D-10099 Berlin
___________________________________________________________________

PGP Fingerprint: 09E4 3D29 4156 2774 0F2C  C643 D8BD 1918 2030 5AAB

Attachment: smime.p7s
Description: S/MIME Kryptografische Unterschrift

------------------------------------------------------------------------------
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
_______________________________________________
OpenXPKI-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openxpki-devel

Reply via email to