Hi Martin, I don't think that it is a good idea to create indexes on LOBs. Is it possible to create a sha1 hash of the LOB and use it as an index key? Additionally we can generally forbid TEXT columns as index columns.
Best regards Michael Am 03.04.2012 16:13, schrieb Martin Bartosch: > Hi, > >> this is a bug. I checked the schema code and it looks like I mixed up >> INDEX and PRIMARY_KEY. I put it on my todo list. > > the git transition and introduction of the development branch gives us a very > good opportunity to refine some core components, and I think we need to > address some issues with the database layer: > > 1. Indices > Index creation is currently not optimal, some required indices are not > created automatically. > > 2. Additional columns for workflow engine and cluster support > Oli and Andreas are working on a "supervisor" or watchdog mechanism that is > necessary for the cluster support and for properly handling stalled > workflows. This will require at least an additional column in the workflow > table. > > 3. Regarding the indices we encountered a design deficiency in relation to > the database abstraction and the Oracle database driver. > > The problem: > In our original design for OpenXPKI we specified two abstract data types: > TEXT and TEXT_KEY. > TEXT_KEY was supposed to store at least 256 characters, while TEXT should > store at least 32k characters. > > Now we have the following constraints: > - a workflow context value shall be of type TEXT (i. e. 32k minimum), > otherwise large data cannot be stored > - we need an index on the workflow context key AND the workflow context > value, otherwise certain joins will take forever (forces full table scan, > which can take a long time with even a low number of workflow instances). > > With Oracle we have a slight problem: Oracle VARCHAR2 only stores a maximum > of 4000 characters. This forces us to choose CLOB as the database type for > the abstract TEXT type. > Now our shiny enterprise grade Oracle database, it turns out, does NOT > (directly) support indices on CLOBs (there is a workaround that allows > indexes on CLOBs with the Oracle Text package or whatever, but this a) costs > extra money and b) only creates a static index that gets outdated after the > first write operation to the table). > > This is a real problem which we addressed with a kludge (not available > upstream) with our Oracle installation here: we introduced a > workflow_context_bulk table which gets used for context entries which are > known to get huge. The code has severe drawbacks (it reduces flexibility), > hence it was not committed upstream. > > To address this problem a possible solutions might be the following > modifications to the workflow persister and the database layer: > > - extend the workflow_context table to store a truncated (and indexable) > version of the workflow_context_value, e. g. named > "workflow_context_value_indexed" > - add an index on workflow_context_value_indexed > - introduce a flag in the driver that indicates if the database support > indexes on TEXT types > - modify the persister to use the following logic: on write operations, write > the full workflow context value to workflow_context_value. In addition, write > a truncated version (truncated to the max length of TEXT_KEY) to > workflow_context_value_indexed. > - if the database driver does support indexes on TEXT values, use > workflow_context_value in joins or search operations, otherwise use > workflow_context_value_indexed > - for any record found in a search operation, return the value stored in > workflow_context_value > > The drawback is that it might result in unpredictable behavior for workflow > entries exceeding the max length of TEXT_KEY due to ambiguities in the join > over workflow_context_value_indexed. > > Any better ideas? > > cheers > > Martin > > > ------------------------------------------------------------------------------ > Better than sec? Nothing is better than sec when it comes to > monitoring Big Data applications. Try Boundary one-second > resolution app monitoring today. Free. > http://p.sf.net/sfu/Boundary-dev2dev > _______________________________________________ > OpenXPKI-devel mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/openxpki-devel -- ___________________________________________________________________ Michael Bell Humboldt-Universitaet zu Berlin Tel.: +49 (0)30-2093 70143 ZE Computer- und Medienservice Fax: +49 (0)30-2093 70135 Unter den Linden 6 [email protected] D-10099 Berlin ___________________________________________________________________ PGP Fingerprint: 09E4 3D29 4156 2774 0F2C C643 D8BD 1918 2030 5AAB
smime.p7s
Description: S/MIME Kryptografische Unterschrift
------------------------------------------------------------------------------ Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev
_______________________________________________ OpenXPKI-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openxpki-devel
