[sqlite] Looking for advice on improving performance of an algorithm

Michael Grigoriev Fri, 24 Oct 2003 15:23:19 -0700

Hi,

I am working on IMMS [http://www.luminal.org/wiki/index.php/IMMS/IMMS], an
adaptive playlist plugin for XMMS. It uses SQLite as the backend to store
all the metadata.


The algorithms IMMS uses for song correlation are in essence graph
algorithms. The graph currently is stored in the database as an adjacency
matrix (3 columns: origin, destination, link weight).

The most resource intensive algorithm has to do the following.
Given a graph of form
   
         A
        /
    P--Q--B
        \
         C

And points P and Q, it tries to create links between P and the outer graph
of Q. So in the example above it would create/update links P--A, P--B, and
P--C.

This would be easy to do, except the weight of the newly created links depends
on the weight of the current link between Q and node in question, as well as
the weight of the link from the node to P.

So right now, I do something like (this is a simplified example):

    (Have to create new table here, otherwise the table will be locked,
     and I will not be able to run updates from the callback)

    CREATE TEMP TABLE 'Temp' AS
        SELECT * FROM 'Correlations' WHERE origin = 'Q' AND destination != 'P';
    SELECT * FROM 'Temp';

And then the callback gets called it gets 3 parameters, lets call them
ORIGIN, DESTINATION and WEIGHT, and I have to do something horrible like:

    SELECT count(weight) FROM 'Correlations'
        WHERE origin = ORIGIN AND destination = DESTINATION;

    and then if it returns 0, I insert the new value

    INSERT INTO 'Correlations' ('origin', 'destination', 'weight')
                "VALUES (ORIGIN, DESTINATION, WEIGHT / MAX_WEIGHT);

            (where MAX_WEIGHT is just some constant)
    
    and if the select query returned something other than 0, I add to the
        existing link

    UPDATE 'Correlations' SET weight = weight + WEIGHT / MAX_WEIGHT
        WHERE origin = ORIGIN AND destination = DESTINATION;


Needless to say, this is very slow.

I was wondering if anybody has any ideas on how I can optimize this.
My knowledge of SQL is not too in depth, so I might be missing something.

The main problem (I think) is the way the adjacency matrix is stored.
Because there are two nodes, I cannot really use a UNIQUE constraint to be
able to simply do something like INSERT OR REPLACE. And even if I say
concatenated origin and destination (like make "2", "5" into "2|5" or
something), and enforced uniqueness on that, I don't know how to make a
single query that will add to the link or insert it if it does not exist....

So I do not see a way to compress this to a single update statement (which
would be significantly more efficient because there would be no need call
the callback, or even create the temp table), even if I was willing to
sacrifice some functionality to do that. For example I could maybe live
without using the weight of the links between Q and its external nodes, and
just always update links between P and the nodes by a constant value....
But because of the problem above, I don't see a good way to do even that.

Has anyone done any graph-algorithms-in-an-sql database type stuff? Any
ideas for a better way to store the graph in SQLite to make the updates
faster and/or the single query update possible?

Any suggestions would be greatly appreciated. Thanks in advance.

-- 
Have fun,                                       I tried to no avail
Michael "mag" Grigoriev           To keep my eyes from growing pale
[EMAIL PROTECTED]                         But my vision came too late
http://www.luminal.org               And my belief started to abate

pgp00000.pgp
Description: PGP signature

[sqlite] Looking for advice on improving performance of an algorithm

Reply via email to