On Thu, 28 Jan 2016 22:08:02 +0200
R Smith <rsmith at rsweb.co.za> wrote:

> I think you are misunderstanding the Pragma and the idea of automatic 
> indices. An automatic index might be created on a table that doesn't 
> have an adequately assigned primary key. It might also be created during 
> a query (mostly a SELECT or sub-SELECT) for which there is no useful 
> index created by the table designer, when the query planner figures it 
> will be quicker to make an index than to do table scans throughout the 
> query. This habit of creating indices during select queries can be 
> prevented by setting "PRAGMA automatic_index=0;", but this needs to 
> happen when you open the DB connection, or at a minimum before you try 
> any query - not after the cursor is created; by that time the index 
> might already be made. (This is why you are not seeing any speed 
> improvement.)
> 
> I only mention all the above so you understand what the automatic 
> indexing is about, but it has almost certainly nothing to do with your 
> query slowness, and even if you switch it off at a more opportune time, 
> I would be surprised if it changes the query speed.

Indeed, I was misunderstanding. Are these indexes visible somewhere? Are they 
related to the ones I see in sqlitebrowser?
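For now I just moved the pragma to right after the connect call, before any 
statement runs; a minimal sketch with the stdlib sqlite3 module (in-memory DB 
here only for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The pragma must run before any query; otherwise an automatic index
# may already have been built by the time it takes effect.
conn.execute("PRAGMA automatic_index=0;")
# Reading the pragma back confirms the setting.
print(conn.execute("PRAGMA automatic_index;").fetchone()[0])  # 0
```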

> 103k insertions to produce 15k rows... that is ~15% efficiency - the 
> opposite of good design. Perhaps we can help you find better SQL to 
> solve your problem. Let's see...

(red-face)

> 
> I do not know the shape of your data (it matters), but I'm guessing 
> "element" represents standard HTML tags with "attribute" and "value" 
> giving a basic expansion of the attributes list. It'd probably be safer 
> to use case-insensitive values and standard equality tests in Selects.
> 

The shape will change; it's far from final. The tables at that step depend on 
the next steps in the overall procedure, which is not entirely fixed for now.

> Could you try these in your Query loop perhaps:
> 
> 
> -- Table: renamed field "count" to "cnt" because "count" is an SQL
> -- reserved word (it would work, but this is a better habit).
> CREATE TABLE element_attribute_values (
>    element TEXT COLLATE NOCASE NOT NULL,
>    attribute TEXT COLLATE NOCASE NOT NULL,
>    value TEXT COLLATE NOCASE NOT NULL CHECK ((attribute <> '') OR (value = '')),
>    cnt INTEGER NOT NULL DEFAULT 0,
>    PRIMARY KEY (element, attribute, value)
> );
> 
> -- Insert Loop start:
> -- This will simply fail if the PK already exists, else start the line
> -- with a 0 count.
> INSERT OR IGNORE INTO element_attribute_values
>   VALUES (:element, :attribute, :value, 0);
> 
> -- This might be faster since it uses only one lookup loop, but it
> -- might also not be.
> -- If you share a list of example data to be inserted, we can find a
> -- faster way. Try it and let us know...
> WITH EAV(id, icnt) AS (
>    SELECT rowid, cnt + 1
>      FROM element_attribute_values
>     WHERE (element = :element) AND (attribute = :attribute)
>       AND (value = :value)
>     LIMIT 1
> )
> UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
>   WHERE rowid = (SELECT id FROM EAV)
> ;
> -- Insert Loop end.
> 

The DEFAULT could even be omitted, and perhaps the `LIMIT 1` too, since each 
triplet is unique.
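Concretely, here is how I drove that two-statement loop from Python's sqlite3 
module (a sketch; the in-memory DB and the sample triplet are made up, my real 
data differs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL
                  CHECK ((attribute <> '') OR (value = '')),
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )
""")

params = {"element": "a", "attribute": "href", "value": "#top"}
for _ in range(2):  # the same triplet seen twice
    # Create the row with cnt = 0 if it is not there yet.
    conn.execute(
        "INSERT OR IGNORE INTO element_attribute_values"
        " VALUES (:element, :attribute, :value, 0)", params)
    # Bump the counter through the suggested CTE form.
    conn.execute("""
        WITH EAV(id, icnt) AS (
            SELECT rowid, cnt + 1
              FROM element_attribute_values
             WHERE (element = :element) AND (attribute = :attribute)
               AND (value = :value)
             LIMIT 1
        )
        UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
         WHERE rowid = (SELECT id FROM EAV)
    """, params)

cnt = conn.execute("SELECT cnt FROM element_attribute_values").fetchone()[0]
print(cnt)  # 2
```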

I tried it, with the variant (no `LIMIT 1`), and it's about the same, even a 
bit slower (not by much, one to two seconds longer). Speed is not the only 
concern, though; perhaps I should not care so much about the DB file size (the 
other concern matters more).

It's close to what I had at the beginning, which gave similar timings (I only 
changed the column name to "cnt", per your recommendation):

    INSERT OR IGNORE INTO element_attribute_values
      VALUES (:element, :attribute, :value, 0);
    UPDATE element_attribute_values 
      SET cnt = cnt + 1
      WHERE (element = :element)
        AND (attribute = :attribute)
        AND (value = :value);
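Run from Python, that pair of statements behaves like a counter. A small 
self-contained sketch (in-memory DB and made-up sample triplets, just to show 
the mechanics):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL
                  CHECK ((attribute <> '') OR (value = '')),
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )
""")

def tally(conn, element, attribute, value):
    params = {"element": element, "attribute": attribute, "value": value}
    # Create the row with cnt = 0 if it is not there yet...
    conn.execute(
        "INSERT OR IGNORE INTO element_attribute_values"
        " VALUES (:element, :attribute, :value, 0)", params)
    # ...then bump the counter on the (now guaranteed) row.
    conn.execute(
        "UPDATE element_attribute_values SET cnt = cnt + 1"
        " WHERE (element = :element) AND (attribute = :attribute)"
        "   AND (value = :value)", params)

for triplet in [("div", "class", "x"), ("div", "class", "x"), ("p", "", "")]:
    tally(conn, *triplet)
conn.commit()

rows = conn.execute(
    "SELECT element, attribute, value, cnt"
    " FROM element_attribute_values ORDER BY element").fetchall()
print(rows)  # [('div', 'class', 'x', 2), ('p', '', '', 1)]
```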

> -- If you share a list of example data to be inserted, we can find a 
> faster way. Try it and let us know...

I don't mind; I can upload an archive somewhere. I guess you mean the table's 
contents?

I'm aware this use case may be a bit pathological, as I could use a Python 
dictionary. However, I decided not to, for three reasons:

 * I wanted to see what it's like to use an SQLite DB as an application data 
container (file or memory);
 * Using a persistent DB is better for an incremental process (it may stop and 
resume later);
 * Persistent data is welcome for human review (I'm thinking of defining views 
in sqlitebrowser to dissect the results).

For persistence, I first tried CSV files, but they proved inadequate. An SQL 
DB plus a DB browser looks better than CSV for this use case.

-- 
Yannick Duchêne
