On Thu, 28 Jan 2016 22:08:02 +0200 R Smith <rsmith at rsweb.co.za> wrote:
> I think you are misunderstanding the Pragma and the idea of automatic
> indices. An automatic Index might be created on a table that doesn't
> have an adequately assigned primary key. It might also be created during
> a query (mostly SELECT or sub-SELECT) for which there is no useful Index
> created by the table designer and the Query planner figures it will be
> quicker to make an Index than to do table scans through the query. This
> habit of creating indices during select queries can be forced to not
> happen by setting the "PRAGMA automatic_index=0;", but this needs to
> happen when you open the DB connection, or at a minimum, before you try
> any query - not after the cursor is created, by that time the index
> might already be made. (This is why you are not seeing any speed
> improvement).
>
> I only mention all the above so you understand what the automatic
> indexing is about, but it has almost certainly nothing to do with your
> query slowness, and even if you switch it off at a more opportune time,
> I would be surprised if it changes the query speed.

Indeed, I was misunderstanding. Are these indexes visible? Is there any
relation with the ones I see from sqlitebrowser?

> 103k insertions to produce 15k rows... that is ~15% efficiency - the
> opposite of good design. Perhaps we can help you find better SQL to
> solve your problem. Let's see...

(red-face)

> I do not know the shape of your data (it matters), but I'm guessing
> "element" represents standard HTML tags with "attribute" and "value"
> giving basic expansion of the attributes list. It'd probably be safer to
> use non case-sensitive values and use standard equation tests in Selects.

The shape will change, it's far from final. The tables at that step
depend on the next steps in the overall procedure, which is not entirely
fixed for now.
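To partly answer my own question: if I read the documentation right, these
automatic indexes are transient, built per query and per connection, so
they never land in sqlite_master (and sqlitebrowser only lists what is in
sqlite_master). Here is a minimal sqlite3 sketch of what I do now, with
the pragma set right after connecting, before any query; the table and
data are made up for the example:

```python
import sqlite3

# In-memory DB just for the experiment; the real one is a file.
con = sqlite3.connect(":memory:")

# The pragma must run before any query, right after connecting;
# otherwise the planner may already have built its automatic index.
con.execute("PRAGMA automatic_index = 0;")

con.execute("CREATE TABLE t (a TEXT, b TEXT)")  # deliberately no index
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", str(i)) for i in range(100)],
)

# EXPLAIN QUERY PLAN shows whether SQLite would build an automatic index
# for this join on an unindexed column; with the pragma off, it should
# fall back to plain scans instead.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM t AS t1 JOIN t AS t2 ON t1.b = t2.b"
).fetchall()
for row in plan:
    print(row)

# And nothing shows up in sqlite_master either way: automatic indexes
# are never stored in the schema.
idx = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(idx)
```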
> Could you try these in your Query loop perhaps:
>
> -- Table: Renamed field count to cnt because "count" is an SQL reserved
> -- word (though it will work, just a better habit)
> CREATE TABLE element_attribute_values (
>     element   TEXT COLLATE NOCASE NOT NULL,
>     attribute TEXT COLLATE NOCASE NOT NULL,
>     value     TEXT COLLATE NOCASE NOT NULL
>               CHECK ((attribute <> '') OR (value = '')),
>     cnt       INTEGER NOT NULL DEFAULT 0,
>     PRIMARY KEY (element, attribute, value)
> );
>
> -- Insert Loop start:
> -- This will simply fail if the PK already exists, else start the line
> -- with 0 count.
> INSERT OR IGNORE INTO element_attribute_values
>     VALUES (:element, :attribute, :value, 0);
>
> -- This might be faster since it uses only one lookup loop, but it might
> -- also not be.
> -- If you share a list of example data to be inserted, we can find a
> -- faster way. Try it and let us know...
> WITH EAV(id, icnt) AS (
>     SELECT rowid, cnt + 1
>     FROM element_attribute_values
>     WHERE (element = :element) AND (attribute = :attribute)
>       AND (value = :value)
>     LIMIT 1
> )
> UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
> WHERE rowid = (SELECT id FROM EAV);
> -- Insert Loop end.

The DEFAULT may even be omitted, and maybe the `LIMIT 1` too, as each
triplet is unique. I tried it, with that variant (no `LIMIT 1`), and it's
about the same, even a bit slower (not by much: one to two seconds
longer). Speed is not the only concern anyway; perhaps I should not care
that much about the DB file size (the other concern that matters).

It's close to what I had at the beginning, which gave similar timings
(I just renamed the column following your recommendation about "cnt"):

INSERT OR IGNORE INTO element_attribute_values
    VALUES (:element, :attribute, :value, 0);

UPDATE element_attribute_values SET cnt = cnt + 1
WHERE (element = :element) AND (attribute = :attribute)
  AND (value = :value);

> -- If you share a list of example data to be inserted, we can find a
> -- faster way. Try it and let us know...
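For context, this is roughly what my loop looks like from Python (a
minimal, self-contained sketch; the sample triplets are made up, the real
ones come from parsing HTML). The whole loop runs inside a single
transaction, since committing per triplet would dominate the run time:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # the real DB is a file
con.execute("""
    CREATE TABLE element_attribute_values (
        element   TEXT COLLATE NOCASE NOT NULL,
        attribute TEXT COLLATE NOCASE NOT NULL,
        value     TEXT COLLATE NOCASE NOT NULL
                  CHECK ((attribute <> '') OR (value = '')),
        cnt       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (element, attribute, value)
    )
""")

# Made-up sample data: one duplicate triplet to exercise the counting.
triplets = [("a", "href", "#"), ("a", "href", "#"), ("img", "src", "x.png")]

# "with con:" wraps the loop in one transaction and commits at the end.
with con:
    for element, attribute, value in triplets:
        params = {"element": element, "attribute": attribute, "value": value}
        # Create the row with cnt = 0 if the triplet is new, else no-op.
        con.execute(
            "INSERT OR IGNORE INTO element_attribute_values"
            " VALUES (:element, :attribute, :value, 0)",
            params,
        )
        # Count this occurrence, whether the row is new or not.
        con.execute(
            "UPDATE element_attribute_values SET cnt = cnt + 1"
            " WHERE (element = :element) AND (attribute = :attribute)"
            "   AND (value = :value)",
            params,
        )

rows = con.execute(
    "SELECT element, attribute, value, cnt FROM element_attribute_values"
    " ORDER BY element"
).fetchall()
print(rows)
```

With the sample above, the duplicated triplet ends with cnt = 2 and the
other with cnt = 1.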
I don't mind, I can upload an archive somewhere. I guess you mean the
table's content?

I'm aware this use case may be a bit pathological, as I could use a
Python dictionary instead. However, I decided not to, for three reasons:

* I wanted to see what it's like to use an SQLite DB as an application
  data container (file or memory);
* using a persistent DB is better for an incremental process (it may
  stop and resume later);
* persistent data is welcome for human review (I'm thinking of defining
  views in sqlitebrowser to dissect the results).

For persistence, I first tried CSV files, but this proved to be
inadequate. An SQL DB plus a DB browser look better than CSV for this
use case.

--
Yannick Duchêne