As promised last week, I set out to test memory usage on SQLite DBs.

A quick overview of the methods used follows (kept short mostly to avoid the 
boring bits):

Create three tables: one main table storing an Int PK, 2 columns of random string data, and 2 Int keys that link to the two sub-tables. Each sub-table has an Int PK for lookup and 1 column of random string data.

CREATE TABLE "mt_main" ("id" INTEGER PRIMARY KEY,  "fA" TEXT,  "fB" TEXT, "iS1" INT, 
"iS2" INT);
CREATE TABLE "mt_sub1" ("id" INTEGER PRIMARY KEY,  "fSA" TEXT);
CREATE TABLE "mt_sub2" ("id" INTEGER PRIMARY KEY,  "fSB" TEXT);

Set the page size to 4096 and the cache to 1,600,000 pages. This allows a 
theoretical cache ceiling of something like 6.4GB.
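
(For reference, in SQL terms that setup amounts to something like the two 
pragmas below - noting that page_size only takes effect before any content 
exists, or upon a VACUUM:)

PRAGMA page_size = 4096;      -- bytes per page
PRAGMA cache_size = 1600000;  -- counted in pages: 1,600,000 x 4096 bytes, ~6.5GB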

Next, fill the two sub-tables each with 1 million entries of random strings, simply constructed via hex(randomblob(1536)) from random byte-streams, giving strings ~3KB in length (to stay under the 4KB page size). Then fill the main table with a recurring set of product codes in column fA (just a list of some 49,000 codes in varchar(32) format I had from another DB, to facilitate some query filtering) and again random ~3KB strings as above in column fB, iterating until the main table had 1 million entries too. (This means all 3 tables have integer id columns with matching spans, which allows some weird queries.)
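
(A minimal pure-SQL sketch of the sub-table fill, for anyone wanting to 
reproduce it - this uses a recursive CTE, which 3.8.5 supports; it is a 
sketch of the effect, not necessarily the exact loop I ran, and the 
main-table fill can't be reproduced verbatim since the product codes came 
from another DB:)

WITH RECURSIVE seq(n) AS (
  SELECT 1 UNION ALL SELECT n+1 FROM seq WHERE n < 1000000
)
INSERT INTO mt_sub1 (fSA)
SELECT hex(randomblob(1536)) FROM seq;
-- hex() doubles the length: 1536 random bytes -> a 3072-char (~3KB) string.
-- Repeat for mt_sub2; the INTEGER PRIMARY KEY auto-assigns ids 1 to 1,000,000.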

This all took some time to populate, and the resulting DB weighed in at just over 12GB - a bit more than I estimated (I was thinking ~3GB for the main table and ~2.8GB for the sub tables from armchair calculations; in hindsight, 1 million rows times ~3KB per random-string column already comes to ~3GB per table, ~9GB total, before page overhead, so 12GB is not that surprising). While populating the tables, memory usage climbed steadily, topping out at about 1.1GB over the baseline.

Next I set out to devise queries that would use/abuse the maximum amount of system resources, with an eye on determining the maximum amount of memory that can be put to use. As a first testbed (to find a common denominator) I used the 32-bit DLL for 3.8.5 downloadable from the SQLite site (i.e. no fancy compiler switches).

Some queries produced insanely big output files and took very long to run; others cranked up the memory usage somewhat; but at first I failed to devise any query that would push the memory profiler or the system resource monitor over about 850MB. Running "DISTINCT" queries against multiple cross joins seems to provide the best memory-abuse. An ORDER BY clause made a difference on non-distinct queries but did not really escalate the memory of distinct queries, small or large scale. It is interesting to note that "GROUP BY" made queries both faster and less memory-hungry - the latter seems intuitive, but the former not so much; maybe someone else knows the reason for this.
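
(To illustrate the shape of that comparison - a simplified, hypothetical 
pair; both return the same de-duplicated rows, and per the observation 
above one would expect the GROUP BY form to run faster and lighter:)

SELECT DISTINCT M.fA
  FROM mt_main AS M JOIN mt_sub1 AS S1 ON S1.id = M.id;

SELECT M.fA
  FROM mt_main AS M JOIN mt_sub1 AS S1 ON S1.id = M.id
 GROUP BY M.fA;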

Some experimentation later, the following query seemed to do the trick and 
pushed the memory usage right to its limits:

SELECT DISTINCT M.fA, S3.fA, substr(S1.fSA,1,32) AS SS, S2.id
  FROM mt_main AS M
    JOIN mt_sub1 AS S1 ON S1.id=M.id
    JOIN mt_sub2 AS S2 ON S2.id=M.id+1
    JOIN mt_main AS S3 ON S3.id=S2.id
ORDER BY SS, M.fA
LIMIT 1000;

(The LIMIT clause doesn't matter in this type of query - the entire query has to be evaluated anyway; the limit simply inhibits large amounts of output processing, which might otherwise influence the running time.)
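
(For the curious: prefixing the query with EXPLAIN QUERY PLAN should make 
the materialisation visible - I would expect it to report temp b-trees for 
the DISTINCT and ORDER BY steps, which is exactly the memory-hungry part:)

EXPLAIN QUERY PLAN
SELECT DISTINCT M.fA, S3.fA, substr(S1.fSA,1,32) AS SS, S2.id
  FROM mt_main AS M
    JOIN mt_sub1 AS S1 ON S1.id=M.id
    JOIN mt_sub2 AS S2 ON S2.id=M.id+1
    JOIN mt_main AS S3 ON S3.id=S2.id
ORDER BY SS, M.fA
LIMIT 1000;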

Baseline system-memory usage before the query is run: 2.77GB (which already 
included some usage from the test system).
Memory ramp: +/-15MB/sec, flat-lining after a couple of minutes at 4.64GB - that's just shy of 2.0GB worth of memory escalation, at which point the out-of-memory error popped up. Interestingly, when I dialed the cache pages down to 800K (as opposed to 1.6M), the query took slightly longer (indicated by the ramp), created visible steps in the memory graph, and finished execution without any memory error being reported - and the results were correct.

Also to note: the error came as a return value from either sqlite3_prepare_v2() or the very first sqlite3_step() call (I'd get more clarity, but at the time it was not important to me which of the two returned it, just that one did). I can only assume either SQLite itself was not able to index the cache, or an internal memory allocation failed, or some such. It is worth noting that the error was specifically the graceful memory error (presumably SQLITE_NOMEM) and not a random exception, segfault or "unknown".

Next up, I proceeded to use the 64-bit DLL in a 64-bit build of the testbed and redid the DB and data population (just in case the DB itself was affected by 32-bitness - unlikely, since both DBs were equal in size and read correctly by both systems afterwards). Execution speeds were very similar, as were the memory ramps. The query executed seamlessly: memory grew right up to the 6.4GB cache ceiling (above the baseline) without failing or reporting an error, and the correct result set was returned. Almost an anti-climax, but nothing more to report really - it just worked. I then increased the cache size to ~12GB, but the next run topped out at just over 7GB before spitting out results and releasing the memory, so I assume my query simply did not require more than that.

As a final test, of interest, I compared query execution times: this last query run took 2m 37s, vs. 3m 18s with the cache reset to the default 2,000 pages (~8MB). That's 157s vs. 198s - just over a 20% improvement from a cache some 800x larger. But when I thought about it and considered my attempts above at abusing the memory, I realised the number is arbitrary and specific to this query; any other combination of data/cache/query will probably produce wildly varying statistics - though no doubt a larger cache will always yield some measure of gain.

Special omission: I was unable to devise a query or cache requirement larger than the machine's physical memory, and thus was unable to test that case for function and error reporting - but I would assume that when you plan to use insanely large cache values, some check to assert available system resources should be included.
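
(The closest thing SQLite itself offers to such a check, as far as I know, 
is the advisory soft heap limit - settable via the pragma below or the 
sqlite3_soft_heap_limit64() API - which makes it recycle cache pages rather 
than keep allocating once the threshold is reached; the 8GB figure here is 
just an example value:)

PRAGMA soft_heap_limit = 8000000000;  -- advisory cap in bytes; SQLite tries to
                                      -- free cache memory before exceeding it.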

Test-system specs are probably not very relevant in this type of test, but if 
anyone wondered or needed to compare, related specs are:
AMD FX8350 @ 4.5GHz / 16GB DDR3 @ 2000MHz (4x4) / 3TB WD Black / Win7 Pro / 
SQLite 3.8.5 32/64-bit DLLs.
Test-bed setup: "SQLitespeed" 32/64-bit builds linked to the above DLLs and 
used in single-threaded mode only.


Conclusion:
The performance gain from a very large cache seems muted relative to the implementation effort, but if you do need to squeeze out every drop of performance - absolutely use the 64-bit versions.



Hope this is useful to someone, have a great day all!
Ryan

PS - As promised upon successful testing: the next iteration of SQLitespeed will be published in 64-bit guise after the standard test phases conclude (+/- 10 days).
