As promised last week, I set out to test memory usage on SQLite DBs. A quick
overview of the methods used follows (kept short, mostly to avoid the boring
bits):
Create three tables: one main table storing an Int PK, 2 columns of random string data, and 2 Int keys that link to the two sub-tables.
Each sub-table has an Int PK for lookup and 1 column of random string data.
CREATE TABLE "mt_main" ("id" INTEGER PRIMARY KEY, "fA" TEXT, "fB" TEXT, "iS1" INT,
"iS2" INT);
CREATE TABLE "mt_sub1" ("id" INTEGER PRIMARY KEY, "fSA" TEXT);
CREATE TABLE "mt_sub2" ("id" INTEGER PRIMARY KEY, "fSB" TEXT);
Set the page size to 4096 and the cache to 1,600,000 pages. This allows a
theoretical cache ceiling of roughly 6.4GB (1,600,000 pages x 4KB).
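In pragma form that looks like the following (note that page_size only takes effect on a freshly created or vacuumed database, so it must be issued before the tables are made):

PRAGMA page_size = 4096;      -- set before the first table is created
PRAGMA cache_size = 1600000;  -- measured in pages: 1,600,000 x 4KB = ~6.4GB

Also worth remembering: cache_size is per-connection and reverts to the default when the connection closes, so it has to be re-issued on every open.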
Next, fill the two sub-tables each with 1 million entries of random strings, simply constructed via hex(randomblob(1536)) - random
byte-streams rendered as ~3KB hex strings (to stay under the 4KB page size).
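Something along these lines does it (a sketch, not the exact script; it relies on recursive CTEs, which arrived in 3.8.3, so 3.8.5 has them):

-- Generate 1,000,000 rows per sub-table; hex(randomblob(1536)) yields a
-- 3,072-character string, safely under the 4KB page size.
WITH RECURSIVE seq(n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000
)
INSERT INTO mt_sub1 ("id", "fSA")
SELECT n, hex(randomblob(1536)) FROM seq;
-- ...and the same again for mt_sub2 ("id", "fSB").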
Next, fill the main table with a recurring set of product codes (just a list of some 49,000 codes in varchar(32) format I had from
another DB, to facilitate some query filtering) in column fA, and again random ~3KB strings as above for column fB, all iterated
until the main table had 1 million entries too. (This means all 3 tables have integer id columns with matching spans, which allows
some weird queries.)
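In the same vein, the main-table fill looks something like this sketch (the "codes" table name and the modulo cycling are shorthand for the imported code list, not the actual script):

-- Cycle through the ~49,000 imported codes while generating 1,000,000 rows;
-- iS1/iS2 are kept aligned with the sub-table ids here for simplicity.
WITH RECURSIVE seq(n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000
)
INSERT INTO mt_main ("id", "fA", "fB", "iS1", "iS2")
SELECT n,
       (SELECT code FROM codes WHERE rowid = 1 + (n % 49000)),
       hex(randomblob(1536)),
       n, n
FROM seq;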
This all took some time to populate and the resulting DB weighed in at just over 12GB, a bit more than I estimated (I was thinking
~3GB for the main table and ~2.8GB for the sub-tables from armchair calculations - though since each ~3KB row ends up occupying its
own 4KB page, roughly 4GB per table is the more realistic figure). While populating the tables, memory usage climbed steadily,
topping out at about 1.1GB over the baseline.
Now I set out to devise queries that would use (and abuse) the maximum amount of system resources, with an eye on determining the
maximum amount of memory that can be put to use. As a first testbed (to find a common denominator) I used the 32-bit DLL for 3.8.5
downloadable from the sqlite site (i.e. no fancy compiler switches).
Some queries produced insanely big output files and took very long to run, and others cranked up the memory usage somewhat, but at
first I failed to devise any query that would push the memory profiler or the system resource monitor over about 850MB. Running
"DISTINCT" queries against multiple cross joins seemed to provide the best memory abuse. An ORDER BY clause made a difference on
non-distinct queries but did not really escalate the memory of distinct queries, small or large scale. It is interesting to note
that "GROUP BY" made queries both faster and lighter on memory; the latter seems intuitive but the former not so much - maybe
someone else knows the reason for this.
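To make that contrast concrete, here is a hypothetical minimal pair (not the exact test queries) expressing the same de-duplication both ways:

-- DISTINCT form (the memory-hungry one in these tests):
SELECT DISTINCT M.fA, S1.fSA
FROM mt_main AS M JOIN mt_sub1 AS S1 ON S1.id = M.id;

-- GROUP BY form (observed to be faster and lighter on memory):
SELECT M.fA, S1.fSA
FROM mt_main AS M JOIN mt_sub1 AS S1 ON S1.id = M.id
GROUP BY M.fA, S1.fSA;

Grouping by every selected column de-duplicates exactly like DISTINCT, so the two return the same rows.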
Some experimentation later, the following query seemed to do the trick and
pushed the memory usage right to its limits:
SELECT DISTINCT M.fA, S3.fA, substr(S1.fSA,1,32) AS SS, S2.id
FROM mt_main AS M
JOIN mt_sub1 AS S1 ON S1.id=M.id
JOIN mt_sub2 AS S2 ON S2.id=M.id+1
JOIN mt_main AS S3 ON S3.id=S2.id
ORDER BY SS, M.fA
LIMIT 1000;
(The LIMIT clause doesn't matter in this type of query - with DISTINCT plus ORDER BY the entire result set has to be evaluated
anyway; the limit simply inhibits large amounts of output processing, which might otherwise influence running time.)
Baseline system memory usage before the query is run: 2.77GB (which already
included some usage from the test system).
Memory ramp: +/-15MB/sec, flat-lining after a couple of minutes at 4.64GB - that's just shy of 2.0GB worth of memory escalation, at
which point the out-of-memory error popped up. Interestingly, when I dialed the cache down to 800K pages (as opposed to 1.6M), the
query took slightly longer (indicated by the ramp), created visible steps in the memory graph, and finished execution without memory
errors being reported - and the results were correct.
Also to note - the error was the return value from either sqlite3_prepare_v2() or the very first sqlite3_step() call (I'd get more
clarity, but at the time it was not important to me which of the two returned it, just that one did). I can only assume that either
sqlite itself was not able to index the cache, or an internal memory allocation failed, or some such. It is worth noting that the
error was specifically the graceful memory error (presumably SQLITE_NOMEM) and not a random exception, segfault or "unknown".
Next up, I proceeded to use the 64-bit DLL in a 64-bit build of the testbed and redid the DB and data population (just in case the
DB itself was affected by 32-bitness, though that's unlikely, since both DBs were equal in size and read correctly by both systems
afterwards). Execution speeds were very similar, as were the memory ramps. The query executed seamlessly: memory grew right up to
the 6.4GB cache ceiling (above the baseline) without failing, without reporting an error, and it returned the correct result set.
Almost an anti-climax, but nothing more to report really - it just worked. I then increased the cache size to ~12GB, but the next
run topped out at just over 7GB before spitting out results and releasing the memory, so I assume my query simply did not require
more than that.
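For reference, a ~12GB cache in 4KB pages works out to something like the following (the exact page count here is illustrative):

PRAGMA cache_size = 3000000;  -- 3,000,000 x 4,096 bytes = ~12.3GB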
As a final test of interest, I compared query execution times for this last query run (2m 37s) vs. one where the cache was reset to
the default 2000 pages (~8MB), which took 3m 18s. That's 157s vs. 198s - just over 20% improvement from a cache roughly 800x larger.
But when I thought about it and considered my above attempts at abusing the memory, I realised the number is arbitrary and specific
to this query; any other combination of data/cache/query will probably produce wildly varying statistics - though no doubt a larger
cache will always yield some bit of gain.
Special omission: I was unable to devise a query or cache requirement larger than the machine's physical memory, and thus unable to
test that case for function and error reporting - but I would assume that when you plan to use insanely large cache values, some
check to assert available system resources should be included.
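One possible guard (a sketch, not something I tested here): SQLite's soft heap limit asks the library to keep its heap usage under a given ceiling - sqlite3_soft_heap_limit64() in the C API; the pragma form below may not be available in every build:

PRAGMA soft_heap_limit = 8000000000;  -- example ceiling of ~8GB, safely below this machine's 16GB RAM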
Test-system specs are probably not very relevant in this type of test, but if
anyone wondered or needed to compare, related specs are:
AMD FX8350@4.5GHz / 16GBDDR3@2000MHz(4x4) / 3TB-WDBlack / Win7Pro / SQLite3.8.5
32/64 DLLs.
Test-bed setup: "SQLitespeed" 32/64bit builds linked to above DLLs and used in
single-threaded mode only.
Conclusion:
The performance gain from a very large cache seems muted relative to the implementation effort, but if you do need to squeeze out
every drop of performance - absolutely use the 64-bit versions.
Hope this is useful to someone, have a great day all!
Ryan
PS - As promised upon successful testing: you'll see the next iteration of SQLitespeed published in 64-bit guise after the standard
test phases conclude (+/- 10 days).