Re: [sqlite] Uncached performance
Hello Donald, DG> 1) Might there be further performance gains by placing the blobs in a DG> separate table? DG> E.g. DG> CREATE TABLE myBlobs ( DG> idINTEGER PRIMARY KEY REFERENCES global (id), DG> value BLOB DG> ); DG> Then (if you haven't rebuilt a new db, and perhaps only once ever) run DG> VACUUM and ANALYZE. This is how I do it. Seems to give me the best performance. I sometimes put them in their own DB file and then attach this file too. I have DB's with 30+ gigs of blobs in them. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Uncached performance
Hi, Maksim, Others with more knowledge than I have given great advice regarding placing your blob as the *last* table column, and about looking at cache size and page size. I wondered about three things: 1) Might there be further performance gains by placing the blobs in a separate table? E.g. CREATE TABLE myBlobs ( idINTEGER PRIMARY KEY REFERENCES global (id), value BLOB ); Then (if you haven't rebuilt a new db, and perhaps only once ever) run VACUUM and ANALYZE. 2) Any chance your app might be faster with your blobs in files instead of in the db? Is the following info useful? https://www.sqlite.org/intern-v-extern-blob.html 3) Are you truly certain the solution against which you compared sqlite did not have the data in system cache? Don't know if this is helpful, Donald G. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Uncached performance
> SS> Is there a difference between a key being present in your database with > NULL value, and the key not being present ? > Surely there is. But, actually, those 1 queries are made only to keys > with a non-NULL value (usually less than 1 Kb, and maybe 20% are about 16 > kb). *Probably* not related to your performance concerns (since as you say, searching for the key doesn't look to be your issue), but for this case you may consider a partial index https://www.sqlite.org/partialindex.html to reduce the number of seeks a little bit by reducing the depth of the index tree. signature.asc Description: Message signed with OpenPGP using GPGMail ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Uncached performance
On Thu, 8 Jan 2015 15:04:28 +0500, ?? ??? wrote: > CREATE TABLE global ( > [key] VARCHAR (1024), > value BLOB, > level INTEGER NOT NULL, > original_name VARCHAR (1024), > id INTEGER PRIMARY KEY AUTOINCREMENT, > parent_id REFERENCES global (id) > ); The order of columns looks suboptimal. It's better to put smaller columns first, so all PK and key info is in the first page, and is not pushed to an overflow page when the values of text or blob columns have a biggish length(). Other remarks: * VARCHAR() translates to TEXT in SQLite. * parent_id missed a type definition. Typically, your table would look like: CREATE TABLE global ( id INTEGER PRIMARY KEY AUTOINCREMENT, parent_id INTEGER REFERENCES global (id), level INTEGER NOT NULL, [key] TEXT, original_name TEXT, value BLOB ); Other things to consider: PRAGMA page_size=something_larger_than_default; PRAGMA cache_size= .. ; -- Regards, Kees Nuyt ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Uncached performance
Thanks for your responces! SS> This suggests that you are filling up a cache, especially if your BLOBs are large. In terms of overall time, 1 queries in 2 seconds is 5000 queries a second. If you are using a rotating hard disk then this is not unexpected, given that your disk probably rotates at 5,400 rpm, that gives an average latency of 5.55ms /per read > On the first run, thought, 1 queries take >about 30 seconds! KM> Sounds normal as you are loading up the OS cache with data. In principle, of course, it's how caching works :) But, like I wrote, there is another engine and when using it (with non-SQLite database with the same contents), I am able to perform that 1 queries a dozen times faster, even on a 1st run. This means SQLite uses a non-optimal solution in this case; well that's quite OK, 'cause that another engine was optimized for this particular DB structure, while SQLite was not. So the question is whether things in SQLite could be sped up by, for example, tweaking some SQLite settings / fine-tuning DB structure / splitting a big database into several smaller ones / etc.? If there are no brilliant ideas, then maybe it's pure bad luck by which all the required keys are spread over the entire file in SQLite DB, while in my other DB they are condensed in a portion of the file so that reading just a portion of the file already fills the cache with all the necessary data. I asked my question in assumption (possibly wrong) that this is not the case, and that there is another reason. Of course, if those 1 queries were meant to read the entire DB, then of course any way to do that would require to ultimately scan the entire file - every possible DB engine would have to do that. But since we speak of just 10'000 keys (out of 130'000), then ideally only a portion of the file has to be read. So, if this is the reason of delays (and I am not sure it is, but isn't it probable?), - if SQLite reads more data than it would be ideally necessary, - is there a way to tweak something in such a way to reduce that excessing file accesses? KM> In order for caching to work on a USB attached device, it must not be mounted in "let the user pull it out at any time" mode (the default), or what Microsoft calls "optimize for random user yanking", but must be mounted as a connected device (called optimized for performance). Well OK if you say so, thought I believed that is only true for write-caching (which of course is dangerous if the device is removable), while read-caching does not depend on whether the drive is removable or not. Seemed logical to me, but might be wrong. SS> Is there a difference between a key being present in your database with NULL value, and the key not being present ? Surely there is. But, actually, those 1 queries are made only to keys with a non-NULL value (usually less than 1 Kb, and maybe 20% are about 16 kb). 2015-01-08 16:24 GMT+05:00 Keith Medcalf : > > >My database weights a little less than 2 Gbs and contains 130'000 keys. > >When I put it on HDD and try to make 1 queries (extracting values for > >1 different keys) with some additional processing of extracted > >values, it takes about 4 seconds on my PC on any run except the first, > >with maybe half of that time being the "additional processing" mentioned; > >even when I perform every query 3 times in a row (making it 3 > queries), > >this time does not change. On the first run, thought, 1 queries take > >about 30 seconds! > > Sounds normal as you are loading up the OS cache with data. Accessing > data cached in RAM is much faster than accessing data on spinning disk. > Have you set SQLite's page cache equal to a reasonable working set since > access to application cached pages is even faster than access to the OS > file cache -- and it avoids a jump into the kernel to do I/O which puts you > at the mercy of the dispatcher. Are you 100% I/O bound during the long > (initial) run, and 100% CPU bound during the quicker runs? > > >When I put the file on a USB flash drive, somehow I always get about 45 > >seconds total time, on either first run and subsequent runs. When the > >queries are tripled, the total time is tripled as well (even though the > >portions of the file to be read should already be cached when every > >enuqie query is repeated 2 more times). > > >This leads me to the conclusion that 1) The delays are produced by > >physical reading of the file, not by searching for a key or returning > >the value; > > >2) file on USB never gets cached (why?? due to some file-mode flags used > by > >SQLite engine? or it's just a MS Windows 8 issue?) > > This is an OS "feature". In order for caching to work on a USB attached > device, it must not be mounted in "let the user pull it out at any time" > mode (the default), or what Microsoft calls "optimize for random user > yanking", but must be mounted as a connected device (called optimized for > performance). You should also make sure you have i
Re: [sqlite] Uncached performance
>My database weights a little less than 2 Gbs and contains 130'000 keys. >When I put it on HDD and try to make 1 queries (extracting values for >1 different keys) with some additional processing of extracted >values, it takes about 4 seconds on my PC on any run except the first, >with maybe half of that time being the "additional processing" mentioned; >even when I perform every query 3 times in a row (making it 3 queries), >this time does not change. On the first run, thought, 1 queries take >about 30 seconds! Sounds normal as you are loading up the OS cache with data. Accessing data cached in RAM is much faster than accessing data on spinning disk. Have you set SQLite's page cache equal to a reasonable working set since access to application cached pages is even faster than access to the OS file cache -- and it avoids a jump into the kernel to do I/O which puts you at the mercy of the dispatcher. Are you 100% I/O bound during the long (initial) run, and 100% CPU bound during the quicker runs? >When I put the file on a USB flash drive, somehow I always get about 45 >seconds total time, on either first run and subsequent runs. When the >queries are tripled, the total time is tripled as well (even though the >portions of the file to be read should already be cached when every >enuqie query is repeated 2 more times). >This leads me to the conclusion that 1) The delays are produced by >physical reading of the file, not by searching for a key or returning >the value; >2) file on USB never gets cached (why?? due to some file-mode flags used by >SQLite engine? or it's just a MS Windows 8 issue?) This is an OS "feature". In order for caching to work on a USB attached device, it must not be mounted in "let the user pull it out at any time" mode (the default), or what Microsoft calls "optimize for random user yanking", but must be mounted as a connected device (called optimized for performance). You should also make sure you have it formatted as NTFS and not as a FAT variant of some type. Further, USB itself is very slow, both in transfer time and especially in turnaround time. Flash drives are also very slow, usually because they do not need to operate at a speed faster than can be supported by the slow USB interface. Is this a USB 1.0, 2.0, or 3.0 connection and device? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Uncached performance
On 8 Jan 2015, at 10:04am, Максим Гумеров wrote: > When I put it on HDD and try to make 1 queries (extracting values for > 1 different keys) with some additional processing of extracted values, > it takes about 4 seconds on my PC on any run except the first, with maybe > half of that time being the "additional processing" mentioned; even when I > perform every query 3 times in a row (making it 3 queries), this time > does not change. On the first run, thought, 1 queries take about 30 > seconds! This suggests that you are filling up a cache, especially if your BLOBs are large. In terms of overall time, 1 queries in 2 seconds is 5000 queries a second. If you are using a rotating hard disk then this is not unexpected, given that your disk probably rotates at 5,400 rpm, that gives an average latency of 5.55ms /per read/. > CREATE TABLE global ( > > [key] VARCHAR (1024), > value BLOB, > level INTEGER NOT NULL, > original_name VARCHAR (1024), > id INTEGER PRIMARY KEY AUTOINCREMENT, > parent_id REFERENCES global (id) > > ); SQLite will interpret "VARCHAR (1024)" as TEXT and will not truncate. I'm just warning you. > value is a BLOB (and for 50% keys is just NULL) Is there a difference between a key being present in your database with NULL value, and the key not being present ? > And there are 3 separate indices: by level, key, and parent_id. An index which is only on level would probably be pointless. I suspect you're more likely to want to have that index include the key, too. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Uncached performance
Hello! Imagine I am trying to introduce a local key-value SQLite database for caching some data retrieved from a remote server. Key is a character string, value is a BLOB (and for 50% keys is just NULL). I will specify the details later. My database weights a little less than 2 Gbs and contains 130'000 keys. When I put it on HDD and try to make 1 queries (extracting values for 1 different keys) with some additional processing of extracted values, it takes about 4 seconds on my PC on any run except the first, with maybe half of that time being the "additional processing" mentioned; even when I perform every query 3 times in a row (making it 3 queries), this time does not change. On the first run, thought, 1 queries take about 30 seconds! When I put the file on a USB flash drive, somehow I always get about 45 seconds total time, on either first run and subsequent runs. When the queries are tripled, the total time is tripled as well (even though the portions of the file to be read should already be cached when every enuqie query is repeated 2 more times). This leads me to the conclusion that 1) The delays are produced by physical reading of the file, not by searching for a key or returning the value; 2) file on USB never gets cached (why?? due to some file-mode flags used by SQLite engine? or it's just a MS Windows 8 issue?) Now, does anybody know whether it's a known issue for SQLite that the access is slow when the DB file is not yet cached by OS (I mean, it's slower that one could expect: we have been previously using a hand-made engine with which those 1 queries took just 4 seconds!)? Or maybe that's an issue specific to something, like, string-based indexes, or this large databases (approx. 2 Gb), or to something else? The details: CREATE TABLE global ( [key] VARCHAR (1024), value BLOB, level INTEGER NOT NULL, original_name VARCHAR (1024), id INTEGER PRIMARY KEY AUTOINCREMENT, parent_id REFERENCES global (id) ); And there are 3 separate indices: by level, key, and parent_id. So the actual PK is integer, but most queries are like key=something. I am opening the DB with sqlite3_open_v2(fdatabase, fdb, 1{ SQLITE_OPEN_READONLY} , nil); and querying it with sqlite3_prepare(fdb, 'select value from global where key = ifnull(?,)', -1, fReadValueQuery, ptail); + sqlite3_bind_text + sqlite3_step + sqlite3_column_blob Maksim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users