Re: [sqlite] The problem with index
> Another question: Is it correct that virtual tables can be created using
> Perl but not Tcl? I don't have a current need (with the possible
> exception of FTS1/2, which are already accessible from Tcl), but the
> situation seemed curious. Wondering whether there was an undocumented
> capability (or one I just missed in the docs).

You're correct. Using the standard Tcl interface, there is no way to
create a new virtual table type. But you can instantiate and access
virtual tables for which the implementation is written in C (i.e. fts1/2).

Dan.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-
Re: [sqlite] Alternative index methods (hash key)
Andrew Finkenstadt wrote:
> On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> "Scott Hess" <[EMAIL PROTECTED]> wrote:
>>> On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
>>>> How difficult do you think it would be to support an alternative
>>>> method of indexing within SQLite specifically to support O(1)
>>>> retrieval of the rowid for a table, and then potentially O(1)
>>>> retrieval of the row data for a table, when in-order retrieval is
>>>> undesired?
>>
>> It's a major rewrite. SQLite assumes btree access in a number of
>> important places.
>
> Thanks for that info. I will save myself the heartache then.
>
>>>> My database design is highly specialized to never, ever retrieve
>>>> data except by whole-key equality, and would benefit greatly from
>>>> this optimization. I'm willing to code it up myself, although I'm
>>>> not quite set up to do SQL syntax parser changes, preferring to use
>>>> the amalgamation.
>>>
>>> Would you be able to live with converting your primary key to a
>>> 64-bit hash value? In development of fts2, I found that direct
>>> rowid -> row access is pretty fast. It's still technically O(logN),
>>> but the log's base is absurdly large, so it comes reasonably close
>>> to O(1).
>>
>> I concur with Scott's assessment. The typical fanout in the table
>> btrees used by SQLite is around 100 (assuming a 1K page) so you can
>> get to one of 10 million pages in only 3 page reads. And the first
>> page is almost guaranteed to be in cache, so really only 2 page reads.
>
> My default page size is 32k, with the number of row entries at about a
> hundred thousand... assuming base-100 for 1k, that would be base 3000
> or so for 32k, which would mean 1 I/O if uncached, followed by a search
> for the actual row containing the data. (I use an indirect table in
> order to keep my small data separate from my blobs; perhaps this was
> unnecessary.)
>
>   create table index_data (
>     id integer not null primary key autoincrement,
>     rod blob not null unique /* alternate key of 20-bytes fixed */,
>     ...
>   );
>
>   create table file_data (
>     id integer not null primary key,
>     compression_method integer not null,
>     file_size integer not null,
>     file_bytes blob not null);
>
> The retrieval of looking up index_data.id via rod equality takes longer
> than I would like. file_data retrieval has its own issues which I am
> still tuning.
>
> --a

I ran some trials on b-tree versus hashing on another project. It
convinced me that hashed access was only beneficial in a very low memory
environment, such as a database designed 40 years ago. With the upper
levels of the B-tree always in cache and no clashes, the B-tree was
clearly the right approach. What did give improved speed was larger node
sizes and a binary search on keys within the node.
Re: [sqlite] Alternative index methods (hash key)
On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> "Scott Hess" <[EMAIL PROTECTED]> wrote:
>> On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
>>> How difficult do you think it would be to support an alternative
>>> method of indexing within SQLite specifically to support O(1)
>>> retrieval of the rowid for a table, and then potentially O(1)
>>> retrieval of the row data for a table, when in-order retrieval is
>>> undesired?
>
> It's a major rewrite. SQLite assumes btree access in a number of
> important places.

Thanks for that info. I will save myself the heartache then.

>>> My database design is highly specialized to never, ever retrieve data
>>> except by whole-key equality, and would benefit greatly from this
>>> optimization. I'm willing to code it up myself, although I'm not
>>> quite set up to do SQL syntax parser changes, preferring to use the
>>> amalgamation.
>>
>> Would you be able to live with converting your primary key to a 64-bit
>> hash value? In development of fts2, I found that direct rowid -> row
>> access is pretty fast. It's still technically O(logN), but the log's
>> base is absurdly large, so it comes reasonably close to O(1).
>
> I concur with Scott's assessment. The typical fanout in the table
> btrees used by SQLite is around 100 (assuming a 1K page) so you can get
> to one of 10 million pages in only 3 page reads. And the first page is
> almost guaranteed to be in cache, so really only 2 page reads.

My default page size is 32k, with the number of row entries at about a
hundred thousand... assuming base-100 for 1k, that would be base 3000 or
so for 32k, which would mean 1 I/O if uncached, followed by a search for
the actual row containing the data. (I use an indirect table in order to
keep my small data separate from my blobs; perhaps this was unnecessary.)

  create table index_data (
    id integer not null primary key autoincrement,
    rod blob not null unique /* alternate key of 20-bytes fixed */,
    ...
  );

  create table file_data (
    id integer not null primary key,
    compression_method integer not null,
    file_size integer not null,
    file_bytes blob not null);

The retrieval of looking up index_data.id via rod equality takes longer
than I would like. file_data retrieval has its own issues which I am
still tuning.

--a
Re: [sqlite] Alternative index methods (hash key)
"Scott Hess" <[EMAIL PROTECTED]> wrote:
> On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
>> How difficult do you think it would be to support an alternative
>> method of indexing within SQLite specifically to support O(1)
>> retrieval of the rowid for a table, and then potentially O(1)
>> retrieval of the row data for a table, when in-order retrieval is
>> undesired?

It's a major rewrite. SQLite assumes btree access in a number of
important places.

>> My database design is highly specialized to never, ever retrieve data
>> except by whole-key equality, and would benefit greatly from this
>> optimization. I'm willing to code it up myself, although I'm not quite
>> set up to do SQL syntax parser changes, preferring to use the
>> amalgamation.
>
> Would you be able to live with converting your primary key to a 64-bit
> hash value? In development of fts2, I found that direct rowid -> row
> access is pretty fast. It's still technically O(logN), but the log's
> base is absurdly large, so it comes reasonably close to O(1).

I concur with Scott's assessment. The typical fanout in the table btrees
used by SQLite is around 100 (assuming a 1K page) so you can get to one
of 10 million pages in only 3 page reads. And the first page is almost
guaranteed to be in cache, so really only 2 page reads.

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Data structure for versioned data
We perform some versioning by holding column material in XML and using
RCS to maintain reverse deltas and versions.

Samuel R. Neff wrote:
> Not specific to SQLite, but we're working on an app that needs to keep
> versioned data (i.e., the current values plus all previous values). The
> versioning is integral to the app so it's more than just an audit trail
> or history.
>
> Can anyone share experiences with the database structure for this type
> of requirement or point me to helpful resources?
>
> Thanks,
>
> Sam
>
> ---
> We're Hiring! Seeking a passionate developer to join our team building
> products. Position is in the Washington D.C. metro area. If interested
> contact [EMAIL PROTECTED]
Re: [sqlite] Cache invalidation after insert statements.
Joe Wilson wrote:
> A non-volatile RAM drive is the way to go if you got the bucks.

"16 processor machine, ~40Gb RAM, EMC storage" suggests he does. ;)

I worked on a project where the end client had Sun kit of this spec, and
they claimed the systems cost 7 figures GBP back in 2005.

Martin
Re: [sqlite] Alternative index methods (hash key)
On 6/20/07, Scott Hess <[EMAIL PROTECTED]> wrote:
> On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
>> How difficult do you think it would be to support an alternative
>> method of indexing within SQLite specifically to support O(1)
>> retrieval of the rowid for a table, and then potentially O(1)
>> retrieval of the row data for a table, when in-order retrieval is
>> undesired?
>>
>> My database design is highly specialized to never, ever retrieve data
>> except by whole-key equality, and would benefit greatly from this
>> optimization. I'm willing to code it up myself, although I'm not quite
>> set up to do SQL syntax parser changes, preferring to use the
>> amalgamation.
>
> Would you be able to live with converting your primary key to a 64-bit
> hash value?

Yes, I could... or even smaller, if I can do collision exclusion by a
select against the "real primary key".

> In development of fts2, I found that direct rowid -> row access is
> pretty fast. It's still technically O(logN), but the log's base is
> absurdly large, so it comes reasonably close to O(1). Then you could do
> something like:
>
>   SELECT x FROM t WHERE rowid = hashfn(?0) AND real_primary = ?0
>
> hashfn() would take a string and return a 64-bit rowid. The test
> against real_primary is to guard against hash collisions, which you
> may-or-may-not care to bother with (that said, it should be essentially
> free, at least in disk I/O terms).

It's funny you should say that, as my current "best" (fastest) 64-bit
hash of a string involves concatenating Ye Olde Standard CRC-32 and Paul
Hsieh's SuperFastHash function, and I've been considering using that
internally as an in-memory surrogate of what would otherwise be a long
path/filename.

> BTW, this may be an example of a case where *dbm or bdb would be
> justified. It would also be interesting for someone to build a virtual
> table interface implementing this idiom.

I gave those options thought as well, but ended up liking the vdbe/SQL
interface for development convenience, and the fact that "out of the box"
all of the cross-thread, cross-process consistency issues were dealt
with. I'm a 17-year veteran of Oracle databases, and (perl)dbm databases
before that, so I really liked the lesser "mental surface area" needed to
program real programs with these options.

--a
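[Editor's note: Andrew's concatenate-two-32-bit-hashes trick can be
sketched as below. SuperFastHash is not in the Python standard library,
so a 32-bit FNV-1a is used as a hypothetical stand-in for the second
half; only `zlib.crc32` matches what the message actually names.]

```python
import zlib

def fnv1a_32(data: bytes) -> int:
    # Stand-in for Paul Hsieh's SuperFastHash: 32-bit FNV-1a.
    h = 0x811C9DC5
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

def hash64(key: str) -> int:
    # CRC-32 in the high 32 bits, second hash in the low 32 bits,
    # forming a 64-bit surrogate for a long path/filename.
    data = key.encode("utf-8")
    return (zlib.crc32(data) << 32) | fnv1a_32(data)

print(hash64("path/to/file-a") != hash64("path/to/file-b"))  # True
```

A collision now requires both 32-bit hashes to collide on the same pair
of keys, which is far less likely than either colliding alone.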
Re: [sqlite] Alternative index methods (hash key)
On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
> How difficult do you think it would be to support an alternative method
> of indexing within SQLite specifically to support O(1) retrieval of the
> rowid for a table, and then potentially O(1) retrieval of the row data
> for a table, when in-order retrieval is undesired?
>
> My database design is highly specialized to never, ever retrieve data
> except by whole-key equality, and would benefit greatly from this
> optimization. I'm willing to code it up myself, although I'm not quite
> set up to do SQL syntax parser changes, preferring to use the
> amalgamation.

Would you be able to live with converting your primary key to a 64-bit
hash value? In development of fts2, I found that direct rowid -> row
access is pretty fast. It's still technically O(logN), but the log's
base is absurdly large, so it comes reasonably close to O(1). Then you
could do something like:

  SELECT x FROM t WHERE rowid = hashfn(?0) AND real_primary = ?0

hashfn() would take a string and return a 64-bit rowid. The test against
real_primary is to guard against hash collisions, which you
may-or-may-not care to bother with (that said, it should be essentially
free, at least in disk I/O terms).

Since SQLite does varint-encoding, you'd get somewhat better performance
using a 32-bit or smaller key (so the interior nodes pack more densely).
Of course, that raises your chances of collision.

Not certain how to deal with collision. You could just have a second
table with real_primary as the actual primary key, and put a flag in the
main table. 99.999% of the time, the query will draw from the main
table, and the few remaining hits should be fast because the secondary
table should be very small.

BTW, this may be an example of a case where *dbm or bdb would be
justified. It would also be interesting for someone to build a virtual
table interface implementing this idiom.

-scott
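[Editor's note: a minimal sketch of the lookup idiom Scott describes,
using Python's sqlite3 module. `hashfn` is a hypothetical stand-in (8
bytes of SHA-1); any well-distributed 64-bit string hash would do. The
rowid is set to the hash of the key at insert time, and the real key is
re-checked on lookup to reject collisions.]

```python
import hashlib
import sqlite3

def hashfn(key: str) -> int:
    # Hypothetical stand-in: 8 bytes of SHA-1 as a signed 64-bit rowid.
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h - 2**64 if h >= 2**63 else h  # SQLite rowids are signed

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (real_primary TEXT NOT NULL, x TEXT)")

def insert(key, value):
    # Store the hash as the rowid so lookups are a direct btree seek.
    db.execute("INSERT INTO t (rowid, real_primary, x) VALUES (?, ?, ?)",
               (hashfn(key), key, value))

def lookup(key):
    # Direct rowid seek; re-check the real key to guard against collisions.
    row = db.execute("SELECT x FROM t WHERE rowid = ? AND real_primary = ?",
                     (hashfn(key), key)).fetchone()
    return row[0] if row else None

insert("alpha", "first")
print(lookup("alpha"))  # first
print(lookup("beta"))   # None
```

The real_primary check adds no disk I/O, since the row is already in hand
after the rowid seek.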
[sqlite] Alternative index methods (hash key)
How difficult do you think it would be to support an alternative method
of indexing within SQLite specifically to support O(1) retrieval of the
rowid for a table, and then potentially O(1) retrieval of the row data
for a table, when in-order retrieval is undesired?

My database design is highly specialized to never, ever retrieve data
except by whole-key equality, and would benefit greatly from this
optimization. I'm willing to code it up myself, although I'm not quite
set up to do SQL syntax parser changes, preferring to use the
amalgamation.

--andy
[sqlite] Data structure for versioned data
Not specific to SQLite, but we're working on an app that needs to keep
versioned data (i.e., the current values plus all previous values). The
versioning is integral to the app so it's more than just an audit trail
or history.

Can anyone share experiences with the database structure for this type
of requirement or point me to helpful resources?

Thanks,

Sam

---
We're Hiring! Seeking a passionate developer to join our team building
products. Position is in the Washington D.C. metro area. If interested
contact [EMAIL PROTECTED]
Re: [sqlite] Cache invalidation after insert statements.
I assumed he meant a volatile system RAM "drive", as opposed to a
non-volatile external RAM drive, by his wording. But no point speculating
what he meant. A non-volatile RAM drive is the way to go if you got the
bucks.

--- Ken <[EMAIL PROTECTED]> wrote:
> I think the performance of the ram drive (I'm guessing SCSI based) will
> not be as good as physical system RAM. But certainly better than the
> I/O speed of disk.
>
> pompomJuice <[EMAIL PROTECTED]> wrote:
>> Now I need to get the system administrators to make me that ram drive.
Re: [sqlite] Cache invalidation after insert statements.
> mmm, I was thinking that I decrease the cache_size to like 20 when
> using the ram drive, since I don't need caching anymore then.
>
> I have inserted more timing code and I am now convinced I have an IO
> problem. When I coax the OS to fully cache my (smaller 40 rows) db file
> (which takes like 2-3 runs) sqlite can do lookups at about 5 per
> second. With the file uncached this value falls as low as 500.
>
> Now I need to get the system administrators to make me that ram drive.

The obvious problem with a RAM drive is that the data is not persisted,
so if you lose power... you get the idea.

I may be drowned as a witch for suggesting this, but since you have
ample RAM and CPUs and you want the file to be in OS cache all day for
quick ad-hoc lookups, just put the following in cron to be run every few
minutes:

  cat your.db > /dev/null

If the file is already in OS cache, this is a very quick operation.
There may be OS-specific ways to keep the image of the file in RAM
without the cron/cat hack. Some modern smart OS pagers may not keep the
file cached in memory if it suspects the file will not be used again, so
see what cat alternative works on your OS.

Whether you're using the RAM drive approach or the
keep-the-db-in-OS-cache approach, do keep the cache_size low for all
your sqlite processes, as you mention. Having large caches for multiple
processes is a waste of system RAM, due to duplication.
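[Editor's note: the cron/cat hack above, as a runnable sketch. The
database path is hypothetical; the `touch` is only there so the sketch
runs anywhere. On POSIX systems, `posix_fadvise(POSIX_FADV_WILLNEED)` is
one of the "OS-specific ways" alluded to.]

```shell
#!/bin/sh
# Warm the OS page cache by reading the database file end to end.
DB="${DB:-your.db}"
touch "$DB"                 # stand-in so the sketch runs anywhere
cat "$DB" > /dev/null && echo "cache warmed"

# Example crontab entry -- re-read every five minutes:
#   */5 * * * * cat /path/to/your.db > /dev/null
```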
Re: [sqlite] Cache invalidation after insert statements.
I understand where you are heading, by putting the entire db on a ram
drive. I think the performance of the ram drive (I'm guessing SCSI
based) will not be as good as physical system RAM, but certainly better
than the I/O speed of disk. Let us know how it turns out.

pompomJuice <[EMAIL PROTECTED]> wrote:
> mmm, I was thinking that I decrease the cache_size to like 20 when
> using the ram drive, since I don't need caching anymore then.
>
> I have inserted more timing code and I am now convinced I have an IO
> problem. When I coax the OS to fully cache my (smaller 40 rows) db file
> (which takes like 2-3 runs) sqlite can do lookups at about 5 per
> second. With the file uncached this value falls as low as 500.
>
> Now I need to get the system administrators to make me that ram drive.
>
> ken-33 wrote:
>> The Ram drive is unlikely to work. It will still have the same cache
>> invalidation. You need to get things logically working first. Ram
>> drives are great to help improve performance where seek and rotational
>> access requirements dictate.
>>
>> pompomJuice wrote:
>>> AArrgh. That is the one thing that I won't be able to do. It would
>>> require a complete system redesign. I can adapt my program easily,
>>> but to get it to work in the greater scheme of things would be a
>>> nightmare.
>>>
>>> My current efforts are being focussed into making a ram drive and
>>> putting the file in there. I hope it works.
>>>
>>> ken-33 wrote:
>>>> Can you consolidate your multiple binaries to a single binary? Then
>>>> use threading and sqlite's shared caching to perform the lookups and
>>>> updates. That way the cache wouldn't get invalidated???
>>>>
>>>> Someone else here correct me if this is a bad idea!!!
>>>>
>>>> pompomJuice wrote:
>>>>> I suspected something like this, as it makes sense.
>>>>>
>>>>> I have multiple binaries/different connections (and I cannot make
>>>>> them share a connection) using this one lookup table, and depending
>>>>> on which connection checks first, it will update the table.
>>>>>
>>>>> My question is then: if any one connection makes any change to the
>>>>> database (not necessarily to the huge lookup table), will all the
>>>>> other connections invalidate their entire cache? Or is it per
>>>>> table/btree that the cache is dropped?
>>>>>
>>>>> Thanks for that response. Already I can move ahead now with better
>>>>> knowledge of how the caching works.
>>>>>
>>>>> Regards.
>>>>>
>>>>> Dan Kennedy-4 wrote:
>>>>>> On Tue, 2007-06-19 at 01:06 -0700, pompomJuice wrote:
>>>>>>> Hello there.
>>>>>>>
>>>>>>> I need some insight into how SQLite's caching works. I have a
>>>>>>> database that is quite large (5Gb) sitting on a production server
>>>>>>> whose IO is severely taxed. This causes my SQLite db to perform
>>>>>>> very poorly. Most of the time my application just sits there and
>>>>>>> uses about 10% of a CPU, where it would use 100% on test systems
>>>>>>> with idle IO. Effectively what the application does is constantly
>>>>>>> do lookups as fast as it can.
>>>>>>>
>>>>>>> To counteract this I increased the page size to 8192 (Unix server
>>>>>>> with advfs having 8K block sizes) and increased the MAX_PAGES
>>>>>>> value in sqliteInt.h to 512000. This worked. My application
>>>>>>> starts at low memory usage and gradually gains more memory. As it
>>>>>>> gains more memory it uses more CPU, and reaches a point where it
>>>>>>> finally uses 100% CPU and 5Gb of ram.
>>>>>>>
>>>>>>> Every now and then the lookup table is updated. As soon as the
>>>>>>> application does this, however, the performance goes back to a
>>>>>>> crawl and slowly builds up again as described in the previous
>>>>>>> paragraph. The memory usage stays at 5Gb. All that I can think of
>>>>>>> is that the update invalidates the cache.
>>>>>>
>>>>>> Probably right.
>>>>>>
>>>>>>> The update is not very big, say 20 rows in a table that has about
>>>>>>> 45 million rows. What exactly is happening here?
>>>>>>
>>>>>> Are you using 3.3.17? And is it an external process (or at least a
>>>>>> different connection) doing the update?
>>>>>>
>>>>>> If so, the update is modifying the pager change-counter,
>>>>>> invalidating the pager cache held by the lookup application. The
>>>>>> lookup app has to start loading pages from the disk again, instead
>>>>>> of just reading its cache.
>>>>>>
>>>>>> The only way around this performance hit is to do the UPDATE
>>>>>> through the lookup app, using the same database connection.
>>>>>>
>>>>>> Dan.
>
> --
> View this message in context:
> http://www.nabble.com/Cache-invalidation-after-insert-statements.-tf3944908.html#a11192121
> Sent from the SQLite mailing list archive at Nabble.com.
Re: [sqlite] The problem with index
[EMAIL PROTECTED] wrote:
> This gives a different answer because the EXCEPT operator makes the
> rows of the result set unique. So the result set will be:
>
>   1 2 111
>
> whereas formerly it was:
>
>   1 1 2 2 111 111

Thank you for the clarification. That is not the behavior I would have
expected. In my data everything is distinct anyway, but for others the
difference would be important.

BTW, http://sqlite.org/lang_select.html says "EXCEPT takes the result of
left SELECT after removing the results of the right SELECT." Some
further explanation there might help keep others from making the same
mistake I did.

Another question: Is it correct that virtual tables can be created using
Perl but not Tcl? I don't have a current need (with the possible
exception of FTS1/2, which are already accessible from Tcl), but the
situation seemed curious. Wondering whether there was an undocumented
capability (or one I just missed in the docs).

Thanks again for a great product with incredible support.

Gerry
Re: [sqlite] Cache invalidation after insert statements.
mmm, I was thinking that I decrease the cache_size to like 20 when using
the ram drive, since I don't need caching anymore then.

I have inserted more timing code and I am now convinced I have an IO
problem. When I coax the OS to fully cache my (smaller 40 rows) db file
(which takes like 2-3 runs) sqlite can do lookups at about 5 per second.
With the file uncached this value falls as low as 500.

Now I need to get the system administrators to make me that ram drive.

ken-33 wrote:
> The Ram drive is unlikely to work. It will still have the same cache
> invalidation. You need to get things logically working first. Ram
> drives are great to help improve performance where seek and rotational
> access requirements dictate.
>
> pompomJuice <[EMAIL PROTECTED]> wrote:
>> AArrgh. That is the one thing that I won't be able to do. It would
>> require a complete system redesign. I can adapt my program easily, but
>> to get it to work in the greater scheme of things would be a
>> nightmare.
>>
>> My current efforts are being focussed into making a ram drive and
>> putting the file in there. I hope it works.
>>
>> ken-33 wrote:
>>> Can you consolidate your multiple binaries to a single binary? Then
>>> use threading and sqlite's shared caching to perform the lookups and
>>> updates. That way the cache wouldn't get invalidated???
>>>
>>> Someone else here correct me if this is a bad idea!!!
>>>
>>> pompomJuice wrote:
>>>> I suspected something like this, as it makes sense.
>>>>
>>>> I have multiple binaries/different connections (and I cannot make
>>>> them share a connection) using this one lookup table, and depending
>>>> on which connection checks first, it will update the table.
>>>>
>>>> My question is then: if any one connection makes any change to the
>>>> database (not necessarily to the huge lookup table), will all the
>>>> other connections invalidate their entire cache? Or is it per
>>>> table/btree that the cache is dropped?
>>>>
>>>> Thanks for that response. Already I can move ahead now with better
>>>> knowledge of how the caching works.
>>>>
>>>> Regards.
>>>>
>>>> Dan Kennedy-4 wrote:
>>>>> On Tue, 2007-06-19 at 01:06 -0700, pompomJuice wrote:
>>>>>> Hello there.
>>>>>>
>>>>>> I need some insight into how SQLite's caching works. I have a
>>>>>> database that is quite large (5Gb) sitting on a production server
>>>>>> whose IO is severely taxed. This causes my SQLite db to perform
>>>>>> very poorly. Most of the time my application just sits there and
>>>>>> uses about 10% of a CPU, where it would use 100% on test systems
>>>>>> with idle IO. Effectively what the application does is constantly
>>>>>> do lookups as fast as it can.
>>>>>>
>>>>>> To counteract this I increased the page size to 8192 (Unix server
>>>>>> with advfs having 8K block sizes) and increased the MAX_PAGES
>>>>>> value in sqliteInt.h to 512000. This worked. My application starts
>>>>>> at low memory usage and gradually gains more memory. As it gains
>>>>>> more memory it uses more CPU, and reaches a point where it finally
>>>>>> uses 100% CPU and 5Gb of ram.
>>>>>>
>>>>>> Every now and then the lookup table is updated. As soon as the
>>>>>> application does this, however, the performance goes back to a
>>>>>> crawl and slowly builds up again as described in the previous
>>>>>> paragraph. The memory usage stays at 5Gb. All that I can think of
>>>>>> is that the update invalidates the cache.
>>>>>
>>>>> Probably right.
>>>>>
>>>>>> The update is not very big, say 20 rows in a table that has about
>>>>>> 45 million rows. What exactly is happening here?
>>>>>
>>>>> Are you using 3.3.17? And is it an external process (or at least a
>>>>> different connection) doing the update?
>>>>>
>>>>> If so, the update is modifying the pager change-counter,
>>>>> invalidating the pager cache held by the lookup application. The
>>>>> lookup app has to start loading pages from the disk again, instead
>>>>> of just reading its cache.
>>>>>
>>>>> The only way around this performance hit is to do the UPDATE
>>>>> through the lookup app, using the same database connection.
>>>>>
>>>>> Dan.
Re: Re[2]: [sqlite] The problem with index
Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- "Sergey M. Brytsko" wrote:
>> But what about the following values:
>>
>> 1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111
>
> I guess SQLite's query optimizer could take the cardinality of the
> column into account via its ANALYZE statistics for inequality
> comparisons. It's just the small matter of someone writing the code to
> do it.
>
>   ** The result is a single row of the sqlite_stat1 table. The first
>   ** two columns are the names of the table and index. The third column
>   ** is a string composed of a list of integer statistics about the
>   ** index. The first integer in the list is the total number of
>   ** entries in the index. There is one additional integer in the list
>   ** for each column of the table. This additional integer is a guess
>   ** of how many rows of the table the index will select. If D is the
>   ** count of distinct values and K is the total number of rows, then
>   ** the integer is computed as:
>   **
>   **        I = (K+D-1)/D
>   **
>   ** If K==0 then no entry is made into the sqlite_stat1 table.
>   ** If K>0 then it is always the case that D>0, so division by zero
>   ** is never possible.

Joe, you can go one further by adding an additional table (or another
set of rows) to store column-level stats based upon an individual
column's values. (Oracle calls these histograms.) In that way the where
clause values can be used to gain even better index selection.

Table A has: 1 1 100 100 100 100 100 100 100 100 100 101 102

  select * from a where C=100;  -- ignore index, full scan
  select * from a where C=101;  -- use index
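[Editor's note: a quick sketch contrasting the two estimates. The first
function is the I = (K+D-1)/D formula quoted above; the second is a
hypothetical Oracle-style histogram lookup, not how SQLite's ANALYZE
actually works.]

```python
def stat1_estimate(values):
    # SQLite's per-index average: I = (K+D-1)/D, expected rows per key.
    k = len(values)             # K: total number of rows
    d = len(set(values))        # D: count of distinct values
    return (k + d - 1) // d

def histogram_estimate(values, key):
    # Hypothetical histogram: exact frequency of one value.
    return sum(1 for v in values if v == key)

a = [1, 1, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102]

print(stat1_estimate(a))           # 4 -- the average hides the skew
print(histogram_estimate(a, 100))  # 9 -- full scan looks better
print(histogram_estimate(a, 101))  # 1 -- the index is a clear win
```

The per-column average says "about 4 rows per key" for every value,
while the histogram distinguishes C=100 (most of the table) from C=101
(a single row), which is exactly the distinction Ken is after.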
Re: [sqlite] Cache invalidation after insert statements.
The Ram drive is unlikely to work. It will still have the same cache
invalidation. You need to get things logically working first. Ram drives
are great to help improve performance where seek and rotational access
requirements dictate.

pompomJuice <[EMAIL PROTECTED]> wrote:
> AArrgh. That is the one thing that I won't be able to do. It would
> require a complete system redesign. I can adapt my program easily, but
> to get it to work in the greater scheme of things would be a nightmare.
>
> My current efforts are being focussed into making a ram drive and
> putting the file in there. I hope it works.
>
> ken-33 wrote:
>> Can you consolidate your multiple binaries to a single binary? Then
>> use threading and sqlite's shared caching to perform the lookups and
>> updates. That way the cache wouldn't get invalidated???
>>
>> Someone else here correct me if this is a bad idea!!!
>>
>> pompomJuice wrote:
>>> I suspected something like this, as it makes sense.
>>>
>>> I have multiple binaries/different connections (and I cannot make
>>> them share a connection) using this one lookup table, and depending
>>> on which connection checks first, it will update the table.
>>>
>>> My question is then: if any one connection makes any change to the
>>> database (not necessarily to the huge lookup table), will all the
>>> other connections invalidate their entire cache? Or is it per
>>> table/btree that the cache is dropped?
>>>
>>> Thanks for that response. Already I can move ahead now with better
>>> knowledge of how the caching works.
>>>
>>> Regards.
>>>
>>> Dan Kennedy-4 wrote:
>>>> On Tue, 2007-06-19 at 01:06 -0700, pompomJuice wrote:
>>>>> Hello there.
>>>>>
>>>>> I need some insight into how SQLite's caching works. I have a
>>>>> database that is quite large (5Gb) sitting on a production server
>>>>> whose IO is severely taxed. This causes my SQLite db to perform
>>>>> very poorly. Most of the time my application just sits there and
>>>>> uses about 10% of a CPU, where it would use 100% on test systems
>>>>> with idle IO. Effectively what the application does is constantly
>>>>> do lookups as fast as it can.
>>>>>
>>>>> To counteract this I increased the page size to 8192 (Unix server
>>>>> with advfs having 8K block sizes) and increased the MAX_PAGES value
>>>>> in sqliteInt.h to 512000. This worked. My application starts at low
>>>>> memory usage and gradually gains more memory. As it gains more
>>>>> memory it uses more CPU, and reaches a point where it finally uses
>>>>> 100% CPU and 5Gb of ram.
>>>>>
>>>>> Every now and then the lookup table is updated. As soon as the
>>>>> application does this, however, the performance goes back to a
>>>>> crawl and slowly builds up again as described in the previous
>>>>> paragraph. The memory usage stays at 5Gb. All that I can think of
>>>>> is that the update invalidates the cache.
>>>>
>>>> Probably right.
>>>>
>>>>> The update is not very big, say 20 rows in a table that has about
>>>>> 45 million rows. What exactly is happening here?
>>>>
>>>> Are you using 3.3.17? And is it an external process (or at least a
>>>> different connection) doing the update?
>>>>
>>>> If so, the update is modifying the pager change-counter,
>>>> invalidating the pager cache held by the lookup application. The
>>>> lookup app has to start loading pages from the disk again, instead
>>>> of just reading its cache.
>>>>
>>>> The only way around this performance hit is to do the UPDATE through
>>>> the lookup app, using the same database connection.
>>>>
>>>> Dan.

--
View this message in context:
http://www.nabble.com/Cache-invalidation-after-insert-statements.-tf3944908.html#a11208520
Sent from the SQLite mailing list archive at Nabble.com.
Re: [sqlite] The problem with index
Gerry Snyder <[EMAIL PROTECTED]> wrote:
> Igor Tandetnik wrote:
> > Sergey M. Brytsko <[EMAIL PROTECTED]> wrote:
> >> But what about the following values:
> >>
> >> 1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111
> >
> > If you suspect your data is likely to look like this, you may want to rewrite your query as
> >
> > SELECT BBB FROM XXX WHERE BBB < 100
> > union all
> > SELECT BBB FROM XXX WHERE BBB > 100;
>
> Wouldn't this work as well?
>
> SELECT BBB FROM XXX
> except
> SELECT BBB FROM XXX WHERE BBB = 100;

This gives a different answer, because the EXCEPT operator makes the rows of the result set unique. So the result set will be:

1 2 111

whereas formerly it was:

1 1 2 2 111 111

--
D. Richard Hipp <[EMAIL PROTECTED]>

- To unsubscribe, send email to [EMAIL PROTECTED] -
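The difference Richard describes is easy to see with Python's built-in sqlite3 module. This is a hypothetical sketch, not from the original thread, using Sergey's sample values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX(BBB INTEGER)")
values = [1, 1, 2, 2] + [100] * 11 + [111, 111]
conn.executemany("INSERT INTO XXX(BBB) VALUES (?)", [(v,) for v in values])

# EXCEPT is a set operation, so the result rows are made unique.
except_rows = [r[0] for r in conn.execute(
    "SELECT BBB FROM XXX EXCEPT SELECT BBB FROM XXX WHERE BBB = 100")]

# UNION ALL of the two range queries keeps the duplicates.
union_rows = [r[0] for r in conn.execute(
    "SELECT BBB FROM XXX WHERE BBB < 100 "
    "UNION ALL SELECT BBB FROM XXX WHERE BBB > 100")]

print(sorted(except_rows))  # [1, 2, 111]
print(sorted(union_rows))   # [1, 1, 2, 2, 111, 111]
```

The EXCEPT form is only equivalent when BBB happens to contain no duplicates.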
Re: [sqlite] Re: Re[2]: The problem with index
Igor Tandetnik wrote:
> Sergey M. Brytsko <[EMAIL PROTECTED]> wrote:
>> But what about the following values:
>>
>> 1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111
>
> If you suspect your data is likely to look like this, you may want to rewrite your query as
>
> SELECT BBB FROM XXX WHERE BBB < 100
> union all
> SELECT BBB FROM XXX WHERE BBB > 100;

Wouldn't this work as well?

SELECT BBB FROM XXX
except
SELECT BBB FROM XXX WHERE BBB = 100;

Gerry
Re: [sqlite] Step Query
On 6/20/07, Dennis Cote <[EMAIL PROTECTED]> wrote:
> Andrew Finkenstadt wrote:
> > I ended up writing a (multi-thread aware) C++ framework to keep me out of trouble. In the SQLite namespace I have
>
> Is there any chance that your framework is freely licensed open source so others could use it as well? It sounds interesting, and I would like to take a look at it if that is possible. Is there a link to the source?

I still need to check on approval for its distribution. (I have approval for our local changes to sqlite.h/c already.) I hope to be able to offer it under a single public-domain license similar to drh's sqlite license. I may end up having to GPL or LGPL it.

Please remind me in about 10 days if there hasn't been any word here.

--a
Re: [sqlite] How do I close the command line window
[EMAIL PROTECTED] wrote:
> I'm doing a bulk insert by calling
>
> sqlite3 -init BulkinsertItems.sql mydatabasefile
>
> with the BulkinsertItems.sql file containing:
>
> .separator \t
> .import BulkItems.txt items
> .quit
>
> The command window opens and the import works, but then it does not close again. How can I have this close automatically?

Jan,

All you need to do is move the .quit command from the init file to the command line, like this:

sqlite3 -init BulkinsertItems.sql mydatabasefile .quit

with the BulkinsertItems.sql file containing:

.separator \t
.import BulkItems.txt items

Or use a .read command on the command line instead of the -init option, like this:

sqlite3 mydatabasefile ".read BulkinsertItems.sql"

with the BulkinsertItems.sql file containing:

.separator \t
.import BulkItems.txt items
.quit

HTH
Dennis Cote
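For anyone doing the same bulk load from a script rather than the shell tool, here is a sketch of the equivalent using Python's sqlite3 module. The two-column `items` layout and the sample rows are assumptions; Jan's real schema isn't shown in the thread:

```python
import csv
import sqlite3

# A small tab-separated sample standing in for BulkItems.txt.
with open("BulkItems.txt", "w") as f:
    f.write("widget\t3\ngadget\t7\n")

# Parse it the way ".separator \t" followed by ".import" would.
with open("BulkItems.txt", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

conn = sqlite3.connect(":memory:")  # Jan's database would be "mydatabasefile"
conn.execute("CREATE TABLE items(name TEXT, qty TEXT)")  # hypothetical columns
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 2
```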
Re: [sqlite] Step Query
Andrew Finkenstadt wrote:
> I ended up writing a (multi-thread aware) C++ framework to keep me out of trouble. In the SQLite namespace I have
>
> class exception;
> class database;
> class connection;
> class statement;
> class execution;
> class query_result;
>
> where the ownership model is well-defined, and the data-use paths are protected from coding mistakes at compile time. There can be only one execution attached to a statement at any one time, and the query result is owned by the execution. When the execution terminates (goes out of scope), the statement is reset automatically.
>
> Since I am a strong believer in "prepare once, use many" for performance reasons, I ended up having to write my own framework to keep me out of trouble, and to reduce the amount of "busy work" around the "C" interface to sqlite.

Andrew,

Is there any chance that your framework is freely licensed open source so others could use it as well? It sounds interesting, and I would like to take a look at it if that is possible. Is there a link to the source?

Dennis Cote
Re: Re[2]: [sqlite] The problem with index
--- "Sergey M. Brytsko" wrote:
> But what about the following values:
>
> 1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111

I guess SQLite's query optimizer could take the cardinality of the column into account via its ANALYZE statistics for inequality comparisons. It's just the small matter of someone writing the code to do it.

** The result is a single row of the sqlite_stat1 table. The first
** two columns are the names of the table and index. The third column
** is a string composed of a list of integer statistics about the
** index. The first integer in the list is the total number of entries
** in the index. There is one additional integer in the list for each
** column of the table. This additional integer is a guess of how many
** rows of the table the index will select. If D is the count of distinct
** values and K is the total number of rows, then the integer is computed
** as:
**
**     I = (K+D-1)/D
**
** If K==0 then no entry is made into the sqlite_stat1 table.
** If K>0 then it is always the case that D>0, so division by zero
** is never possible.
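The comment above can be checked directly: run ANALYZE and read sqlite_stat1 back. A sketch with Python's sqlite3 module, using Sergey's 17 sample values; the exact statistic string can vary between SQLite versions, but the first integer is the row count K:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX(BBB INTEGER)")
conn.execute("CREATE INDEX BBB_IDX ON XXX(BBB)")
values = [1, 1, 2, 2] + [100] * 11 + [111, 111]  # K=17 rows, D=4 distinct
conn.executemany("INSERT INTO XXX(BBB) VALUES (?)", [(v,) for v in values])

conn.execute("ANALYZE")
stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'BBB_IDX'").fetchone()[0]

# First integer is K; per the formula above, I = (K+D-1)/D = (17+3)/4 = 5.
print(stat)  # e.g. "17 5"
```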
Re: [sqlite] Recommend server for Windows?
Gilles Ganault wrote:
> At 20:47 19/06/2007 -0500, John Stanton wrote:
>> Such a server can be made simpler than mine by making it single threaded.
>
> Is it publicly available from http://www.viacognis.com?
>
> Thanks
> G.

No, but I can give you some code which might help your project. The components which service SQL requests as www-url-encoded messages and return XML- or JSON-encapsulated DB rows might suit your purpose.
[sqlite] Re: Re[2]: The problem with index
Sergey M. Brytsko <[EMAIL PROTECTED]> wrote:
> But what about the following values:
>
> 1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111

If you suspect your data is likely to look like this, you may want to rewrite your query as

SELECT BBB FROM XXX WHERE BBB < 100
union all
SELECT BBB FROM XXX WHERE BBB > 100;

Igor Tandetnik
Re[2]: [sqlite] The problem with index
But what about the following values:

1 1 2 2 100 100 100 100 100 100 100 100 100 100 100 111 111

Thanks.

--
Sergey

Wednesday, June 20, 2007, 3:21:25 PM, wrote:

JW> Say you have the following values for BBB:
JW> 1 2 3 10 20 30 50 70 80 80 90 100 101 110 110 120 120 150 190 200
JW> How is an index going to help you with BBB <> 100 ?
JW> You have to do a full table scan whether or not the column is indexed.
Re: [sqlite] The problem with index
--- "Sergey M. Brytsko" wrote:
> The problem is the index is NOT used for query:
> SELECT BBB FROM XXX WHERE BBB <> 100;
>
> but in case of query
> SELECT BBB FROM XXX WHERE BBB > 100;
> all is ok ...
> The indices are very important for me, how should I build these queries?

Say you have the following values for BBB:

1 2 3 10 20 30 50 70 80 80 90 100 101 110 110 120 120 150 190 200

How is an index going to help you with BBB <> 100 ? You have to do a full table scan whether or not the column is indexed.
[sqlite] Re: The problem with index
Sergey M. Brytsko <[EMAIL PROTECTED]> wrote:
> I have the following DB schema:
>
> CREATE TABLE XXX(AAA TEXT, BBB INTEGER);
> CREATE INDEX AAA_IDX ON XXX(AAA);
> CREATE INDEX BBB_IDX ON XXX(BBB);
>
> SQLite 3.4.0
>
> The problem is the index is NOT used for query:
> SELECT BBB FROM XXX WHERE BBB <> 100;

An index is not helpful for this query.

> but in case of query
> SELECT BBB FROM XXX WHERE BBB > 100;
> all is ok

Note that, if most records have BBB>100, using the index might actually be slower than a full scan of the table.

> The same problem:
> SELECT AAA FROM XXX WHERE AAA IN ('QWERTY'); // index used
> SELECT AAA FROM XXX WHERE AAA NOT IN ('QWERTY'); // index not used

Same reason - an index cannot speed up an inequality test.

Igor Tandetnik
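Igor's point shows up directly in the query plans. A sketch with Python's sqlite3 module; EXPLAIN QUERY PLAN wording varies across SQLite versions, so treat the exact strings as illustrative, but the SCAN-versus-SEARCH distinction is the point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XXX(AAA TEXT, BBB INTEGER)")
conn.execute("CREATE INDEX BBB_IDX ON XXX(BBB)")
conn.executemany("INSERT INTO XXX VALUES (?, ?)",
                 [("row%d" % i, i) for i in range(1000)])

def plan(sql):
    # Concatenate the "detail" column of the plan rows.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The <> test cannot narrow the index, so SQLite visits every entry (a SCAN).
print(plan("SELECT BBB FROM XXX WHERE BBB <> 100"))

# The range test can seek into BBB_IDX and read only the matching part (a SEARCH).
print(plan("SELECT BBB FROM XXX WHERE BBB > 100"))
```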
Re: [sqlite] Converting from 2.8.x to 3.x?
Gilles Ganault uttered:
> Hello
>
> As we move from a 2.8.x file-based solution to a 3.x c/s solution, we'll have to convert databases from one format to the other. What's the easiest way to do this?

sqlite olddb .dump | sqlite3 newdb

Christian
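Christian's one-liner pipes a SQL dump of the old file into the new format. The same dump-and-reload idea can be sketched between two connections with Python's sqlite3 module. Note the module only handles version-3 files, so this illustrates the mechanism rather than the actual 2.8-to-3 step; the table and data are made up:

```python
import sqlite3

old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE t(a INTEGER, b TEXT)")
old.executemany("INSERT INTO t VALUES (?, ?)", [(1, "one"), (2, "two")])
old.commit()

# iterdump() yields the same SQL text that the shell's ".dump" prints.
new = sqlite3.connect(":memory:")
new.executescript("\n".join(old.iterdump()))

migrated = new.execute("SELECT a, b FROM t ORDER BY a").fetchall()
print(migrated)  # [(1, 'one'), (2, 'two')]
```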
[sqlite] Converting from 2.8.x to 3.x?
Hello

As we move from a 2.8.x file-based solution to a 3.x c/s solution, we'll have to convert databases from one format to the other. What's the easiest way to do this?

Thank you
G.
RE: [sqlite] More SQLite Misuse, sorted i think
> Hope that is more clear.

Perfectly. I get it now. As you say in the other post, every sqlite call needs to be inside the critical section, including sqlite3_finalize().

Dan.
RE: [sqlite] More SQLite Misuse, sorted i think
In the sample I supplied, I illustrated how two threads do the following:

Lock (global critical section)
QueryObject.Prepare (sqlite3_prepare)
QueryObject.Step (sqlite3_step)
QueryObject.Reset (sqlite3_reset)
Unlock
QueryObject.Free (sqlite3_reset -- the missing piece of the puzzle)

In the above example the calls to the first three functions are locked in a global critical section, so none of them can be executed at the same time. However, there was a last line of code I did not see: I had an object that was destroyed that called sqlite3_reset. This is where the problem lay. The destructor of the object did something as follows:

Destructor
  if FHandle <> nil then
  begin
    Sqlite3_reset;
    Sqlite3_finalize;
    FHandle := nil;
  end

I understand that the call to sqlite3_reset is a bit pointless in the destructor here, as sqlite3_finalize takes care of all that, but it is interesting to note that removing sqlite3_reset OR locking the call to sqlite3_reset seemed to make it work. Locking the call to sqlite3_finalize did not seem to be necessary and did not produce the SQLITE_MISUSE error.

Hope that is more clear.

-----Original Message-----
I'm not sure I completely understand, but anyway... :)
RE: [sqlite] More SQLite Misuse, sorted i think
Sorry if I created any confusion. There was some code that seemed to have called sqlite3_reset simultaneously from more than one thread; even though the statements were unique to each thread, the call to the library was not locked. I know assumptions are bad, but I thought that a reset on a unique statement should not have to be locked and serialized. Now I think it might, so every single call to the library gets locked in a critical section, and it seems to work. However, finalize worked because it seems that finalize can be called without synchronizing.

-----Original Message-----
From: Andre du Plessis [mailto:[EMAIL PROTECTED]
Sent: 19 June 2007 07:21 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] More SQLite Misuse

DLL version: SQLite 3.3.17
The OS is Windows.

After the last query about sqlite3_step I decided to do some more tests with threads. If synchronized properly, it seems that you can use more than one thread without any problem as long as sqlite3_finalize is called - is this correct? Please note that this is a very simple query being executed:

select * from threads where id = 1

Imagine in the following scenarios both threads are executing simultaneously and will lock on the global critical section (so they are synchronized), using the same DB handle.

Scenario 1 - each thread does:

LockGlobalCriticalSection
Sqlite3_prepare
Sqlite3_step   <-- SQLITE_MISUSE: library routine called out of sequence here
Sqlite3_reset
UnLockGlobalCriticalSection

The following code works fine, though - each thread does:

LockGlobalCriticalSection
Sqlite3_prepare
Sqlite3_step
Sqlite3_finalize
UnLockGlobalCriticalSection

If my tests are correct, it is not possible to retain a prepared statement across threads, and it has to be re-prepared each time??
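Andre's conclusion - serialize every library call on the shared handle - looks like this in a Python sketch, with threading.Lock standing in for the global critical section. The sqlite3 module wraps prepare/step/finalize internally, so one locked execute()/fetchone() pair covers the whole sequence; the table and query are the ones from Andre's example:

```python
import sqlite3
import threading

# One shared connection; check_same_thread=False lets both threads use it.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE threads(id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO threads VALUES (1, 'main')")

lock = threading.Lock()  # plays the role of the global critical section
results = []

def lookup():
    for _ in range(100):
        # Every call into the library happens inside the lock, including
        # the implicit reset/finalize triggered by fetchone().
        with lock:
            row = conn.execute("SELECT name FROM threads WHERE id = 1").fetchone()
        results.append(row[0])

workers = [threading.Thread(target=lookup) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(results), set(results))  # 200 {'main'}
```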
Re: [sqlite] Cache invalidation after insert statements.
AArrgh. That is the one thing that I won't be able to do. It would require a complete system redesign. I can adapt my program easily, but getting it to work in the greater scheme of things would be a nightmare. My current efforts are being focused on making a RAM drive and putting the file in there. I hope it works.

ken-33 wrote:
>
> Can you consolidate your multiple binaries into a single binary? Then use threading and sqlite's shared caching to perform the lookups and updates. That way the cache wouldn't get invalidated???
>
> Someone else here correct me if this is a bad idea!!!
>
> pompomJuice <[EMAIL PROTECTED]> wrote:
> I suspected something like this, as it makes sense.
>
> I have multiple binaries/different connections (and I cannot make them share a connection) using this one lookup table, and depending on which connection checks first, it will update the table.
>
> My question is then: if any one connection makes any change to the database (not necessarily to the huge lookup table), will all the other connections invalidate their entire cache? Or is the cache dropped per table/btree?
>
> Thanks for that response. Already I can move ahead now with better knowledge of how the caching works.
>
> Regards.
>
> Dan Kennedy-4 wrote:
>>
>> On Tue, 2007-06-19 at 01:06 -0700, pompomJuice wrote:
>>> Hello there.
>>>
>>> I need some insight into how SQLite's caching works. I have a database that is quite large (5Gb) sitting on a production server whose IO is severely taxed. This causes my SQLite db to perform very poorly. Most of the time my application just sits there and uses about 10% of a CPU where it would use 100% on test systems with idle IO. Effectively what the application does is constantly doing lookups as fast as it can.
>>> To counteract this I increased the page size to 8192 (Unix server with advfs having 8K block sizes) and increased the MAX_PAGES value in sqliteInt.h to 512000. This worked. My application starts at low memory usage and gradually gains more memory. As it gains more memory it uses more CPU, and reaches a point where it finally uses 100% CPU and 5Gb of RAM.
>>>
>>> Every now and then the lookup table is updated. As soon as the application does this, however, the performance goes back to a crawl and slowly builds up again as described in the previous paragraph. The memory usage stays at 5Gb. All that I can think of is that the update invalidates the cache.
>>
>> Probably right.
>>
>>> The update is not very big, say 20 rows in a table that has about 45 million rows.
>>>
>>> What exactly is happening here?
>>
>> Are you using 3.3.17? And is it an external process (or at least a different connection) doing the update?
>>
>> If so, the update is modifying the pager change-counter, invalidating the pager cache held by the lookup application. The lookup app has to start loading pages from the disk again, instead of just reading its cache.
>>
>> The only way around this performance hit is to do the UPDATE through the lookup app, using the same database connection.
>>
>> Dan.
>>
>>> Regards.
Re: [sqlite] Proper way to transfer a live sqlite database
On Tue, 2007-06-19 at 11:51 -0700, Gerry Snyder wrote:
> Michael Hooker wrote:
> > Many thanks for the explanation Dan.
>
> Ditto the thanks.
>
> > I suspected the purpose of ROLLBACK was as you say, but couldn't see why it was used here. You point out the "under the hood" difference between ROLLBACK and COMMIT, but what about END? My main (third-party, commercial) application may well have some data stored waiting to be fed into the database after the file has been copied, and if it is forced to discard its cache that presumably means these are lost, which wouldn't be a good idea.

END is the same as COMMIT in sqlite.

> It shouldn't have to. The cache Dan was referring to was an internal copy of (part of) what is already in the database. If the database file has been updated, that copy has to be discarded, since it may not be valid--of course, it may be valid, but figuring out whether it is would be a lot more work than just rereading it. Anyhow, this is all happening at a much lower level than the application data you are referring to, which is still valid and should be entered into the file.

Right. If another app has some "writes" (dirty pages) in its cache, then it will already have at least a RESERVED lock on the database file. If this is the case, the "BEGIN IMMEDIATE" statement executed by the copy-file process will fail to obtain its EXCLUSIVE database lock. So the only logic the file-copy process needs is "Do not do the file copy until after the BEGIN IMMEDIATE succeeds".

Dan.

> HTH,
>
> Gerry
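Dan's "do not copy until BEGIN IMMEDIATE succeeds" logic can be sketched in Python. The file names are hypothetical, and a real copier would catch sqlite3.OperationalError and retry while another writer holds the lock:

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "live.db")
dst = os.path.join(workdir, "backup.db")

conn = sqlite3.connect(src)
conn.execute("CREATE TABLE t(x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()

# BEGIN IMMEDIATE succeeds only if no other connection holds a RESERVED
# (or stronger) lock, i.e. nobody has unwritten changes in its cache.
conn.execute("BEGIN IMMEDIATE")
try:
    shutil.copyfile(src, dst)
finally:
    conn.execute("ROLLBACK")  # nothing was changed; just release the lock

rows = sqlite3.connect(dst).execute("SELECT x FROM t").fetchall()
print(rows)  # [(42,)]
```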
Re: [sqlite] Re: How to use pragmas from code?
On Tue, 2007-06-19 at 15:39 -0400, Igor Tandetnik wrote:
> Shane Harrelson <[EMAIL PROTECTED]> wrote:
> > To use pragmas from code, do I simply prepare them as a regular SQL statement and then execute them?
>
> Yes.

Another thing to note: some pragmas take effect during sqlite3_prepare(), not sqlite3_step() (examples: cache_size, temp_store). For this reason, calling sqlite3_reset() and then reusing a PRAGMA statement has the potential to produce confusing results (or at least SQLITE_SCHEMA errors). Personally, I would use sqlite3_exec() to execute pragma statements from C code.

Dan.

> > And when can they/should they be done? As the first statement after an open?
>
> Some pragmas have to be set early, others may be changed at any time.
>
> > Are the pragma values stored with the database?
>
> Some pragmas affect the format of the database file - these are stored in the database. Others only affect the current connection - these are not stored.
>
> Is there a particular pragma you are worrying about?
>
> Igor Tandetnik
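Igor's distinction between file-format pragmas and per-connection pragmas can be demonstrated with Python's sqlite3 module, where each execute() runs the pragma to completion much as sqlite3_exec() would (a sketch; page_size and cache_size are the examples from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# page_size is part of the file format, so it must be set before the
# first write fixes the format; it is stored in the database itself.
conn.execute("PRAGMA page_size = 8192")
conn.execute("CREATE TABLE t(x INTEGER)")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]

# cache_size only affects the current connection and is never persisted.
conn.execute("PRAGMA cache_size = 500")
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]

print(page_size, cache_size)  # 8192 500
```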