[sqlite] Of shared cache, table locks and transactions
Hey folks... I have a situation that caused me a little head-scratching and I'm wondering if it's intended behavior or not. I'm running a server thread (roughly based on test_server.c in the distro) on top of the 3.3.6 library. The effective call sequence in question (all from one thread) looks something like this:

  sqlite3_open("/path/to/db", &db_one);

  sqlite3_prepare(db_one, "CREATE TABLE foo (id);", -1, &stmt_one, NULL);
  sqlite3_step(stmt_one);
  sqlite3_finalize(stmt_one);

  sqlite3_prepare(db_one, "BEGIN DEFERRED;", -1, &stmt_one, NULL);
  sqlite3_step(stmt_one);
  sqlite3_finalize(stmt_one);

  sqlite3_prepare(db_one, "INSERT INTO foo VALUES (123);", -1, &stmt_one, NULL);
  sqlite3_step(stmt_one);
  sqlite3_finalize(stmt_one);

  sqlite3_prepare(db_one, "SELECT count(*) FROM foo;", -1, &stmt_one, NULL);
  sqlite3_step(stmt_one);                  /* point of interest #1 */
  sqlite3_column_int(stmt_one, 0);
  sqlite3_finalize(stmt_one);

  /* new connection here, previous transaction still pending... */
  sqlite3_open("/path/to/db", &db_two);
  sqlite3_prepare(db_two, "SELECT count(*) FROM foo;",
                  -1, &stmt_two, NULL);    /* point of interest #2 */
  sqlite3_step(stmt_two);                  /* point of interest #3 */
  sqlite3_column_int(stmt_two, 0);
  sqlite3_finalize(stmt_two);

If shared cache is DISabled, then I get "1" at the first point of interest and "0" at the third point of interest, which is what I'd expect. The database file is at a RESERVED lock state in both places: the first point of interest sees uncommitted data since it's in the same connection, while the third point of interest can't yet see that data since it's a different connection and the transaction is not yet committed. On the other hand, if shared cache is ENabled, then I get "1" at the first point of interest and SQLITE_LOCKED at the second point of interest. This would seem to indicate an actual degradation of concurrency from using shared caching: without shared caching, readers in the same thread as a pending writer are allowed; with shared caching, they are not.
The EXPLAIN output seems to confirm that this is a result of the binary nature of table locks vs. the staged nature of SQLite file locks. This came up when I was running the JUnit tests for the SQLiteJDBC driver (http://zentus.com/sqlitejdbc): TransactionTest.testInsert() failed in the manner described above when using a server-based shared-cache backend. I scanned the CVS logs and didn't see anything obvious between 3.3.6 and 3.3.10 that would change this behavior. Thoughts? Thanks, Pete.
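For anyone who wants to reproduce the non-shared-cache half of this, here is a minimal stand-in for the call sequence above, written in Python's sqlite3 module rather than the original C harness (the file location is illustrative, not from the post). It shows points of interest #1 and #3: the writer's own connection sees its uncommitted row, while a second connection does not.

```python
# Sketch of the sequence from the post WITHOUT shared cache (Python's
# sqlite3 module standing in for the C API; path is illustrative).
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Connection one: create the table, then leave a deferred transaction open.
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
writer.execute("CREATE TABLE foo (id)")
writer.execute("BEGIN DEFERRED")
writer.execute("INSERT INTO foo VALUES (123)")

# Point of interest #1: same connection sees its own uncommitted row.
in_txn = writer.execute("SELECT count(*) FROM foo").fetchone()[0]

# New connection, previous transaction still pending...
reader = sqlite3.connect(path)
# Point of interest #3: a separate connection cannot see the pending row;
# the writer's RESERVED lock still permits SHARED readers.
outside = reader.execute("SELECT count(*) FROM foo").fetchone()[0]

print(in_txn, outside)  # 1 0
writer.execute("COMMIT")
```

With shared cache enabled the second SELECT would instead hit a table-level read lock; the `read_uncommitted` pragma is SQLite's escape hatch for that case, though whether it fits the server-thread design here is a separate question.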
[sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
First off, thanks for the help and sorry for the formatting of the message; I didn't know how it was going to turn out, and I probably was overly optimistic as well as too verbose. Secondly, as I feared, it seems this was an XY question, so sorry for that as well. I'll address the two replies I can see so far, and some of the info in each section will likely overlap.

Nico: I guess that's the trick, to have the "current" or at least "recent" database and then the historical one. As of now, the process of polling the 17 machines takes about 40 seconds or so (when I first started running the process minutely, it was 20, so you can see I have to do something soon :)) So assuming the two-db model, what's the trick to it? Here are some ideas off the top of my head; can you (or any reader) please give me your thoughts (be as brutal as you like; I'm under no illusion that I know what I'm talking about):

1) The "current" table only ever has 17 rows.
   a) Have some kind of thing built into the script that runs minutely to copy the "current" data to the historical DB before kicking off the part that updates the current data.
   b) Add a trigger to the DB where the SQLite engine takes care of the copy somehow. This would probably be more difficult, since I don't know how to add a trigger and I'm thinking the historical database will be in a different file altogether.
   c) Something I haven't thought of.

2) The current table is only allowed to have a maximum of N rows. Upon reaching this size, data are moved to the historical database and only the most recent observations for each machine are left in the current DB. Not sure how I could do that. Is there a way to do this within SQLite?

3) A job runs every night or week (at a time when people are least likely to be using the page, such as 3 am) that transfers the data from the current DB to the historical one, leaving only the most recent observation for each machine.

Jay: The closer to real-time, the better.
The most often a cron job can run under Linux is once a minute, and minutely is pretty good. I guess I could have the summary process occur at the end of the script that polls the machines. It could generate static HTML, which would presumably make the page load super fast. However, under the current regime, the process of creating that summary is going to take at least 10 seconds. 40 seconds for polling + 10 seconds for summarizing = 50 seconds, and that number is only going to get bigger! So I'll have to figure out a better table structure anyway.

Additional thoughts: In general, I think splitting the tables up is the way to go. Any further comments/suggestions appreciated!

Jonathan

- To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: A little help with count
Great solution Igor!

On 13/01/07, Igor Tandetnik <[EMAIL PROTECTED]> wrote:
> Lloyd Thomas wrote:
> > I wish to create a query where I do a number of counts on the same
> > table but with different filters, i.e.:
> > count(id) as numrows
> > count(id) as inrows where direction = 'In'
> > count(id) as outrows where direction = 'Out'
> > Could I do the above in a single query?
>
> select count(*) as numrows,
>        count(case direction when 'In' then 1 else NULL end) as inrows,
>        count(case direction when 'Out' then 1 else NULL end) as outrows
> from mytable;
>
> Igor Tandetnik

--
Cesar Rodas
http://www.sf.net/projects/pagerank (The PageRank made easy...)
Mobile Phone: 595 961 974165
Phone: 595 21 645590
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[sqlite] Re: A little help with count
Lloyd Thomas wrote:
> I wish to create a query where I do a number of counts on the same
> table but with different filters, i.e.:
> count(id) as numrows
> count(id) as inrows where direction = 'In'
> count(id) as outrows where direction = 'Out'
> Could I do the above in a single query?

select count(*) as numrows,
       count(case direction when 'In' then 1 else NULL end) as inrows,
       count(case direction when 'Out' then 1 else NULL end) as outrows
from mytable;

Igor Tandetnik
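The trick here is that count() skips NULLs, so each CASE expression turns a WHERE filter into a per-column condition. A quick way to convince yourself is to run it against Lloyd's sample rows; this is a hypothetical in-memory demo in Python, not something posted in the thread:

```python
# In-memory demo of the single-query counting trick, using Lloyd's
# sample rows. count() ignores the NULLs the CASE expressions produce.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mytable (id, date, direction, duration, cost)")
db.executemany("INSERT INTO mytable VALUES (?,?,?,?,?)", [
    (1, '2007-01-01', 'In',   56, 0.00),
    (2, '2007-01-01', 'Out',  60, 0.10),
    (3, '2007-01-02', 'Out',  47, 0.10),
    (4, '2007-01-02', 'In',  120, 0.20),
])
row = db.execute("""
    SELECT count(*) AS numrows,
           count(CASE direction WHEN 'In'  THEN 1 ELSE NULL END) AS inrows,
           count(CASE direction WHEN 'Out' THEN 1 ELSE NULL END) AS outrows
    FROM mytable
""").fetchone()
print(row)  # (4, 2, 2)
```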
Re: [sqlite] A little help with count
select count(id), "numrows" as type from table
union
select count(id), "inrows" as type from table where direction = 'In'
union
select count(id), "outrows" as type from table where direction = 'Out'

Could this be it? I'm sorry I can't try it here; I'm not at work or at home, and I don't have SQLite here... try it and tell me whether it works or not.

Best Regards

On 13/01/07, Lloyd Thomas <[EMAIL PROTECTED]> wrote:
> I wish to create a query where I do a number of counts on the same
> table but with different filters, i.e.:
> count(id) as numrows
> count(id) as inrows where direction = 'In'
> count(id) as outrows where direction = 'Out'
> Could I do the above in a single query?
>
> | id | date       | direction | duration | cost |
> | 1  | 2007-01-01 | In        |  56      | 0.00 |
> | 2  | 2007-01-01 | Out       |  60      | 0.10 |
> | 3  | 2007-01-02 | Out       |  47      | 0.10 |
> | 4  | 2007-01-02 | In        | 120      | 0.20 |
>
> Thx
> Lloydie T
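Cesar couldn't test his suggestion, so here is a sketch of it that actually runs. The adjustments are mine, not his: the table is renamed to a hypothetical `calls` (since `table` is a reserved word in SQLite), the labels use single quotes to make them unambiguous string literals, and UNION ALL avoids the duplicate-removal pass. Note that unlike Igor's version it returns three rows rather than one:

```python
# Runnable variant of the UNION suggestion, against Lloyd's rows.
# Table name `calls` is a stand-in; `table` itself is a reserved word.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (id, direction)")
db.executemany("INSERT INTO calls VALUES (?,?)",
               [(1, 'In'), (2, 'Out'), (3, 'Out'), (4, 'In')])
rows = db.execute("""
    SELECT count(id), 'numrows' AS type FROM calls
    UNION ALL
    SELECT count(id), 'inrows' FROM calls WHERE direction = 'In'
    UNION ALL
    SELECT count(id), 'outrows' FROM calls WHERE direction = 'Out'
""").fetchall()
print(rows)
```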
[sqlite] A little help with count
I wish to create a query where I do a number of counts on the same table but with different filters, i.e.:

  count(id) as numrows
  count(id) as inrows where direction = 'In'
  count(id) as outrows where direction = 'Out'

Could I do the above in a single query?

  | id | date       | direction | duration | cost |
  | 1  | 2007-01-01 | In        |  56      | 0.00 |
  | 2  | 2007-01-01 | Out       |  60      | 0.10 |
  | 3  | 2007-01-02 | Out       |  47      | 0.10 |
  | 4  | 2007-01-02 | In        | 120      | 0.20 |

Thx
Lloydie T
Re: [sqlite] How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
Have you considered running a process that summarizes the data from the table and just fetching the last summary for display? Will a periodic snapshot work for your reporting, or do you need realtime summarization?

On 1/12/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Hi all. I will try to post all the data I can think of. The basic issue is that I have a Perl CGI script that shows the most recent load stats for a bunch of servers we use at our site for long, computing-intensive programs. The environment is all RedHat Enterprise Linux 4 (2.6.9-42.26.ELsmp #1 SMP i686 i686 i386 GNU/Linux); the SQLite version is 3.2.2.

As the number of records in my database has increased, the page load time has also increased, now to about 10 seconds, which is unacceptable, IMO. The idea is to give users an idea of the least loaded machine to help them decide which to use for their next program. How best to do this is, of course, in and of itself a complicated issue. At my org there is no other system in place, so this is how I do it, and I've gotten a big positive response from users.

The last implementation used a CSV file, and each iteration of gathering the data from the servers only allowed the cron job to run every five minutes. After upgrading to SQLite, the now minutely cron job has never failed to complete successfully in under one minute that I know of. I'd like to focus on optimizing my current implementation, and then, if relevant, talk about other, perhaps better ways to do it. Currently, both the old, 5-minutely CSV version and the current beta (SQLite) version run in parallel, and the CSV version is still more widely used since that page actually loads in a reasonable amount of time. The CSV data are overwritten each time, so the process of reading in that data is always the same. The SQLite database is INSERTed into because I want to keep historical data; the rationale for this is explained later.
The current system uses a DB with a table called stats that has more or less minutely data on various conditions for ~17 so-called peak machines. The table can be described thus:

  CREATE TABLE stats (Machine text not null, Load real, Scratch text(4),
                      Mem int, MemPctFree int, Procs int, Users int,
                      Timestamp text(20) not null, Message text);

My CGI script fetches the most recent observation for each machine with a select statement of the form:

  select a.* from stats a,
         (select Machine, max(Timestamp) as M from stats group by machine) b
  where a.machine = b.machine and a.timestamp = b.M
  order by load, Mem*MemPctFree desc, Scratch desc;

I floundered about trying to find a select statement that would simply give me the right answer; I suspect the main opportunity for optimization is in rewriting that statement. The CGI script uses DBI, and the exact statement is

  my $aref = $dbh->selectall_arrayref($ss2);

where $ss2 is the select statement above. The idea is that the least loaded machines come first in the list. The usefulness of the stats reported is of course debatable, but again, let's address that problem separately. Some of the machines have very large scratch partitions, which makes them particularly suitable for jobs that generate very large files.
A typical group of rows might look like this (columns: Machine, Load, Scratch space available, Total memory (GB), Free memory (%), Running processes, User(s), Timestamp, Message; some machines report no Scratch value):

  peaklx6   0    21G  12  96  1   1  [EMAIL PROTECTED]:28:35
  peaklx1   0    58G  12  88  1   1  [EMAIL PROTECTED]:29:03
  peaklx5   0    19G  12  86  1   1  [EMAIL PROTECTED]:28:33
  peaklx4   0         12  84  1   4  [EMAIL PROTECTED]:28:31
  peaklx2   0    59G  12  65  1   2  [EMAIL PROTECTED]:29:05
  peaklx21  0          4  14  1   2  [EMAIL PROTECTED]:29:07
  peaklx3   0.1       12  78  1   3  [EMAIL PROTECTED]:28:27
  fstlx1    0.2  20G   4   5  1  11  [EMAIL PROTECTED]:28:37
  peaklx22  1          4  53  2   0  [EMAIL PROTECTED]:29:09
  peaklx24  2          4  58  3   0  [EMAIL PROTECTED]:29:12
  peaklx29  2          4  40  3   3  [EMAIL PROTECTED]:28:25
  peaklx23  2          4  39  3   0  [EMAIL PROTECTED]:29:11
  peaklx30  2          4  29  3   0  [EMAIL PROTECTED]:28:29
  peaklx28  2          4  28  3   0  [EMAIL PROTECTED]:28:23
  peaklx26  3         12  69  4   0  [EMAIL PROTECTED]:29:17
  peaklx25  4          4  21  5   0  [EMAIL PROTECTED]:29:15
  peaklx27  4          4  17  5   2  [EMAIL PROTECTED]:28:21

The problem I am having is that the page load times are getting very, very long. The table now has about 700,000 rows, and the select statement takes about 10 seconds to execute (I benchmarked various parts of the program, and it's not the DB connect or loading the modules or whatnot). The DB file lives on an NFS-mounted share, but running the same query locally and interactively takes approximately the same amount of time. The size of the DB file itself is currently 89MB. The web server (and all the servers in question) are dual-processor 2.8GHz Xeons with a minimum of 4GB of memory (all running the same OS). Some have hyperthreading turned on and others don't. I tried putting a copy of the DB file in the same directory as the script;
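One common fix for this shape of query (my suggestion, not something proposed in the message above) is a composite index on (Machine, Timestamp): SQLite can then answer each per-machine max(Timestamp) with an index seek instead of scanning the whole table. A sketch with made-up sample rows, using a correlated-subquery form of the same "latest row per machine" query:

```python
# Hypothetical sketch: composite index + "latest row per machine" query.
# Schema follows the stats table from the post; sample rows are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE stats (Machine text not null, Load real,
              Scratch text, Mem int, MemPctFree int, Procs int, Users int,
              Timestamp text not null, Message text)""")
db.executemany("INSERT INTO stats VALUES (?,?,?,?,?,?,?,?,?)", [
    ('peaklx1', 0.0, '58G', 12, 88, 1, 1, '2007-01-12 16:29:03', None),
    ('peaklx1', 0.5, '58G', 12, 70, 2, 1, '2007-01-12 16:30:03', None),
    ('peaklx2', 0.0, '59G', 12, 65, 1, 2, '2007-01-12 16:29:05', None),
])
# The composite index lets each correlated max(Timestamp) lookup be an
# index seek rather than a scan of all 700k rows.
db.execute("CREATE INDEX idx_stats_machine_ts ON stats (Machine, Timestamp)")
rows = db.execute("""
    SELECT a.* FROM stats a
    WHERE a.Timestamp = (SELECT max(Timestamp) FROM stats b
                         WHERE b.Machine = a.Machine)
    ORDER BY a.Load, a.Mem * a.MemPctFree DESC, a.Scratch DESC
""").fetchall()
print([(r[0], r[1], r[7]) for r in rows])
```

The older flattened-join form from the post works too; the index is what matters. On 3.2.2-era SQLite it may also be worth checking the plan with EXPLAIN to confirm the index is actually used.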
Re: [sqlite] 3.3.10 data corruption on updating fts1 string table
Glad I could help find something; hope I put you in the right direction with the source sample. In the meantime I've moved to standard tables due to other limitations imposed by FTS1/2.

Scott Hess wrote:
> OK, there's definite meat here. I have other reports of users seeing
> this problem. It's specifically related to doing UPDATE against an
> fts1 or fts2 table. INSERT and DELETE both work fine. As far as I
> can tell, UPDATE may have never worked, or may have worked only in
> specific circumstances. More as it's available.
>
> -scott
>
> On 1/12/07, ohadp <[EMAIL PROTECTED]> wrote:
>>
>> looks like the file came in without CRLF, here goes:
>>
>> --
>> #include "../sqlite-3_3_8/sqlite3.h"
>> #include <tchar.h>
>>
>> static sqlite3* db;
>>
>> void exec_dml(const TCHAR* cmd)
>> {
>>     sqlite3_stmt* vm;
>>
>>     sqlite3_prepare16(db, cmd, -1, &vm, 0);
>>     sqlite3_step(vm);
>>     sqlite3_finalize(vm);
>> }
>>
>> void exec_query(const TCHAR* cmd)
>> {
>>     sqlite3_stmt* vm;
>>
>>     sqlite3_prepare16(db, cmd, -1, &vm, 0);
>>     if (sqlite3_step(vm) == SQLITE_ROW)
>>     {
>>         TCHAR* result = (TCHAR*)sqlite3_column_text16(vm, 0);
>>         result = result;
>>     }
>>
>>     sqlite3_finalize(vm);
>> }
>>
>> int _tmain(int argc, _TCHAR* argv[])
>> {
>>     sqlite3_open16(_T("test.db"), &db);
>>     exec_dml(_T("CREATE VIRTUAL TABLE t USING fts1(content);"));
>>     exec_dml(_T("INSERT INTO t (rowid, content) VALUES (1, 'this is a test');"));
>>
>>     exec_query(_T("SELECT content FROM t WHERE rowid = 1;"));
>>     exec_dml(_T("UPDATE t SET content = 'that was a test' WHERE rowid = 1;"));
>>     exec_query(_T("SELECT content FROM t WHERE rowid = 1;"));
>>
>>     return 0;
>> }
>> --
>> View this message in context:
>> http://www.nabble.com/3.3.10-data-corruption-on-updating-fts1-string-table-tf2960926.html#a8305111
>> Sent from the SQLite mailing list archive at Nabble.com.
[sqlite] SQLITE_ENABLE_LOCKING_STYLE
I am sorry if the question is too obvious, but I haven't found any documentation about the SQLITE_ENABLE_LOCKING_STYLE macro (it is in os.c). Should I set it to 0 (the default) or 1? I need to be able to access database files located on shared volumes on OS X. What are the implications of setting it to 1? Thanks a lot.

---
Marco Bambini
http://www.sqlabs.net
http://www.sqlabs.net/blog/
http://www.sqlabs.net/realsqlserver/
Re: [sqlite] How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
On Sat, Jan 13, 2007 at 12:57:43AM -0500, [EMAIL PROTECTED] wrote:
> The SQLite database is INSERTed into because I want to keep historical
> data. The rationale for this is explained later.

For your main application (finding the least loaded machine) you don't need historical data, so put the historical data in a separate table, or else add a column that you'll set to NULL for historical data and make part of an index/primary key, so that historical data is not indexed. And since you have so few "live" rows, you might as well just have two tables, and for the non-historical table don't even bother with indexes.

Nico
--
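The two-table idea above can be sketched like this (table and column names are mine, not from the thread): keep a tiny `current` table with one row per machine, refreshed with INSERT OR REPLACE via a PRIMARY KEY on Machine, and append every sample to an unindexed `history` table.

```python
# Hypothetical sketch of the current/history split. Names are invented;
# the polling script would call record() once per machine per minute.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE current (
    Machine text PRIMARY KEY, Load real, Timestamp text not null)""")
db.execute("""CREATE TABLE history (
    Machine text not null, Load real, Timestamp text not null)""")

def record(machine, load, ts):
    # One statement refreshes the live row; the other archives the sample.
    db.execute("INSERT OR REPLACE INTO current VALUES (?,?,?)",
               (machine, load, ts))
    db.execute("INSERT INTO history VALUES (?,?,?)", (machine, load, ts))

record('peaklx1', 0.0, '2007-01-12 16:29')
record('peaklx1', 0.5, '2007-01-12 16:30')  # replaces peaklx1's live row
record('peaklx2', 2.0, '2007-01-12 16:29')

# The display query is now a trivial scan of (at most) 17 rows; no
# max(Timestamp) subquery is needed at page-load time.
rows = db.execute("SELECT Machine, Load FROM current ORDER BY Load").fetchall()
print(rows)  # [('peaklx1', 0.5), ('peaklx2', 2.0)]
```

This moves the per-machine "most recent" bookkeeping from query time to insert time, which is the right trade when the page is read far more often than the query shape changes.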