[sqlite] Prevent database file from being overwritten by other processes
On 04/08/2015 09:51 PM, R.Smith wrote:
> On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
>> Currently, we are using SQLite as our application file format for a
>> Windows 7/C#/System.Data.SQLite based desktop application. We only
>> allow one instance to open the file by running "set
>> locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to
>> the database.
>
> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database
> connections for the time being.
> COMMIT; - Unlocks it again - so calling all this in one go is pointless.

Almost always correct. But the "PRAGMA locking_mode=EXCLUSIVE" thing
changes the behaviour:

https://www.sqlite.org/pragma.html#pragma_locking_mode

> That said, database locking serves only to protect from other database
> changes... There is no way to prevent a user from intentionally messing
> with any file if they have the privileges to do so. [...]
>
> Good luck!
> Ryan

___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
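Dan's point - that BEGIN EXCLUSIVE takes the exclusive lock immediately, before any read or write happens - can be seen in a quick sketch (not from the thread; it uses Python's stdlib sqlite3 module, and the file name is arbitrary):

```python
# Sketch (not from the thread): BEGIN EXCLUSIVE takes the exclusive lock
# immediately, before any read or write, shown via Python's stdlib sqlite3.
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn1 = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn1.execute("CREATE TABLE t(x)")
conn1.execute("BEGIN EXCLUSIVE")      # lock acquired right here

conn2 = sqlite3.connect(path, timeout=0.1)
locked_out = False
try:
    conn2.execute("SELECT * FROM t").fetchall()
except sqlite3.OperationalError:      # "database is locked"
    locked_out = True

conn1.execute("COMMIT")               # releases the lock again
after_commit = conn2.execute("SELECT * FROM t").fetchall()
print(locked_out, after_commit)       # True []
```

As the thread notes, COMMIT immediately releases the lock again, so the lock only helps while the transaction is open (unless locking_mode=EXCLUSIVE is also set).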
[sqlite] Prevent database file from being overwritten by other processes
On 2015-04-08 06:00 PM, Dan Kennedy wrote:
> On 04/08/2015 10:52 PM, R.Smith wrote:
>> But you need an actual SELECT to get a shared lock and an actual
>> write operation to lock it exclusively; just starting the transaction
>> and ending it does nothing to that effect? Or is my understanding wrong?
>
> That's the usual case. But "BEGIN EXCLUSIVE" actually does take an
> exclusive lock:
>
> https://www.sqlite.org/lang_transaction.html

Goodness... Where was my mind?! Thank you Dan, and apologies for the
round trip and for misleading the OP! BEGIN EXCLUSIVE will of course
instill the lock immediately.
[sqlite] Prevent database file from being overwritten by other processes
I have the same problem, but in order to prevent overwriting I actually
make a hidden copy of the file in the local app directory when opening
it. This allows me to recover it if the application breaks and to
implement Save, Save As etc., and if the original is no longer there I
can still save where it was.

Just an idea.

Regards,
Maarten

> "Fabian Pröbstl" wrote:
> Currently, we are using SQLite as our application file format for a
> Windows 7/C#/System.Data.SQLite based desktop application. [...]
>
> Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?
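Maarten's scheme can be sketched as follows (hypothetical code, not from the thread; the paths, file names, and helper functions `open_project`/`save_project` are invented for illustration; Python's stdlib sqlite3 stands in for System.Data.SQLite):

```python
# Hypothetical sketch of the hidden-copy scheme: copy the user's file into
# a private app directory on open, work on the copy, write it back on save.
import os, shutil, sqlite3, tempfile

user_file = os.path.join(tempfile.mkdtemp(), "project.db")
init = sqlite3.connect(user_file)
init.execute("CREATE TABLE t(x)")
init.commit()
init.close()

app_dir = tempfile.mkdtemp()                 # stand-in for AppData
work_copy = os.path.join(app_dir, "project.work.db")

def open_project(src):
    shutil.copy2(src, work_copy)             # work on a private copy
    return sqlite3.connect(work_copy)

def save_project(dst):
    # Even if the original was overwritten or deleted meanwhile, the
    # working copy is intact and can simply be written back.
    shutil.copy2(work_copy, dst)

conn = open_project(user_file)
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
conn.close()

os.remove(user_file)                         # simulate the user clobbering it
save_project(user_file)                      # recoverable from the work copy
```

Copying the database file is only safe while no connection holds it open mid-transaction, which is why the sketch closes the connection before saving.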
[sqlite] Prevent database file from being overwritten by other processes
On 2015-04-08 05:38 PM, Dan Kennedy wrote:
> On 04/08/2015 09:51 PM, R.Smith wrote:
>> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database
>> connections for the time being.
>> COMMIT; - Unlocks it again - so calling all this in one go is pointless.
>
> Almost always correct. But the "PRAGMA locking_mode=EXCLUSIVE" thing
> changes the behaviour:
>
> https://www.sqlite.org/pragma.html#pragma_locking_mode

But you need an actual SELECT to get a shared lock and an actual write
operation to lock it exclusively; just starting the transaction and
ending it does nothing to that effect? Or is my understanding wrong?
[sqlite] Prevent database file from being overwritten by other processes
In the amalgamation source, search for the line (around 37836):

   dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;

If you change this to:

   dwShareMode = 0;

then use this version of sqlite3.c in your application. This will open
the file for "exclusive" access and not shared access. This means that
the file will not be able to be read/written/deleted while your
application has the file open.

"Locking mode" is not the same as "share mode", and the standard library
does not have the capability to open a database for exclusive
(non-shared) access.

It should be noted that even if you open the file in exclusive
(non-shared) mode, it can still be deleted or renamed while in use --
but the contents cannot be changed (or read) by another process while it
is open.

---
Theory is when you know everything but nothing works. Practice is when
everything works but no one knows why. Sometimes theory and practice are
combined: nothing works and no one knows why.

> From: Fabian Pröbstl
> Sent: Wednesday, 8 April, 2015 08:18
> Subject: [sqlite] Prevent database file from being overwritten by
> other processes
>
> [...] From what I found out with the OpenedFilesView tool, SQLite
> seems to open the file with SHARED_WRITE, which explains why *any*
> process can overwrite the contents.
>
> Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?
[sqlite] Prevent database file from being overwritten by other processes
On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
> Currently, we are using SQLite as our application file format for a
> Windows 7/C#/System.Data.SQLite based desktop application. We only
> allow one instance to open the file by running "set
> locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to the
> database.

BEGIN EXCLUSIVE - Locks the database from other SQLite3 database
connections for the time being.
COMMIT; - Unlocks it again - so calling all this in one go is pointless.

That said, database locking serves only to protect from other database
changes... There is no way to prevent a user from intentionally messing
with any file if they have the privileges to do so. Best practice is to
keep the file in your program's assigned /ProgramData folder or the user
folders (/AppData/Roaming/yourApp/ is the usual) - the typical user
won't go mess there or even know to look there. Other than that, the
entire point of an operating system is to serve its user, not your
program - as it should, so you unfortunately cannot protect users
against themselves.

If this is to do with your own security being a concern (i.e. you are
not trying to safeguard the user), then I would strongly suggest an
encryption module or using a DB with user-level locking. (Even then you
still won't be able to protect against a willful user deleting, moving,
overwriting or otherwise accessing a file.)

At a tangent: how would you feel if your operating system disallowed you
those privileges because some program you installed asked it to? I would
change operating systems immediately - viruses are a big enough problem
as it is; imagine being unable to get rid of them...

Good luck!
Ryan

> This all works fine, however a user can still open Windows Explorer
> and copy-paste a file with the same name but different content (e.g.
> an empty file) over an existing, exclusively locked database. From
> what I found out with the OpenedFilesView tool, SQLite seems to open
> the file with SHARED_WRITE, which explains why *any* process can
> overwrite the contents.
>
> Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?
>
> Thanks for the advice
> Fabian
[sqlite] Prevent database file from being overwritten by other processes
On 8 Apr 2015, at 3:51pm, R.Smith wrote:
> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database
> connections for the time being.
> COMMIT; - Unlocks it again - so calling all this in one go is pointless.
>
> That said, database locking serves only to protect from other database
> changes... There is no way to prevent a user from intentional messing
> with any file if they have the privileges to do so.

Correct. Part of my testing for setup security is as follows:

1. Open a SQLite database file in a text editor (NOTEPAD.EXE, TextEdit,
   whatever).
2. Type some gibberish characters at a few places in the file.
3. Wait until an app is using it via SQLite calls.
4. Hit 'Save'.

You can't do anything to prevent it. But your business procedure
(whatever software you run, whatever corruption testing you do, whatever
corruption testing your software does) needs to be able to detect the
problem and raise an alarm.

Simon.
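A sketch of the kind of detection Simon describes (not code from the thread): SQLite's built-in PRAGMA integrity_check can serve as the alarm. Here it is driven from Python's stdlib sqlite3 module, with the corruption step simulated by writing garbage into the middle of the file:

```python
# Sketch: detect external corruption with PRAGMA integrity_check.
# The file path and helper name are invented for illustration.
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t(x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
conn.commit()
conn.close()

def db_is_healthy(p):
    """Return True iff integrity_check reports 'ok'."""
    c = sqlite3.connect(p)
    try:
        rows = c.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        return False
    finally:
        c.close()
    return rows == [("ok",)]

healthy_before = db_is_healthy(path)

# Simulate a user typing gibberish into the file with a text editor:
# clobber the header of the second database page.
with open(path, "r+b") as f:
    f.seek(4096)
    f.write(b"gibberish!")

healthy_after = db_is_healthy(path)
print(healthy_before, healthy_after)   # True False
```

In a real application the check would run at open time (or periodically), raising the alarm Simon describes instead of printing.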
[sqlite] Performance issue
On 2015-04-08 01:57 PM, Dominique Devienne wrote:
> Perhaps. But that still doesn't get to my point. With a LIMIT clause,
> in such a GROUP BY ORDER BY returning a large result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only, as in
>    http://www.cplusplus.com/reference/algorithm/partial_sort/?
>
> I'm interested in finding out for sure. Perhaps that's hijacking this
> thread a bit, but in case of SQLite doing #1, and MySQL doing #2, it
> could explain some of the difference. (Although sorting a 1M array is
> so fast nowadays, I doubt it.)

I think the partial sort algorithm only finds the first N items and then
stops sorting, but for that to be possible the result set must already
be present and calculated in full. The partial sort itself might save a
millisecond or two over a complete sort in large lists. Either way,
SQLite is more clever than that, as Richard pointed out.

> His rows are "fatter", since he mentioned 41 columns. Which might make
> it go over some threshold(s) (page cache?), slowing things down once
> past it.
>
> But indeed, sharing the DB (if not sensitive data) would be the way to go.

No no, we know his rows' fatness exactly - he did send the schema; they
are 41 integer values, i.e. it doesn't matter. So yeah, there must be
some trivial thing which the OP (and I) are missing. Even a DB with
obscured values that still causes the slow query will work...
[sqlite] Prevent database file from being overwritten by other processes
Even if there were a simple way to protect an SQLite db file from being
casually (or even maliciously) overwritten by a user (which there
isn't), it is quite impossible to prevent a user with "root privileges"
from accessing/altering/deleting/moving/renaming ANY file on any
operating system (that is, after all, what root privileges are designed
to do...).

SQLite is designed to work with multiple processes/threads on the same
system accessing the same file on local storage. If you need a special
version that only allows one thread of one process to open the file,
you can change the appropriate lines in the source code and build your
own image. I would strongly recommend statically linking your "special"
copy of SQLite with your application.

-----Original Message-----
From: Fabian Pröbstl
Sent: Wednesday, 08 April 2015 16:18
Subject: [sqlite] Prevent database file from being overwritten by other
processes

> [...] Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?

Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at
[sqlite] Performance issue
On Wed, Apr 8, 2015 at 2:09 PM, Richard Hipp wrote:
> On 4/8/15, Dominique Devienne wrote:
>> With a LIMIT clause, in such a GROUP BY ORDER BY returning a large
>> result set, would SQLite:
>> 1) sort the whole result-set and then keep only the first top-N rows?
>> 2) or instead do a partial-sort of the first top-N rows only,
>
> SQLite must examine all rows of output, obviously. But it only keeps
> the top-N in memory and only sorts the top-N. If there are a total of
> M candidate rows and only the top-N are to be displayed, then the
> algorithm is O(M*logN) in time and O(N) in space.

Thank you Richard. --DD
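Richard's O(M*logN) time / O(N) space description matches the classic bounded-heap approach to top-N. A sketch using Python's stdlib heapq (illustrative only - this is not SQLite's actual sorter code):

```python
# Illustrative sketch of top-N selection in O(M*logN) time and O(N) space,
# as described above; this is NOT SQLite's actual implementation.
import heapq, random

def top_n(rows, n, key):
    heap = []                      # min-heap of at most n (key, row) pairs
    for row in rows:
        k = key(row)
        if len(heap) < n:
            heapq.heappush(heap, (k, row))
        elif k > heap[0][0]:       # better than the current worst kept row
            heapq.heapreplace(heap, (k, row))
    # Only the n survivors get fully sorted, largest key first.
    return [r for _, r in sorted(heap, reverse=True)]

rows = [(random.random(), i) for i in range(100000)]
best6 = top_n(rows, 6, key=lambda r: r[0])
```

Every row is examined once (the O(M) part), but each row costs at most one O(logN) heap operation, and memory never exceeds the N kept rows.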
[sqlite] Prevent database file from being overwritten by other processes
Hi there!

Currently, we are using SQLite as our application file format for a
Windows 7/C#/System.Data.SQLite based desktop application. We only allow
one instance to open the file by running "set
locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to the
database.

This all works fine, however a user can still open Windows Explorer and
copy-paste a file with the same name but different content (e.g. an
empty file) over an existing, exclusively locked database. From what I
found out with the OpenedFilesView tool, SQLite seems to open the file
with SHARED_WRITE, which explains why *any* process can overwrite the
contents.

Is there an easy way of configuring / changing this so that SHARED_WRITE
is not acquired? Will SQLite even function? Is it just easier to create
a hidden copy and work on that?

Thanks for the advice
Fabian
[sqlite] Performance issue
On Wed, Apr 8, 2015 at 1:24 PM, R.Smith wrote:
> On 2015-04-08 11:35 AM, Dominique Devienne wrote:
>> If you dropped the LIMIT 6 from both, are the results significantly
>> different?
>
> No Dominique, it's not that -

Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
such a GROUP BY ORDER BY returning a large result set, would SQLite:

1) sort the whole result-set and then keep only the first top-N rows?
2) or instead do a partial-sort of the first top-N rows only, as in
   http://www.cplusplus.com/reference/algorithm/partial_sort/?

I'm interested in finding out for sure. Perhaps that's hijacking this
thread a bit, but in case of SQLite doing #1, and MySQL doing #2, it
could explain some of the difference. (Although sorting a 1M array is so
fast nowadays, I doubt it.)

> there must be another thing wrong with his setup. (He might not see my
> replies because he uses gmail).

Sure. I don't dispute that.

> In fact, the entire script, making the table, adding the index,
> populating it with a million rows (with computed values no less) and
> then doing the query plus posting the output - ALL of it takes less
> than 4 seconds together: (Try it yourself)

His rows are "fatter", since he mentioned 41 columns. Which might make
it go over some threshold(s) (page cache?), slowing things down once
past it.

But indeed, sharing the DB (if not sensitive data) would be the way to
go. --DD
[sqlite] Performance issue
On 2015-04-08 11:35 AM, Dominique Devienne wrote:
> If you dropped the LIMIT 6 from both, are the results significantly
> different?
>
> I.e. does actually getting the full result set result in closer
> numbers between SQLite and MySQL?
>
> I'm not sure SQLite's implementation of LIMIT prevents the whole
> result-set from being sorted before being truncated to 6 rows.
> Avoiding LIMIT would be one round-about way to find out. --DD

No Dominique, it's not that - there must be another thing wrong with his
setup. (He might not see my replies because he uses gmail.)

We will need an actual DB file from the OP with the 1 mil records to
test and understand the problem, because there is something that isn't
clear or isn't being said (probably deemed unimportant). Again, here is
a script that makes a similar table, populates it with 1 million rows,
adds the correct index and then does the aggregate query on all of them.
It takes less than 500 milliseconds - faster than MySQL sans caching.
(Obviously my pragmas/compile options may make a difference.) Even if
there is much more data in the table per row, it can't cause an order of
magnitude increase in time - and it doesn't seem to be the case from the
schema the OP posted.

In fact, the entire script - making the table, adding the index,
populating it with a million rows (with computed values no less) and
then doing the query plus posting the output - ALL of it takes less than
4 seconds together: (Try it yourself)

Important to note here is that I adapted the query to actually include
the entire dataset (all 1 mil rows), so the index is of little value. It
only matters where the WHERE clause refers to fewer rows - in which case
the time decreases linearly.

--
DROP TABLE IF EXISTS flows;
CREATE TABLE flows(
  id INTEGER PRIMARY KEY,
  ipLan TEXT,
  ipWan TEXT,
  portLan INT,
  portWan INT,
  protocol INT,
  nbBytesDecompOut INT,
  nbBytesCompIn INT,
  tos INT,
  timestamp INT
);

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
  SELECT 0,0,0,0,0,100
  UNION ALL
  SELECT x+1, (x%8), CAST(round(x/8,0) AS INT),
         (x%1024), CAST(round(x/1024,0) AS INT),
         CAST(abs(round(random()/10240,0)) AS INT)
    FROM acc WHERE (x<1000000)  -- Testing 1 million rows
)
INSERT INTO flows
SELECT x,
       '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
       1000+mx8, 5000+mx8, (x%18),   -- portLan, portWan, protocol
       64+(rk%1024000),              -- nbBytesDecompOut
       1024+(rk%1024000),            -- nbBytesCompIn
       (dx8%3), (138000+x)           -- tos, timestamp
  FROM acc;

CREATE INDEX idxFlowTimeProt ON flows (timestamp, protocol);

SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol,
       portLan as item, count(*) as 'No.'
  FROM flows
 WHERE timestamp>=1373770600 AND timestamp<=1484770600
   AND protocol IN (17, 6)
 GROUP BY portLan
 ORDER BY vol DESC
 LIMIT 6;

-- vol          item
-- -----------  ----
-- 14020551134  1001
-- 13964537508  1000
-- 13940128330  1003
-- 13938134232  1002
-- 13930861704  1005
-- 13913803752  1006

-- Item Stats:   Item No: 5          Query Size (Chars): 232
--               Result Columns: 3   Result Rows: 6
--               VM Work Steps: 8833543   Rows Modified: 0
--               Sort Operations: 2       Table-Scan Steps: 0
-- Query Run:        0d 00h 00m and 00.408s
-- Full Query Time:  0d 00h 00m and 00.408s
-- Query Result:     Success.
--
-- Script Stats: Total Script Execution Time:  0d 00h 00m and 03.742s
--               Total Script Query Time:      0d 00h 00m and 03.620s
--               Total Database Rows Changed:  1000001
--               Total Virtual-Machine Steps:  76833701
--               Last executed Item Index:     5
--               Last Script Error:
[sqlite] fts5
On Wed, Apr 8, 2015 at 12:32 PM, Dan Kennedy wrote:
> On 04/08/2015 04:49 AM, Scott Hess wrote:
>> Something that bugged me a lot was that I had used deletion markers
>> to cancel out hits, but did not provide a way for deletion markers to
>> cancel out. The main problem with this was that a large delete would
>> stay in the system until it reached the final segment, even if it had
>> already overtaken all of the original inserts. I wished that I had
>> either maintained a separate structure tracking _document_ deletion
>> (which would make merges somewhat more complicated because they
>> wouldn't be term-centric), or coded updates as "delete+insert". In
>> the latter case deletes could drop out at the point where they
>> reached the original insert.
>
> Thanks for this. The "delete+insert" idea sounds like quite an
> interesting one.
>
> So instead of just "delete" and "insert" keys, the merge tree now also
> contains "delete+insert" keys (call them "update" keys). Then maintain
> the tree so that
>
> (a) for each "insert", the next youngest duplicate key must either not
>     exist or be a "delete",
> (b) for each "update", the next youngest duplicate key must exist and
>     must be an "insert" or "update", and
> (c) for each "delete", the next youngest duplicate key must exist and
>     must be an "insert" or "update".
>
> And as a result, when a "delete" catches up with an "insert" while
> merging, they can both be discarded - instead of the current
> situation, where we retain the "delete" unless the output segment is
> the oldest in the database. Cool.

Yes, I think that's it. I have notes somewhere (no chance I'll find them
in this decade) noodling on how to handle it :-).

I think you can model it as a stream of alternating (oldest to newest)
INSERT, DELETE, INSERT, DELETE, ... When a DELETE catches an INSERT they
cancel, but when an INSERT catches a DELETE it "pushes" the DELETE
forward until it finds the original INSERT and cancels, leaving the new
INSERT alone.

Serializing the update as an actual DELETE/INSERT pair made merges work
naturally (the DELETE lines up with the earlier INSERT, both are
skipped, and the new INSERT is in the right place to carry forward). The
space cost was two bytes per term updated: one for the delete's POS_END
varint(0), one for the insert's docid delta varint(0). That could be
dropped to one byte by adding a POS_UPDATE value in the encoding, with a
bit more merge complexity.

The main alternative I could see would be to add additional docid arrays
per segment to track documents present and documents deleted, which you
could intersect to create an additional filter to apply during segment
merge.

-scott
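Scott's stream model can be sketched in a few lines (a hypothetical toy model, not fts3/fts5 source): each key's history is a list of "I" (insert) and "D" (delete) markers, oldest to newest, with an update serialized as a DELETE immediately followed by a new INSERT. During a full merge, a DELETE that meets its matching INSERT cancels with it:

```python
# Hypothetical toy model of the per-key marker stream described above.
# "I" = insert, "D" = delete; an update is serialized as "D" then "I".

def collapse(ops):
    """Collapse a per-key op stream (oldest -> newest) during a full merge.

    Whenever a "D" directly follows the "I" it targets, both drop out;
    a "D" with no earlier "I" in the stream (its insert lives in a
    segment not part of this merge) is retained and carried forward.
    """
    out = []
    for op in ops:
        if op == "D" and out and out[-1] == "I":
            out.pop()          # DELETE cancels the matching INSERT
        else:
            out.append(op)
    return out

# Insert, then update (D+I), then delete: everything cancels.
print(collapse(["I", "D", "I", "D"]))   # []
# Insert then update: only the newest INSERT survives.
print(collapse(["I", "D", "I"]))        # ['I']
```

The retained-"D" case is exactly the situation Dan describes where the delete must be kept unless the output segment is the oldest in the database.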
[sqlite] Performance issue
On 8 Apr 2015, at 10:16am, Jeff Roux wrote:
> SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan
>   as item FROM flows WHERE timestamp>=1383770600 AND
>   timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan
>   ORDER BY vol DESC LIMIT 6

If you want us to understand what's happening, please run ANALYZE, then
give us the output of EXPLAIN QUERY PLAN for that query.

Simon.
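For readers unfamiliar with the step Simon is asking for, here is a sketch of it (not from the thread) against a small stand-in `flows` table, using Python's stdlib sqlite3 module; the row counts and timestamp range are invented:

```python
# Sketch: run ANALYZE, then EXPLAIN QUERY PLAN on a query shaped like the
# OP's, against a small invented stand-in table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flows(
    portLan INT, protocol INT, timestamp INT,
    nbBytesDecompOut INT, nbBytesCompIn INT)""")
conn.executemany(
    "INSERT INTO flows VALUES (?,?,?,?,?)",
    [(i % 8, i % 18, 138000 + i, 64 + i, 1024 + i) for i in range(1000)])
conn.execute("CREATE INDEX idxFlowTimeProt ON flows (timestamp, protocol)")
conn.execute("ANALYZE")   # gather stats so the planner can choose well

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT SUM(nbBytesDecompOut+nbBytesCompIn) AS vol, portLan AS item
      FROM flows
     WHERE timestamp >= 138100 AND timestamp <= 138120
       AND protocol IN (17, 6)
     GROUP BY portLan ORDER BY vol DESC LIMIT 6""").fetchall()

for row in plan:
    print(row)   # shows whether the index or a full table scan is used
```

For a selective timestamp range like this one, the plan should report a SEARCH using idxFlowTimeProt rather than a SCAN of the table.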
[sqlite] Performance issue
On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux wrote:
> # For mysql, I use:
> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
> as item FROM flows WHERE timestamp>=1383770600 AND \
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
> ORDER BY vol DESC LIMIT 6;' | mysql testperf

If you dropped the LIMIT 6 from both, are the results significantly
different?

I.e. does actually getting the full result set result in closer numbers
between SQLite and MySQL?

I'm not sure SQLite's implementation of LIMIT prevents the whole
result-set from being sorted before being truncated to 6 rows. Avoiding
LIMIT would be one round-about way to find out. --DD
[sqlite] Performance issue
Thanks everyone for your answers, I made some changes to the database according to the information you gave me. It improved the performance of the query by about 20% (the request now takes 4 seconds instead of 5). Here are some more information, regarding all the suggestions I received: - The version of SQLite I used is the one provided by Debian (current stable: wheezy). - I need the database to be opened in Read Only mode, so I did not use the WAL mode. - All the existing indexes cannot be removed because they are used by other queries. - I however removed unused indexes for the following tests results Note that my benchmarks are run in batch, with sqlite3 as with mysql. I stop and start the mysql daemon to avoid most caching (I hope). # For mysql, I use: /etc/init.d/mysql stop; /etc/init.d/mysql start; \ time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \ as item FROM flows WHERE timestamp>=1383770600 AND \ timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \ ORDER BY vol DESC LIMIT 6;' | mysql testperf 2783898050 33722 1374153827 33736 886842830 39155 655809252 51800 363040479 53153 358988337 59757 real0m1.067s user0m0.000s sys 0m0.000s # For sqlite, I use: time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan as item FROM flows WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol DESC LIMIT 6;' | sqlite3 /var/db/udcast/flow_stats_OA_1M.db 2783898050|33722 1374153827|33736 886842830|39155 655809252|51800 363040479|53153 358988337|59757 real0m4.405s user0m1.812s sys 0m2.580s Here is the time spent in the query according to the number of lines matching the where clause (ANALYZE has been run before): PERIOD (s) MIN TS MAX TS LINESTIME --- 36001384767000 1384770600 351130:00.06 --- 72001384763400 1384770600 676110:00.11 --- 21600 1384749000 1384770600 154592 0:00.69 --- 43200 1384727400 1384770600 270728 0:01.18 --- 86400 1384684200 1384770600 501871 0:02.20 --- 
all 1383770600 1384770600 100 0:04.44 The 20% improvement is nice, but mysql (even without caching) is still far ahead for the moment (4 times faster). Other ideas are welcome. Thanks again! 2015-04-01 12:52 GMT+02:00 GB : > In case of SELECTing "all available" I recommend invoking a different > statement without the timestamp-part instead of providing some min and max > values for timestamp. This avoids tricking the query planner into some > wrong decisions (if you have an index with protocol as the first column). > > And how about WAL mode? If concurrency is of any concern for you, this > definitely is something worth a try. > > -- GB > > > Jeff Roux schrieb am 31.03.2015 um 12:48: > >> Thanks everyone for the answers. >> >> I won't be able to make some tests today, I will come back to you soon >> with >> additional information. Just to say that, in the worst case, the WHERE >> clause selects the entire data, i.e 100 entries. The user can select a >> time range of 1 hour, 3 hours, 1 day, and ? all available ?. Note: before >> being added in the database, the time stamps are aggregated on a 180 >> second >> period and a lot of rows has the same time stamp (~ one row per TCP >> session). >> >> All the columns are defined as INTEGER. There are 41 columns in total in >> the flow table. If I remember well, there is no primary key defined for >> this table. >> >> 2015-03-31 8:32 GMT+02:00 GB : >> >> From what I see, I assume that timestamp gives the highest selectivity. >>> Taking into account that protocol is SELECTed for and portLan is GROUPed >>> BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan >>> helps here, but it's worth a try, I think). Don't forget to ANALYZE, of >>> course. Are your colums of INTEGER affinity? If the are of TEXT, they >>> will >>> store anything as TEXT. May make a difference in both space consumption >>> and >>> speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4 enabled? If not, >>> give it a try. 
>>> It sometimes makes a big difference.
>>>
>>> Is it possible that data collection and retrieval happen at the same
>>> time? If so, try running the database in WAL mode; it should help with
>>> concurrency issues.
>>>
>>> -- GB
>>>
>>> Jeff Roux wrote on 30.03.2015 at 11:46:
>>>
>>>> Hi everyone, I have a daemon that collects information and stores it
>>>> in a SQLite database. The table has 1 million rows. This daemon is
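The composite index and ANALYZE that GB suggests above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table and column names come from the query in the thread, but the schema is reduced to the columns actually used and the data is synthetic, so the numbers prove nothing about the real workload.

```python
import random
import sqlite3

# Reduced stand-in for the "flows" table (the real one has 41 INTEGER columns).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE flows (
    timestamp INTEGER, protocol INTEGER, portLan INTEGER,
    nbBytesDecompOut INTEGER, nbBytesCompIn INTEGER)""")

random.seed(0)
con.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [(1383770600 + random.randrange(1_000_000),   # timestamps over ~11 days
      random.choice((1, 6, 17)),                  # ICMP / TCP / UDP
      random.randrange(1024, 65536),
      random.randrange(10_000),
      random.randrange(10_000))
     for _ in range(10_000)])

# The index suggested in the thread, plus ANALYZE so the planner has
# statistics to work with (a STAT4 build collects even richer histograms).
con.execute("CREATE INDEX flows_ts_proto_port "
            "ON flows(timestamp, protocol, portLan)")
con.execute("ANALYZE")

# A selective time range, where the index should pay off.
top = con.execute("""
    SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item
    FROM flows
    WHERE timestamp >= 1384700000 AND timestamp <= 1384770600
      AND protocol IN (17, 6)
    GROUP BY portLan ORDER BY vol DESC LIMIT 6""").fetchall()
print(top)
```

Running `EXPLAIN QUERY PLAN` on the same statement shows whether the planner actually picked the index for a given time range, which is the quickest way to check GB's point about the "all available" case tricking the planner.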
[sqlite] Transpose selected rows into columns
You are welcome. The crosstab followed by a calculation is a common pattern in statistics, political science and accounting (if you added a percent-change column after the diff you would have the classic "accounting variance" report). Using an intermediate TABLE or VIEW is an "information hiding" tactic analogous to putting a complex calculation in a subroutine.

Jim Callahan
Orlando, FL

On Wed, Apr 8, 2015 at 1:36 AM, Drago, William @ CSG - NARDA-MITEQ <
William.Drago at l-3com.com> wrote:

> Jim,
>
> This works quite well. Thank you.
>
> And thanks to all others who replied.
>
> --
> Bill Drago
> Senior Engineer
> L3 Narda-MITEQ
> 435 Moreland Road
> Hauppauge, NY 11788
> 631-272-5947 / William.Drago at L-3COM.com
>
> > -----Original Message-----
> > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> > users-bounces at mailinglists.sqlite.org] On Behalf Of Jim Morris
> > Sent: Tuesday, April 07, 2015 10:07 AM
> > To: sqlite-users at mailinglists.sqlite.org
> > Subject: Re: [sqlite] Transpose selected rows into columns
> >
> > You might try
> >
> > select SerialNumber, V0, V5, V5-V0
> > from
> >   (select SerialNumber,
> >      max(case Stim when 'V0' then Resp else null end) V0,
> >      max(case Stim when 'V5' then Resp else null end) V5
> >    from MyTable
> >    group by SerialNumber)
> >
> > On 4/7/2015 6:58 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > > Igor,
> > >
> > > Your solution works well. What I can't figure out is how to
> > > efficiently create a column representing V5-V0.
> > >
> > > SerialNumber | V0  | V5  | Vdiff
> > > -------------|-----|-----|------
> > > 123          | 0.2 | 0.6 | 0.4
> > >
> > > This is what I'm using, but it takes twice as long:
> > >
> > > select SerialNumber,
> > >   max(case Stim when 'V0' then Resp else null end) V0,
> > >   max(case Stim when 'V5' then Resp else null end) V5,
> > >   (max(case Stim when 'V0' then Resp else null end) -
> > >    max(case Stim when 'V5' then Resp else null end)) Vdiff
> > > from MyTable group by SerialNumber;
> > >
> > > There must be a more efficient way. (I tried V5-V0 and assigning
> > > intermediate values to variables but got nothing but errors.)
> > >
> > > Thanks,
> > > --
> > > Bill Drago
> > > Senior Engineer
> > > L3 Narda-MITEQ
> > > 435 Moreland Road
> > > Hauppauge, NY 11788
> > > 631-272-5947 / William.Drago at L-3COM.com
> > >
> > >> -----Original Message-----
> > >> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> > >> users-bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> > >> Sent: Friday, March 27, 2015 3:20 PM
> > >> To: sqlite-users at mailinglists.sqlite.org
> > >> Subject: Re: [sqlite] Transpose selected rows into columns
> > >>
> > >> On 3/27/2015 11:48 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > >>> I want the rows containing V0 and V5 to become columns like this:
> > >>>
> > >>> SerialNumber | V0    | V5
> > >>> -------------|-------|------
> > >>> 123          | 0.136 | 0.599
> > >>> 124          | 0.126 | 0.587
> > >>> 125          | 0.119 | 0.602
> > >>
> > >> select SerialNumber,
> > >>   max(case Stim when 'V0' then Resp else null end) V0,
> > >>   max(case Stim when 'V5' then Resp else null end) V5
> > >> from MyTable group by SerialNumber;
> > >>
> > >> --
> > >> Igor Tandetnik
> > >>
> > >> _______________________________________________
> > >> sqlite-users mailing list
> > >> sqlite-users at mailinglists.sqlite.org
> > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > > CONFIDENTIALITY, EXPORT CONTROL AND DISCLAIMER NOTE: This e-mail and
any attachments are solely for the
use of the addressee and may contain
> > information that is privileged or confidential. Any disclosure, use or
> > distribution of the information contained herein is prohibited. In the
> > event this e-mail contains technical data within the definition of the
> > International Traffic in Arms Regulations or Export Administration
> > Regulations, it is subject to the export control laws of the
> > U.S. Government. The recipient should check this e-mail and any
> > attachments for the presence of viruses as L-3 does not accept any
> > liability associated with the transmission of this e-mail. If you have
> > received this communication in error, please notify the sender by reply
> > e-mail and immediately delete this message and any attachments.
> > > _______________________________________________
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
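The whole exchange above fits in a few lines of Python with the built-in sqlite3 module. This sketch uses the sample data from the thread and hides Igor Tandetnik's MAX/CASE crosstab behind a VIEW, as Jim Callahan suggests, so the V5-V0 calculation in the outer SELECT (Jim Morris's approach) stays simple; the view name `Crosstab` is made up for the example.

```python
import sqlite3

# Sample data from the thread: one row per (SerialNumber, Stim) pair.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (SerialNumber INTEGER, Stim TEXT, Resp REAL)")
con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (123, 'V0', 0.136), (123, 'V5', 0.599),
    (124, 'V0', 0.126), (124, 'V5', 0.587),
    (125, 'V0', 0.119), (125, 'V5', 0.602),
])

# The crosstab itself, hidden behind a view ("information hiding"):
# MAX over a CASE picks out each Stim value's Resp per serial number.
con.execute("""CREATE VIEW Crosstab AS
    SELECT SerialNumber,
           MAX(CASE Stim WHEN 'V0' THEN Resp END) AS V0,
           MAX(CASE Stim WHEN 'V5' THEN Resp END) AS V5
    FROM MyTable GROUP BY SerialNumber""")

# The calculation now references the pivoted columns directly,
# and each MAX/CASE is evaluated only once per group.
rows = con.execute("""SELECT SerialNumber, V0, V5, V5 - V0 AS Vdiff
                      FROM Crosstab ORDER BY SerialNumber""").fetchall()
for r in rows:
    print(r)
```

The first printed row is `(123, 0.136, 0.599, ...)` with Vdiff approximately 0.463, matching the layout Bill asked for.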
[sqlite] Prevent database file from being overwritten by other processes
On 04/08/2015 07:18 AM, Fabian Pröbstl wrote:

> Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?

For something this critical, an alternate approach is to restructure it so you have a service which does the database and similar work, and a separate gui that talks to the service. (Note I mean service in the Windows sense, much like a daemon on Unix.)

You can run the service as a different user, which means the gui application user can't even access the file directly, nor cause mayhem. That separate structuring also has other benefits, such as allowing for a separate command-line client; it makes testing easier, and you can put the service and gui on different machines. Multiple instances of the gui can run too, which may be very useful.

Roger
[sqlite] Performance issue
On 4/8/15, Dominique Devienne wrote:

> With a LIMIT clause, in such a GROUP BY ORDER BY returning a large
> result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only,

SQLite must examine all rows of output, obviously. But it only keeps the top-N in memory and only sorts the top-N. If there are a total of M candidate rows and only the top-N are to be displayed, then the algorithm is O(M*logN) in time and O(N) in space.

--
D. Richard Hipp
drh at sqlite.org
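The behaviour Dr. Hipp describes is classic bounded-heap selection. A rough Python illustration of the idea (not SQLite's actual C implementation): a min-heap of size N holds the best N rows seen so far, so each of the M input rows costs at most O(log N).

```python
import heapq
import random

def top_n(rows, n, key=lambda r: r):
    """Return the n largest rows in descending order.

    Space is O(n): only the current top-n candidates are retained.
    Time is O(M*log n) over M input rows, plus a final O(n*log n) sort.
    """
    heap = []  # min-heap: the smallest of the current top-n sits at heap[0]
    for row in rows:
        if len(heap) < n:
            heapq.heappush(heap, (key(row), row))
        elif key(row) > heap[0][0]:
            heapq.heapreplace(heap, (key(row), row))  # evict current minimum
    return [row for _, row in sorted(heap, reverse=True)]

random.seed(1)
data = [random.randrange(10_000) for _ in range(100_000)]
print(top_n(data, 6))
```

For the `LIMIT 6` queries earlier in this digest, N is tiny, which is exactly why the O(N) space bound matters: the sorter never materializes the full grouped result.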
[sqlite] Transpose selected rows into columns
Jim,

This works quite well. Thank you.

And thanks to all others who replied.

--
Bill Drago
Senior Engineer
L3 Narda-MITEQ
435 Moreland Road
Hauppauge, NY 11788
631-272-5947 / William.Drago at L-3COM.com

> -----Original Message-----
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> users-bounces at mailinglists.sqlite.org] On Behalf Of Jim Morris
> Sent: Tuesday, April 07, 2015 10:07 AM
> To: sqlite-users at mailinglists.sqlite.org
> Subject: Re: [sqlite] Transpose selected rows into columns
>
> You might try
>
> select SerialNumber, V0, V5, V5-V0
> from
>   (select SerialNumber,
>      max(case Stim when 'V0' then Resp else null end) V0,
>      max(case Stim when 'V5' then Resp else null end) V5
>    from MyTable
>    group by SerialNumber)
>
> On 4/7/2015 6:58 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > Igor,
> >
> > Your solution works well. What I can't figure out is how to
> > efficiently create a column representing V5-V0.
> >
> > SerialNumber | V0  | V5  | Vdiff
> > -------------|-----|-----|------
> > 123          | 0.2 | 0.6 | 0.4
> >
> > This is what I'm using, but it takes twice as long:
> >
> > select SerialNumber,
> >   max(case Stim when 'V0' then Resp else null end) V0,
> >   max(case Stim when 'V5' then Resp else null end) V5,
> >   (max(case Stim when 'V0' then Resp else null end) -
> >    max(case Stim when 'V5' then Resp else null end)) Vdiff
> > from MyTable group by SerialNumber;
> >
> > There must be a more efficient way. (I tried V5-V0 and assigning
> > intermediate values to variables but got nothing but errors.)
> >
> > Thanks,
> > --
> > Bill Drago
> > Senior Engineer
> > L3 Narda-MITEQ
> > 435 Moreland Road
> > Hauppauge, NY 11788
> > 631-272-5947 / William.Drago at L-3COM.com
> >
> >> -----Original Message-----
> >> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> >> users-bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> >> Sent: Friday, March 27, 2015 3:20 PM
> >> To: sqlite-users at mailinglists.sqlite.org
> >> Subject: Re: [sqlite] Transpose selected rows into columns
> >>
> >> On 3/27/2015 11:48 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> >>> I want the rows containing V0 and V5 to become columns like this:
> >>>
> >>> SerialNumber | V0    | V5
> >>> -------------|-------|------
> >>> 123          | 0.136 | 0.599
> >>> 124          | 0.126 | 0.587
> >>> 125          | 0.119 | 0.602
> >>
> >> select SerialNumber,
> >>   max(case Stim when 'V0' then Resp else null end) V0,
> >>   max(case Stim when 'V5' then Resp else null end) V5
> >> from MyTable group by SerialNumber;
> >>
> >> --
> >> Igor Tandetnik
> >>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users at mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users