[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Dan Kennedy
On 04/08/2015 09:51 PM, R.Smith wrote:
>
>
> On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
>> Hi there!
>>
>> Currently, we are using SQLite as our application file format for a 
>> Windows 7/C#/System.Data.SQLite based desktop application. We only 
>> allow one instance to open the file by running "set 
>> locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to the 
>> database.
>
> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database 
> connections for the time being.
> COMMIT; - Unlocks it again - so calling all this in one go is pointless.

Almost always correct. But the "PRAGMA locking_mode=EXCLUSIVE" thing 
changes the behaviour:

   https://www.sqlite.org/pragma.html#pragma_locking_mode
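
To make the effect concrete, here is a minimal sketch using the C API (a
sketch only: the file name is made up and error handling is trimmed). In
EXCLUSIVE locking mode the lock taken by BEGIN EXCLUSIVE is not dropped at
COMMIT; it is held until the connection closes or the mode is reset:

  #include <sqlite3.h>

  int main(void){
    sqlite3 *db;
    if( sqlite3_open("app.db", &db)!=SQLITE_OK ) return 1;

    /* Switch to exclusive locking mode, then force a write transaction.
    ** COMMIT ends the transaction, but in this mode the OS-level file
    ** lock is retained until the connection closes (or the mode is set
    ** back to NORMAL and the file is accessed again). */
    sqlite3_exec(db, "PRAGMA locking_mode=EXCLUSIVE;", 0, 0, 0);
    sqlite3_exec(db, "BEGIN EXCLUSIVE; COMMIT;", 0, 0, 0);

    /* ... application work; other connections now get SQLITE_BUSY ... */

    sqlite3_close(db);   /* lock released here */
    return 0;
  }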






>
> That said, database locking serves only to protect from other database 
> changes... There is no way to prevent a user from intentionally messing 
> with any file if they have the privileges to do so. Best practice is 
> to keep the file in your program's assigned /programdata folder or the 
> user folders (/AppData/Roaming/yourApp/ is the usual) - the typical 
> user won't go mess there or even know to look there. Other than that, 
> the entire point of an operating system is to serve its user, not your 
> program - as it should - so unfortunately you cannot protect users 
> against themselves.
>
> If this is to do with your own security being a concern (i.e. you are 
> not trying to safeguard the user) then I would strongly suggest an 
> encryption module or using a DB with user-level locking. (Even then 
> you still won't be able to protect against a willful user deleting, 
> moving, overwriting or otherwise accessing a file).
>
> At a tangent:
> How would you feel if your operating system disallowed you those 
> privileges because some program you installed asked it to? I would 
> change operating systems immediately - Viruses are a big enough 
> problem as it is - imagine being unable to get rid of them...
>
> Good luck!
> Ryan
>
>
>>
>> This all works fine, however a user can still open Windows Explorer 
>> and copy paste a file with the same name but different content (e.g. 
>> an empty file) over an existing, exclusively locked database. From 
>> what I found out with the OpenedFilesView tool, SQLite seems to open 
>> the file with SHARED_WRITE, which explains why *any* process can 
>> overwrite the contents.
>>
>> Is there an easy way of configuring / changing this so that 
>> SHARED_WRITE is not acquired? Will SQLite even function? Is it just 
>> easier to create a hidden copy and work on that?
>>
>> Thanks for the advice
>> Fabian



[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread R.Smith


On 2015-04-08 06:00 PM, Dan Kennedy wrote:
> On 04/08/2015 10:52 PM, R.Smith wrote:
>>
>>
>> On 2015-04-08 05:38 PM, Dan Kennedy wrote:
>>> On 04/08/2015 09:51 PM, R.Smith wrote:


 On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
> Hi there!
>
> Currently, we are using SQLite as our application file format for 
> a Windows 7/C#/System.Data.SQLite based desktop application. We 
> only allow one instance to open the file by running "set 
> locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to 
> the database.

 BEGIN EXCLUSIVE - Locks the database from other SQLite3 database 
 connections for the time being.
 COMMIT; - Unlocks it again - so calling all this in one go is 
 pointless.
>>>
>>> Almost always correct. But the "PRAGMA locking_mode=EXCLUSIVE" thing 
>>> changes the behaviour:
>>>
>>>   https://www.sqlite.org/pragma.html#pragma_locking_mode
>>
>> But you need an actual SELECT to get a shared lock and an actual 
>> write operation to lock it exclusively, just starting the transaction 
>> and ending it does nothing to that effect? Or is my understanding wrong?
>
> That's the usual case. But "BEGIN EXCLUSIVE" actually does take an 
> exclusive lock:
>
>   https://www.sqlite.org/lang_transaction.html

Goodness... Where was my mind?!  Thank you Dan and apologies for the 
round trip, and for misleading the OP!
BEGIN EXCLUSIVE will of course instill the lock immediately.





[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread maarten.most...@stakepoint.com

I have the same problem, but in order to prevent overwriting I actually make a 
hidden copy of the file in the local app directory when opening it. This allows 
me to recover it if the application breaks and to implement Save, Save As, etc., 
and if the original is no longer there I can still save where it was.

Just an idea.

Regards,

Maarten,


> "Fabian Pr?bstl"  |



> Hi there!
> 
> Currently, we are using SQLite as our application file format for a Windows
> 7/C#/System.Data.SQLite based desktop application. We only allow one instance 
> to
> open the file by running "set locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" 
> when
> connecting to the database.
> 
> This all works fine, however a user can still open Windows Explorer and copy 
> paste
> a file with the same name but different content (e.g. an empty file) over an
> existing, exclusively locked database. From what I found out with the
> OpenedFilesView tool, SQLite seems to open the file with SHARED_WRITE, which
> explains why *any* process can overwrite the contents.
> 
> Is there an easy way of configuring / changing this so that SHARED_WRITE is 
> not
> acquired? Will SQLite even function? Is it just easier to create a hidden 
> copy and
> work on that?
> 
> Thanks for the advice
> Fabian


[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread R.Smith


On 2015-04-08 05:38 PM, Dan Kennedy wrote:
> On 04/08/2015 09:51 PM, R.Smith wrote:
>>
>>
>> On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
>>> Hi there!
>>>
>>> Currently, we are using SQLite as our application file format for a 
>>> Windows 7/C#/System.Data.SQLite based desktop application. We only 
>>> allow one instance to open the file by running "set 
>>> locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to 
>>> the database.
>>
>> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database 
>> connections for the time being.
>> COMMIT; - Unlocks it again - so calling all this in one go is pointless.
>
> Almost always correct. But the "PRAGMA locking_mode=EXCLUSIVE" thing 
> changes the behaviour:
>
>   https://www.sqlite.org/pragma.html#pragma_locking_mode

But you need an actual SELECT to get a shared lock and an actual write 
operation to lock it exclusively, just starting the transaction and 
ending it does nothing to that effect? Or is my understanding wrong?



[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Keith Medcalf

In the Amalgamation Source search for the line (around 37836):

  dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;

If you change this to:

  dwShareMode = 0;

then use this version of sqlite3.c in your application.  This will open the 
file for "exclusive" access and not shared access.  This means that the file 
will not be able to be read/written/deleted while your application has the file 
open.

"Locking Mode" is not the same as "ShareMode", and the standard library does 
not have the capability to open a database for exclusive (non-shared) access.

It should be noted that opening the file in exclusive (non-shared) mode also 
prevents another process from deleting or renaming it while it is in use: with 
a share mode of 0, those operations fail with a sharing violation, just as 
reads and writes do.
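
For illustration, here is a standalone Win32 sketch of what that one-line 
change does (illustration only -- this is not the actual sqlite3.c code path, 
and the file name is made up):

  #include <windows.h>
  #include <stdio.h>

  int main(void){
    /* dwShareMode = 0 denies all sharing: while this handle is open,
    ** any other CreateFile on the same file -- including the one
    ** Explorer issues for a copy-paste -- fails with
    ** ERROR_SHARING_VIOLATION. */
    HANDLE h = CreateFileW(L"test.db",
                           GENERIC_READ | GENERIC_WRITE,
                           0,                    /* dwShareMode */
                           NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if( h==INVALID_HANDLE_VALUE ){
      printf("open failed: %lu\n", GetLastError());
      return 1;
    }
    /* ... the file is now exclusively ours ... */
    CloseHandle(h);
    return 0;
  }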

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.

>-Original Message-
>From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>bounces at mailinglists.sqlite.org] On Behalf Of Fabian Pröbstl
>Sent: Wednesday, 8 April, 2015 08:18
>To: sqlite-users at mailinglists.sqlite.org
>Subject: [sqlite] Prevent database file from being overwritten by other
>processes
>
>Hi there!
>
>Currently, we are using SQLite as our application file format for a
>Windows 7/C#/System.Data.SQLite based desktop application. We only allow
>one instance to open the file by running "set
>locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" when connecting to the
>database.
>
>This all works fine, however a user can still open Windows Explorer and
>copy paste a file with the same name but different content (e.g. an empty
>file) over an existing, exclusively locked database. From what I found
>out with the OpenedFilesView tool, SQLite seems to open the file with
>SHARED_WRITE, which explains why *any* process can overwrite the
>contents.
>
>Is there an easy way of configuring / changing this so that SHARED_WRITE
>is not acquired? Will SQLite even function? Is it just easier to create a
>hidden copy and work on that?
>
>Thanks for the advice
>Fabian





[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread R.Smith


On 2015-04-08 04:18 PM, Fabian Pröbstl wrote:
> Hi there!
>
> Currently, we are using SQLite as our application file format for a Windows 
> 7/C#/System.Data.SQLite based desktop application. We only allow one instance 
> to open the file by running "set locking_mode=EXCLUSIVE;BEGIN 
> EXCLUSIVE;COMMIT" when connecting to the database.

BEGIN EXCLUSIVE - Locks the database from other SQLite3 database 
connections for the time being.
COMMIT; - Unlocks it again - so calling all this in one go is pointless.

That said, database locking serves only to protect from other database 
changes... There is no way to prevent a user from intentionally messing 
with any file if they have the privileges to do so. Best practice is to 
keep the file in your program's assigned /programdata folder or the user 
folders (/AppData/Roaming/yourApp/ is the usual) - the typical user 
won't go mess there or even know to look there. Other than that, the 
entire point of an operating system is to serve its user, not your 
program - as it should - so unfortunately you cannot protect users 
against themselves.

If this is to do with your own security being a concern (i.e. you are 
not trying to safeguard the user) then I would strongly suggest an 
encryption module or using a DB with user-level locking. (Even then you 
still won't be able to protect against a willful user deleting, moving, 
overwriting or otherwise accessing a file).

At a tangent:
How would you feel if your operating system disallowed you those 
privileges because some program you installed asked it to? I would 
change operating systems immediately - Viruses are a big enough problem 
as it is - imagine being unable to get rid of them...

Good luck!
Ryan


>
> This all works fine, however a user can still open Windows Explorer and copy 
> paste a file with the same name but different content (e.g. an empty file) 
> over an existing, exclusively locked database. From what I found out with the 
> OpenedFilesView tool, SQLite seems to open the file with SHARED_WRITE, which 
> explains why *any* process can overwrite the contents.
>
> Is there an easy way of configuring / changing this so that SHARED_WRITE is 
> not acquired? Will SQLite even function? Is it just easier to create a hidden 
> copy and work on that?
>
> Thanks for the advice
> Fabian



[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Simon Slavin

On 8 Apr 2015, at 3:51pm, R.Smith  wrote:

> BEGIN EXCLUSIVE - Locks the database from other SQLite3 database connections 
> for the time being.
> COMMIT; - Unlocks it again - so calling all this in one go is pointless.
> 
> That said, database locking serves only to protect from other database 
> changes... There is no way to prevent a user from intentionally messing with 
> any file if they have the privileges to do so.

Correct.  Part of my testing for setup security is as follows:

Open a sqlite database file in a text editor (NOTEPAD.EXE, TextEdit, whatever). 
 Type some gibberish characters at a few places in the file.  Wait until an app 
is using it via SQLite calls.  Hit 'Save'.

You can't do anything to prevent it.  But your business procedure (whatever 
software you run, whatever corruption testing you do, whatever corruption 
testing your software does) needs to be able to detect the problem and raise an 
alarm.
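
One hedged way to do that detection with the C API (a sketch; the function 
name is made up and the connection is assumed already open). Note this catches 
structural damage; content corruption that leaves the file structurally valid 
would need application-level checksums:

  #include <string.h>
  #include <sqlite3.h>

  /* Return 1 if PRAGMA quick_check reports "ok", 0 otherwise.
  ** (integrity_check is the slower, more thorough variant.) */
  static int db_looks_healthy(sqlite3 *db){
    sqlite3_stmt *stmt;
    int ok = 0;
    if( sqlite3_prepare_v2(db, "PRAGMA quick_check;", -1, &stmt, 0)!=SQLITE_OK ){
      return 0;
    }
    if( sqlite3_step(stmt)==SQLITE_ROW ){
      ok = strcmp((const char*)sqlite3_column_text(stmt, 0), "ok")==0;
    }
    sqlite3_finalize(stmt);
    return ok;
  }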

Simon.


[sqlite] Performance issue

2015-04-08 Thread R.Smith


On 2015-04-08 01:57 PM, Dominique Devienne wrote:
>> No Dominique, it's not that -
>>
> Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
> such a GROUP BY ORDER BY returning a large result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only, as in
> http://www.cplusplus.com/reference/algorithm/partial_sort/?
>
> I'm interested in finding out for sure. Perhaps that's hijacking this
> thread a bit, but in case of SQLite doing #1 and MySQL doing #2, it could
> explain some of the difference (although sorting a 1M-row array is so fast
> nowadays that I doubt it).

I think the partial sort algorithm only finds the first N items and then 
stops sorting, but for that to be possible the result set must be 
present in full and finished calculating in full already. The partial 
sort itself might save a millisecond or two from a complete sort in 
large lists. Either way, SQLite is more clever than that as Richard 
pointed out.


> His rows are "fatter", since he mentioned 41 columns. Which might make it
> go over some threshold(s) (page cache?) slowing things down once past it.
>
> But indeed, sharing the DB (if not sensitive data) would be the way to go.
No no, we know his rows' fatness exactly - he did send the schema. They 
are 41 integer values, i.e. it doesn't matter.

So yeah, there must be some trivial thing which the OP (and I) are 
missing. Even a DB with obscured values that still exhibits the slow 
query would work...




[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Hick Gunter
Even if there were a simple way to protect an SQLite db file from being 
casually (or even maliciously) overwritten by a user (which there isn't), it is 
quite impossible to prevent a user with "root privileges" from 
accessing/altering/deleting/moving/renaming ANY file on any operating system 
(that is, after all, what root privileges are designed to do...).

SQLite is designed to work with multiple processes/threads on the same system 
accessing the same file on local storage. If you need to have a special version 
that only allows one thread of one process to open the file you can change the 
appropriate lines in the source code and build your own image. I would strongly 
recommend statically linking your "special" copy of SQLite with your 
application.

-----Original Message-----
From: Fabian Pröbstl [mailto:Fabian.Proebstl at nanotemper.de]
Sent: Wednesday, 08 April 2015 16:18
To: sqlite-users at mailinglists.sqlite.org
Subject: [sqlite] Prevent database file from being overwritten by other 
processes

Hi there!

Currently, we are using SQLite as our application file format for a Windows 
7/C#/System.Data.SQLite based desktop application. We only allow one instance 
to open the file by running "set locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" 
when connecting to the database.

This all works fine, however a user can still open Windows Explorer and copy 
paste a file with the same name but different content (e.g. an empty file) over 
an existing, exclusively locked database. From what I found out with the 
OpenedFilesView tool, SQLite seems to open the file with SHARED_WRITE, which 
explains why *any* process can overwrite the contents.

Is there an easy way of configuring / changing this so that SHARED_WRITE is not 
acquired? Will SQLite even function? Is it just easier to create a hidden copy 
and work on that?

Thanks for the advice
Fabian


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at





[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 2:09 PM, Richard Hipp  wrote:

> On 4/8/15, Dominique Devienne  wrote:
> >  With a LIMIT clause, in
> > such a GROUP BY ORDER BY returning a large result set, would SQLite:
> > 1) sort the whole result-set and then keep only the first top-N rows?
> > 2) or instead do a partial-sort of the first top-N rows only,
>
> SQLite must examine all rows of output, obviously.  But it only keeps
> the top-N in memory and only sorts the top-N.  If there are a total of
> M candidate rows and only the top-N are to be displayed, then the
> algorithm is O(M*logN) in time and O(N) in space.
>

Thank you Richard. --DD


[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Fabian Pröbstl
Hi there!

Currently, we are using SQLite as our application file format for a Windows 
7/C#/System.Data.SQLite based desktop application. We only allow one instance 
to open the file by running "set locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" 
when connecting to the database.

This all works fine, however a user can still open Windows Explorer and copy 
paste a file with the same name but different content (e.g. an empty file) over 
an existing, exclusively locked database. From what I found out with the 
OpenedFilesView tool, SQLite seems to open the file with SHARED_WRITE, which 
explains why *any* process can overwrite the contents.

Is there an easy way of configuring / changing this so that SHARED_WRITE is not 
acquired? Will SQLite even function? Is it just easier to create a hidden copy 
and work on that?

Thanks for the advice
Fabian


[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 1:24 PM, R.Smith  wrote:

> On 2015-04-08 11:35 AM, Dominique Devienne wrote:
>
>> On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:
>>
>>> time echo 'SELECT ... FROM flows WHERE timestamp>=1383770600 AND \
>>>   timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>>>   ORDER BY vol DESC LIMIT 6;' |
>>>   mysql testperf
>>>
>>

>  If you dropped the LIMIT 6 from both, are the results
>> significantly different?
>>
>

> No Dominique, it's not that -
>

Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
such a GROUP BY ORDER BY returning a large result set, would SQLite:
1) sort the whole result-set and then keep only the first top-N rows?
2) or instead do a partial-sort of the first top-N rows only, as in
http://www.cplusplus.com/reference/algorithm/partial_sort/?

I'm interested in finding out for sure. Perhaps that's hijacking this
thread a bit, but in case of SQLite doing #1 and MySQL doing #2, it could
explain some of the difference (although sorting a 1M-row array is so fast
nowadays that I doubt it).


there must be another thing wrong with his setup. (He might not see my
> replies because he uses gmail).
>

Sure. I don't dispute that.


> In fact, the entire script, making the table, adding the index, populating
> it with a million rows (with computed values no less) and then doing the
> query plus posting the output - ALL of it takes less than 4 seconds
> together: (Try it yourself)
>

His rows are "fatter", since he mentioned 41 columns. Which might make it
go over some threshold(s) (page cache?) slowing things down once past it.

But indeed, sharing the DB (if not sensitive data) would be the way to go.
--DD


[sqlite] Performance issue

2015-04-08 Thread R.Smith


On 2015-04-08 11:35 AM, Dominique Devienne wrote:
> On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:
>
>> # For mysql, I use:
>> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
>> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
>>   as item FROM flows WHERE timestamp>=1383770600 AND \
>>   timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>>   ORDER BY vol DESC LIMIT 6;' |
>>   mysql testperf
>>
> If you dropped the LIMIT 6 from both, are the results significantly
> different?
>
> I.e. does actually getting the full result set result in closer numbers
> between SQLite and MySQL?
>
> I'm not sure SQLite's implementation of LIMIT prevents the whole result-set
> from being sorted before being truncated to 6 rows. Avoiding LIMIT would be
> one round-about way to find out. --DD

No Dominique, it's not that - there must be another thing wrong with his 
setup. (He might not see my replies because he uses gmail).

We will need an actual DB file from the OP with the 1 million records to 
test and understand the problem, because there is something that isn't 
clear or not being said (probably because it was deemed unimportant).

Again, here is a script that makes a similar table, populates it with 1 
million rows, adds the correct index and then does the aggregate query on 
all of them. It takes less than 500 milliseconds - faster than MySQL 
sans caching. (Obviously my pragmas/compile options may make a 
difference.) Even if there is much more data per row in the table, it 
can't cause an order-of-magnitude increase in time - and that doesn't 
seem to be the case from the schema the OP posted anyway.

In fact, the entire script, making the table, adding the index, 
populating it with a million rows (with computed values no less) and 
then doing the query plus posting the output - ALL of it takes less than 
4 seconds together:
(Try it yourself)

Note that I adapted the query to actually include the entire dataset 
(all 1 million rows), so the index is of little value here. It only 
matters where the WHERE clause selects fewer rows - in which case the 
time decreases linearly.

   -- 


DROP TABLE IF EXISTS flows;
CREATE TABLE flows(
  id INTEGER PRIMARY KEY,
  ipLan TEXT,
  ipWan TEXT,
  portLan INT,
  portWan INT,
  protocol INT,
  nbBytesDecompOut INT,
  nbBytesCompIn INT,
  tos INT,
  timestamp INT
);

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
    SELECT 0,0,0,0,0,100
  UNION ALL
    SELECT x+1, (x%8), CAST(round(x/8,0) AS INT),
           (x%1024), CAST(round(x/1024,0) AS INT),
           CAST(abs(round(random()/10240,0)) AS INT)
      FROM acc
     WHERE (x<1000000)  -- Testing 1 million rows
)
INSERT INTO flows SELECT
  x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
  1000+mx8, 5000+mx8, (x%18),                    -- portLan, portWan, protocol
  64+(rK%1024000),                               -- nbBytesDecompOut
  1024+(rK%1024000),                             -- nbBytesCompIn
  (dx8%3), (1380000000+x)                        -- tos, timestamp
  FROM acc;

CREATE INDEX idxFlowTimeProt ON flows (timestamp, protocol);

SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item,
       count(*) AS 'No.'
  FROM flows
 WHERE timestamp>=1373770600 AND timestamp<=1484770600 AND protocol IN (17, 6)
 GROUP BY portLan ORDER BY vol DESC
 LIMIT 6;

   -- vol          item
   -- -----------  ----
   -- 14020551134  1001
   -- 13964537508  1000
   -- 13940128330  1003
   -- 13938134232  1002
   -- 13930861704  1005
   -- 13913803752  1006

   -- Item Stats:      Item No: 5              Query Size (Chars): 232
   --                  Result Columns: 3       Result Rows: 6
   --                  VM Work Steps: 8833543  Rows Modified: 0
   --                  Sort Operations: 2      Table-Scan Steps: 0
   -- Prepare Time:    --
   -- Query Run:       0d 00h 00m and 00.408s
   -- Full Query Time: 0d 00h 00m and 00.408s
   -- Query Result:    Success.

   -- Script Stats:    Total Script Execution Time: 0d 00h 00m and 03.742s
   --                  Total Script Query Time:     0d 00h 00m and 03.620s
   --                  Total Database Rows Changed: 1000001
   --                  Total Virtual-Machine Steps: 76833701
   --                  Last executed Item Index:    5
   --                  Last Script Error:








[sqlite] fts5

2015-04-08 Thread Scott Hess
On Wed, Apr 8, 2015 at 12:32 PM, Dan Kennedy  wrote:
> On 04/08/2015 04:49 AM, Scott Hess wrote:
>> Something that bugged me a lot was that I had used deletion markers to
>> cancel out hits, but did not provide a way for deletion markers to
>> cancel out.  The main problem with this was that a large delete would
>> stay in the system until it reached the final segment, even if it had
>> already overtaken all of the original inserts.  I wished that I had
>> either maintained a separate structure tracking _document_ deletion
>> (which would make merges somewhat more complicated because they
>> wouldn't be term-centric), or code updates as "delete+insert".  In the
>> latter case deletes could drop out at the point where they reached the
>> original insert.
>
> Thanks for this. The "delete+insert" idea sounds like quite an interesting
> one.
>
> So instead of just "delete" and "insert" keys, the merge tree now also
> contains "delete+insert" keys (call them "update" keys). Then maintain the
> tree so that
>
>   (a) for each "insert", the next youngest duplicate key must either not
> exist or be a "delete",
>   (b) for each "update", the next youngest duplicate key must exist and must
> be an "insert" or "update", and
>   (c) for each "delete", the next youngest duplicate key must exist and must
> be an "insert" or "update".
>
> And as a result, when a "delete" catches up with an "insert" while merging
> they can both be discarded. Instead of the current situation, where we
> retain the "delete" unless the output segment is the oldest in the database.
> Cool.

Yes, I think that's it.  I have notes somewhere (no chance I'll find
them in this decade) noodling on how to handle it :-).  I think you
can model it as a stream of alternating (oldest to newest) INSERT,
DELETE, INSERT, DELETE, ...  When a DELETE catches an INSERT they
cancel, but when an INSERT catches a DELETE it "pushes" the DELETE
forward until it finds the original INSERT and cancels, leaving the
new INSERT alone.  Serializing the update as an actual DELETE/INSERT
pair made merges work naturally (the DELETE lines up with the earlier
INSERT, both are skipped, and the new INSERT is in the right place to
carry forward).  The space cost was two bytes per term updated, one
for the delete's POS_END varint(0), one for the insert's docid delta
varint(0).  That could be dropped to one byte by adding a POS_UPDATE
value in the encoding, with a bit more merge complexity.
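
A hedged sketch of that cancellation rule, with illustrative names (this is 
not fts5's actual code; "UPDATE" here stands for the serialized delete+insert 
pair):

  typedef enum { ENTRY_INSERT, ENTRY_DELETE, ENTRY_UPDATE } EntryType;

  /* Given the newer entry for a key and the next-older duplicate of the
  ** same key, decide what survives a merge.  Returns how many surviving
  ** entries were written to out[] (0, 1 or 2). */
  static int merge_pair(EntryType newer, EntryType older, EntryType *out){
    if( newer==ENTRY_DELETE && older==ENTRY_INSERT ){
      return 0;                 /* delete meets original insert: both go */
    }
    if( newer==ENTRY_DELETE && older==ENTRY_UPDATE ){
      out[0] = ENTRY_DELETE;    /* old update's insert is cancelled; its
                                ** delete half lives on, pushed older */
      return 1;
    }
    if( newer==ENTRY_UPDATE && older==ENTRY_INSERT ){
      out[0] = ENTRY_INSERT;    /* update absorbs the old insert, leaving
                                ** a plain insert behind */
      return 1;
    }
    if( newer==ENTRY_UPDATE && older==ENTRY_UPDATE ){
      out[0] = ENTRY_UPDATE;    /* still owes a delete to an older insert */
      return 1;
    }
    out[0] = newer; out[1] = older;  /* nothing cancels; keep both */
    return 2;
  }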

The main alternative I could see would be to add additional docid
arrays per segment to track documents present and documents deleted,
which you could intersect to create an additional filter to apply
during segment merge.

-scott


[sqlite] Performance issue

2015-04-08 Thread Simon Slavin

On 8 Apr 2015, at 10:16am, Jeff Roux  wrote:

> SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
> as item FROM flows WHERE timestamp>=1383770600 AND \
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
> ORDER BY vol DESC LIMIT 6

If you want us to understand what's happening, please run ANALYZE, then give us 
the output of

EXPLAIN QUERY PLAN <your query>
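
(For reference, one way to capture that output from C -- a sketch that assumes 
the classic four-column EXPLAIN QUERY PLAN result shape, with the detail text 
in column 3:)

  #include <stdio.h>
  #include <sqlite3.h>

  static void print_query_plan(sqlite3 *db, const char *zSql){
    sqlite3_stmt *stmt;
    char *zEqp = sqlite3_mprintf("EXPLAIN QUERY PLAN %s", zSql);
    if( sqlite3_prepare_v2(db, zEqp, -1, &stmt, 0)==SQLITE_OK ){
      while( sqlite3_step(stmt)==SQLITE_ROW ){
        printf("%s\n", (const char*)sqlite3_column_text(stmt, 3));
      }
      sqlite3_finalize(stmt);
    }
    sqlite3_free(zEqp);
  }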

Simon.


[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:

> # For mysql, I use:
> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
>  as item FROM flows WHERE timestamp>=1383770600 AND \
>  timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>  ORDER BY vol DESC LIMIT 6;' |
>  mysql testperf
>

If you dropped the LIMIT 6 from both, are the results significantly
different?

I.e. does actually getting the full result set result in closer numbers
between SQLite and MySQL?

I'm not sure SQLite's implementation of LIMIT prevents the whole result-set
from being sorted before being truncated to 6 rows. Avoiding LIMIT would be
one round-about way to find out. --DD


[sqlite] Performance issue

2015-04-08 Thread Jeff Roux
Thanks everyone for your answers,

I made some changes to the database according to the information you
gave me. It improved the performance of the query by about 20% (the
request now takes 4 seconds instead of 5).

Here is some more information regarding the suggestions I 
received:

  - The version of SQLite I used is the one provided by Debian (current
stable: wheezy).

  - I need the database to be opened in Read Only mode, so I did not
use the WAL mode.

  - All the existing indexes cannot be removed because they are used
by other queries.

  - I however removed unused indexes for the following tests results

Note that my benchmarks are run in batch mode, with sqlite3 as with 
mysql. I stop and restart the mysql daemon to avoid most caching (I 
hope).

# For mysql, I use:
/etc/init.d/mysql stop; /etc/init.d/mysql start; \
time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
 as item FROM flows WHERE timestamp>=1383770600 AND \
 timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
 ORDER BY vol DESC LIMIT 6;' |
 mysql testperf

2783898050  33722
1374153827  33736
886842830   39155
655809252   51800
363040479   53153
358988337   59757

real0m1.067s
user0m0.000s
sys 0m0.000s


# For sqlite, I use:
time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan
 as item FROM flows WHERE timestamp>=1383770600 AND
 timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan
 ORDER BY vol DESC LIMIT 6;' |
 sqlite3 /var/db/udcast/flow_stats_OA_1M.db

2783898050|33722
1374153827|33736
886842830|39155
655809252|51800
363040479|53153
358988337|59757

real0m4.405s
user0m1.812s
sys 0m2.580s

Here is the time spent in the query according to the number of lines
matching the where clause (ANALYZE has been run before):

PERIOD (s)  MIN TS       MAX TS       LINES     TIME
-----------------------------------------------------
3600        1384767000   1384770600   35113     0:00.06
7200        1384763400   1384770600   67611     0:00.11
21600       1384749000   1384770600   154592    0:00.69
43200       1384727400   1384770600   270728    0:01.18
86400       1384684200   1384770600   501871    0:02.20
all         1383770600   1384770600   1000000   0:04.44


The 20% improvement is nice, but mysql (even without caching) is still
far ahead for the moment (4 times faster). Other ideas are welcome.

Thanks again!


2015-04-01 12:52 GMT+02:00 GB :

> In case of SELECTing "all available" I recommend invoking a different
> statement without the timestamp-part instead of providing some min and max
> values for timestamp. This avoids tricking the query planner into some
> wrong decisions (if you have an index with protocol as the first column).
>
> And how about WAL mode? If concurrency is of any concern for you, this
> definitely is something worth a try.
>
> -- GB
>
>
> Jeff Roux schrieb am 31.03.2015 um 12:48:
>
>> Thanks everyone for the answers.
>>
>> I won't be able to make some tests today, I will come back to you soon
>> with
>> additional information. Just to say that, in the worst case, the WHERE
>> clause selects the entire data, i.e. 1000000 entries. The user can select a
>> time range of 1 hour, 3 hours, 1 day, and "all available". Note: before
>> being added to the database, the time stamps are aggregated on a 180-second
>> period and a lot of rows have the same time stamp (~ one row per TCP
>> session).
>>
>> All the columns are defined as INTEGER. There are 41 columns in total in
>> the flow table. If I remember well, there is no primary key defined for
>> this table.
>>
>> 2015-03-31 8:32 GMT+02:00 GB :
>>
>>   From what I see, I assume that timestamp gives the highest selectivity.
>>> Taking into account that protocol is SELECTed for and portLan is GROUPed
>>> BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan
>>> helps here, but it's worth a try, I think). Don't forget to ANALYZE, of
>>> course. Are your columns of INTEGER affinity? If they are of TEXT, they will
>>> store anything as TEXT. That may make a difference in both space consumption
>>> and speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4 enabled? If not,
>>> give it a try. It sometimes makes a big difference.
>>>
>>> Is it possible that data collection and retrieval happen at the same
>>> time?
>>> If so, try running the database in WAL mode, it should help with
>>> concurrency issues.
>>>
>>> -- GB
>>>
>>>
>>> Jeff Roux schrieb am 30.03.2015 um 11:46:
>>>
>>>  Hi everyone,

 I have a daemon that collects information and stores it in a SQLite
 database. The table has 1 million rows.

 This daemon is 

[sqlite] Transpose selected rows into columns

2015-04-08 Thread Jim Callahan
You are welcome.

The crosstab followed by a calculation is a common pattern in statistics,
political science and accounting (if you added a percent change column
after the diff you would have the classic "accounting variance" report).

Using an intermediate TABLE or VIEW is an "information hiding" tactic
analogous to putting a complex calculation in a subroutine.

Jim Callahan
Orlando, FL



On Wed, Apr 8, 2015 at 1:36 AM, Drago, William @ CSG - NARDA-MITEQ <
William.Drago at l-3com.com> wrote:

> Jim,
>
> This works quite well. Thank you.
>
> And thanks to all others who replied.
>
> --
> Bill Drago
> Senior Engineer
> L3 Narda-MITEQ
> 435 Moreland Road
> Hauppauge, NY 11788
> 631-272-5947 / William.Drago at L-3COM.com
>
>
> > -Original Message-
> > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> > users-bounces at mailinglists.sqlite.org] On Behalf Of Jim Morris
> > Sent: Tuesday, April 07, 2015 10:07 AM
> > To: sqlite-users at mailinglists.sqlite.org
> > Subject: Re: [sqlite] Transpose selected rows into columns
> >
> > You might try
> >
> > select SerialNumber, V0, V5, V5-V0
> > from
> > (select SerialNumber,
> > max(case Stim when 'V0' then Resp else null end) V0,
> > max(case Stim when 'V5' then Resp else null end) V5 from MyTable
> > group by SerialNumber)
> >
> >
> >
> > On 4/7/2015 6:58 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > > Igor,
> > >
> > > Your solution works well. What I can't figure out is how to
> > efficiently create a column representing V5-V0.
> > >
> > > SerialNumber |  V0   |  V5  | Vdiff
> > > -|---|--|---
> > > 123  |  0.2  |  0.6 |  0.4
> > >
> > >
> > > This is what I'm using, but it takes twice as long:
> > >
> > > select SerialNumber,
> > >
> > > max(case Stim when 'V0' then Resp else null end) V0,
> > > max(case Stim when 'V5' then Resp else null end) V5,
> > >
> > > (max(case Stim when 'V0' then Resp else null end) -
> > > max(case Stim when 'V5' then Resp else null end)) Vdiff
> > >
> > > from MyTable group by SerialNumber;
> > >
> > >
> > > There must be a more efficient way. (I tried V5-V0 and assigning
> > > intermediate values to variables but got nothing but errors.)
> > >
> > > Thanks,
> > > --
> > > Bill Drago
> > > Senior Engineer
> > > L3 Narda-MITEQ
> > > 435 Moreland Road
> > > Hauppauge, NY 11788
> > > 631-272-5947 / William.Drago at L-3COM.com
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> > >> users-bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> > >> Sent: Friday, March 27, 2015 3:20 PM
> > >> To: sqlite-users at mailinglists.sqlite.org
> > >> Subject: Re: [sqlite] Transpose selected rows into columns
> > >>
> > >> On 3/27/2015 11:48 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > >>> I want the rows containing V0 and V5 to become columns like this:
> > >>>
> > >>> SerialNumber |  V0   |  V5
> > >>> -|---|---
> > >>> 123  | 0.136 | 0.599
> > >>> 124  | 0.126 | 0.587
> > >>> 125  | 0.119 | 0.602
> > >> select SerialNumber,
> > >> max(case Stim when 'V0' then Resp else null end) V0,
> > >> max(case Stim when 'V5' then Resp else null end) V5 from MyTable
> > >> group by SerialNumber;
> > >>
> > >> --
> > >> Igor Tandetnik
> > >>
> >
>

[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Roger Binns

On 04/08/2015 07:18 AM, Fabian Pröbstl wrote:
> Is there an easy way of configuring / changing this so that
> SHARED_WRITE is not acquired? Will SQLite even function? Is it just
> easier to create a hidden copy and work on that?

For something this critical, an alternate approach is to restructure
it so you have a service which does the database and similar work, and
a separate gui that talks to the service.  (Note I mean service in the
Windows sense, much like a daemon on Unix.)

You can run the service as a different user, which means the gui
application user can't even access the file directly, nor cause mayhem.

That separate structuring also has other benefits such as allowing for
a separate command line client, makes testing easier, and you can put
the service and gui on different machines.  Multiple instances of the
gui can run too which may be very useful.

Roger



[sqlite] Performance issue

2015-04-08 Thread Richard Hipp
On 4/8/15, Dominique Devienne  wrote:
>  With a LIMIT clause, in
> such a GROUP BY ORDER BY returning a large result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only,

SQLite must examine all rows of output, obviously.  But it only keeps
the top-N in memory and only sorts the top-N.  If there are a total of
M candidate rows and only the top-N are to be displayed, then the
algorithm is O(M*logN) in time and O(N) in space.
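
For illustration, a minimal C sketch of that bounded top-N idea (not SQLite's 
actual sorter code; rand() just supplies M candidate values):

  #include <stdio.h>
  #include <stdlib.h>

  #define N 6

  static long heap[N];   /* min-heap: heap[0] is the smallest survivor */
  static int nHeap = 0;

  static void sift_down(int i){
    for(;;){
      int l = 2*i+1, r = l+1, m = i;
      if( l<nHeap && heap[l]<heap[m] ) m = l;
      if( r<nHeap && heap[r]<heap[m] ) m = r;
      if( m==i ) break;
      long t = heap[i]; heap[i] = heap[m]; heap[m] = t;
      i = m;
    }
  }

  /* Offer one candidate; O(log N) worst case, O(N) total space. */
  static void offer(long v){
    if( nHeap<N ){              /* heap not yet full: append and fix up */
      int i = nHeap++;
      heap[i] = v;
      while( i>0 && heap[(i-1)/2]>heap[i] ){
        long t = heap[i]; heap[i] = heap[(i-1)/2]; heap[(i-1)/2] = t;
        i = (i-1)/2;
      }
    }else if( v>heap[0] ){      /* beats current minimum: replace root */
      heap[0] = v;
      sift_down(0);
    }                           /* else: cannot be in the top N; drop it */
  }

  int main(void){
    for(int i=0; i<1000000; i++) offer(rand());          /* M = 1,000,000 */
    for(int i=0; i<nHeap; i++) printf("%ld\n", heap[i]); /* top 6, unsorted */
    return 0;
  }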
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Transpose selected rows into columns

2015-04-08 Thread Drago, William @ CSG - NARDA-MITEQ
Jim,

This works quite well. Thank you.

And thanks to all others who replied.

--
Bill Drago
Senior Engineer
L3 Narda-MITEQ
435 Moreland Road
Hauppauge, NY 11788
631-272-5947 / William.Drago at L-3COM.com


> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> users-bounces at mailinglists.sqlite.org] On Behalf Of Jim Morris
> Sent: Tuesday, April 07, 2015 10:07 AM
> To: sqlite-users at mailinglists.sqlite.org
> Subject: Re: [sqlite] Transpose selected rows into columns
>
> You might try
>
> select SerialNumber, V0, V5, V5-V0
> from
> (select SerialNumber,
> max(case Stim when 'V0' then Resp else null end) V0,
> max(case Stim when 'V5' then Resp else null end) V5 from MyTable
> group by SerialNumber)
>
>
>
> On 4/7/2015 6:58 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> > Igor,
> >
> > Your solution works well. What I can't figure out is how to
> efficiently create a column representing V5-V0.
> >
> > SerialNumber |  V0   |  V5  | Vdiff
> > -|---|--|---
> > 123  |  0.2  |  0.6 |  0.4
> >
> >
> > This is what I'm using, but it takes twice as long:
> >
> > select SerialNumber,
> >
> > max(case Stim when 'V0' then Resp else null end) V0,
> > max(case Stim when 'V5' then Resp else null end) V5,
> >
> > (max(case Stim when 'V0' then Resp else null end) -
> > max(case Stim when 'V5' then Resp else null end)) Vdiff
> >
> > from MyTable group by SerialNumber;
> >
> >
> > There must be a more efficient way. (I tried V5-V0 and assigning
> > intermediate values to variables but got nothing but errors.)
> >
> > Thanks,
> > --
> > Bill Drago
> > Senior Engineer
> > L3 Narda-MITEQ
> > 435 Moreland Road
> > Hauppauge, NY 11788
> > 631-272-5947 / William.Drago at L-3COM.com
> >
> >
> >
> >> -Original Message-
> >> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-
> >> users-bounces at mailinglists.sqlite.org] On Behalf Of Igor Tandetnik
> >> Sent: Friday, March 27, 2015 3:20 PM
> >> To: sqlite-users at mailinglists.sqlite.org
> >> Subject: Re: [sqlite] Transpose selected rows into columns
> >>
> >> On 3/27/2015 11:48 AM, Drago, William @ CSG - NARDA-MITEQ wrote:
> >>> I want the rows containing V0 and V5 to become columns like this:
> >>>
> >>> SerialNumber |  V0   |  V5
> >>> -|---|---
> >>> 123  | 0.136 | 0.599
> >>> 124  | 0.126 | 0.587
> >>> 125  | 0.119 | 0.602
> >> select SerialNumber,
> >> max(case Stim when 'V0' then Resp else null end) V0,
> >> max(case Stim when 'V5' then Resp else null end) V5 from MyTable
> >> group by SerialNumber;
> >>
> >> --
> >> Igor Tandetnik
> >>
>
