[sqlite] Stuck and drowning trying to work out some SQL.

2014-11-01 Thread Rob Willett
I have a SQL problem I’m trying to solve and realise it’s now gone waaay
beyond my meagre SQL talents.

A bit of context

The dataset I’m looking at covers traffic problems. Every five mins I get a
new dataset of traffic updates. Some of the new dataset is a continuation of
the old data, e.g. there may be an ongoing collision; some of the data is new,
e.g. a new collision has just happened; and some of the data is saying the
collision is now over. This can be likened to a simple state table. There is no
state of New, just the fact that something is now active and has appeared.

The simplified data table (Table_1) where this is recorded looks like this. 
I’ve split the Time Events to highlight each file import.

Disruption_id | Time_Event | State

1 | 1  | Active
2 | 1  | Active
3 | 1  | Active

1 | 2  | Active
3 | 2  | Closed
4 | 2  | Active

1 | 3  | Active
2 | 3  | Active
3 | 3  | Closed
4 | 3  | Active

1 | 4  | Closed
2 | 4  | Active
3 | 4  | Closed
4 | 4  | Closed
5 | 4  | Active

1 | 5  | Closed
2 | 5  | Closed
3 | 5  | Closed
4 | 5  | Closed
5 | 5  | Active

1. Disruption_Id is always correct; a Disruption_Id only appears once in any
file import and is never reused.

2. The Time_Event identifies the file in which the data comes in. In the above
example there are five input files spread across 25 mins; I have simplified the
Time_Event to be a sequential set of numbers.

3. The omission of the Disruption_id=2 in Time_Event=2 is deliberate as the 
data is not always complete.

4. A disruption_id may continue to appear as Closed in subsequent file imports, 
e.g. DisruptionId=3. I am only really interested in the first Close.

What I’m trying to do is to find out how long each disruption occurred for.

I’d like to produce an output table based on the above data set that looks a 
bit like this.

Disruption_id | Start Time | End Time
1 |  1 |   4
2 |  1 |   5
3 |  1 |   2
4 |  2 |   4

1. Disruption_id would only appear in one row.

2. Any events that are still Active, e.g. Disruption_Id=5 in the above example 
would not be shown.

3. The first Closed event is the one that is chosen as the end_time.

This would tell me that Disruption_id = 1 ran between Time_Event 1 and 4, 
Disruption_id = 3 ran between Time_Event 1 and 2.

Even I can easily create a simple piece of SQL to find the Max or Min
Time_Event for a specific Disruption_id, e.g. select max(Time_Event) from
Table_1 where Disruption_id = 1.

I struggle when I try to pull it all together so that I have one piece of SQL
that does all the work. I’ve tried searching for this on Google and in this
mail group, but I can’t even describe what I want to do without giving an
example, which rather limits my searches. I’ve spent a day looking and writing
junk SQL code and have come to the conclusion I haven’t a clue.

Any suggestions gratefully welcomed along with pointers to where to look. Is 
this even possible? 

I’m happy to sit down and work through some examples but I can’t even find them 
to learn from.

Thanks for taking the time to read this far.

Rob.


Re: [sqlite] Stuck and drowning trying to work out some SQL. - Thanks

2014-11-01 Thread Rob Willett
Simon, Ivar, Clemens, Aleksey,

Thank you for taking the time to reply to a drowning man. I have been offered 
four lifebelts. I've tested three of them out and they all work slightly 
differently but I'm very happy to work through them and to see where the 
advantages and disadvantages of each of them are. 

As an aside, the table it works on currently has 2,090,518 records in it, with
1,667 unique Disruption_ids. The times to execute the three different proof of
concept queries were each approximately 0.7 of an order of magnitude apart:
one took 700 secs, one took around 93 seconds and one took around 3 secs. I
cannot say that the fastest is the best answer or the slowest is the worst, as
further work is needed to refine the query, and it would be churlish, unfair
and inappropriate to say which was whose as people would jump to the
wrong conclusion.
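
For anyone who finds this thread later, the working queries all boiled down to
the same general shape; a minimal sketch (not any one person's exact answer,
and untested against the real data) is:

select Disruption_id,
       min(Time_Event) as Start_Time,
       min(case when State = 'Closed' then Time_Event end) as End_Time
  from Table_1
 group by Disruption_id
having min(case when State = 'Closed' then Time_Event end) is not null;

The CASE expression picks out the first Closed Time_Event per disruption (it is
NULL for rows that are still Active), and the HAVING clause drops disruptions
that have never closed.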

I really appreciate the excellent and helpful answers provided. I have improved 
my SQL and learnt new stuff today which is always nice.

Best wishes, 

Rob


Re: [sqlite] Stuck and drowning trying to work out some SQL.

2014-11-01 Thread Rob Willett
OK, thanks. A little more optimisation :)
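
To make sure I understand the point, these two should behave identically (a
quick sketch using the table from earlier in the thread):

select min(case when State = 'Closed' then Time_Event else null end) from Table_1;
select min(case when State = 'Closed' then Time_Event end) from Table_1;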

> On 1 Nov 2014, at 17:39, Igor Tandetnik  wrote:
> 
> On 11/1/2014 11:52 AM, Luuk wrote:
>> Is the 'else null' part needed, or can it be deleted?
> 
> Yes, it can be removed. CASE expression returns null when no case matches.
> -- 
> Igor Tandetnik
> 


Re: [sqlite] Messages posted on Nabble not getting to list

2016-05-27 Thread Rob Willett

I agree with Tim.

I filter all my SQLite messages into their own folder and read as needed.

I prefer this method to a forum as I can then keep the messages with me.
I’m often off the internet and it’s good to have them for reading. Also
I can look back through them and get hints and tips about issues.


It’s not perfect but it works for me. I get around the distraction of
emails coming in through client-side filtering. I did start filtering at the
server so that only important stuff came through immediately but that
didn’t work well. I now use Mailmate and like the power and lack of
graphical eye candy it offers.


I’d actively vote against a SQLite forum replacing e-mail (if
anybody asked me, which I doubt…). SQLite is important to our work, so
having it around is fine by me.


YMMV

Rob


On 27 May 2016, at 10:34, Tim Streater wrote:

On 27 May 2016 at 08:56, Darren Duncan wrote:

On 2016-05-26 9:00 PM, Balaji Ramanathan wrote:
The main advantage of forums, and I follow a bunch of them, is that I choose
when I want to stop my regular day job and be distracted by them rather than
emails coming in and distracting me all the time.

That's not an argument for web forums, rather that's an argument for not using
your work email to subscribe to non-work discussion lists; use a non-work email
for the discussion lists instead.  You can also configure your email client to
only check email when you tell it to rather than constantly. -- Darren Duncan


Filter the sqlite mails into their own mailbox. They can then be read 
at a convenient moment.


I certainly don't want a whole lot of sub-forums and the like, each of 
which has to be checked in case there's something interesting there.


--
Cheers  --  Tim


[sqlite] Regarding testing

2015-04-27 Thread Rob Willett
You didn't read the link properly. I'll highlight the relevant sections:

The TCL Tests are the oldest set of tests for SQLite. They are contained in the
same source tree as the SQLite core and like the SQLite core are in the public
domain. The TCL tests are the primary tests used during development. The TCL
tests are written using the TCL scripting language. The TCL test harness itself
consists of 23.1 KSLOC of C code used to create the TCL interface. The test
scripts are contained in 825 files totaling 10.9MB in size. There are 33402
distinct test cases, but many of the test cases are parameterized and run
multiple times (with different parameters) so that on a full test run millions
of separate tests are performed.

The TH3 test harness is a set of proprietary tests, written in C, that provide
100% branch test coverage (and 100% MC/DC test coverage) to the core SQLite
library. The TH3 tests are designed to run on embedded and specialized
platforms that would not easily support TCL or other workstation services. TH3
tests use only the published SQLite interfaces. TH3 is free to SQLite
Consortium members and is available by license to others. TH3 consists of
about 53.1 MB or 718.2 KSLOC of C code implementing 36618 distinct test cases.
TH3 tests are heavily parameterized, though, so a full-coverage test runs
about 1.5 million different test instances. The cases that provide 100% branch
test coverage constitute a subset of the total TH3 test suite. A soak test
prior to release does hundreds of millions of tests. Additional information on
TH3 is available separately.


Does that help?

Rob


> On 27 Apr 2015, at 13:36, Sairam Gaddam  wrote:
> 
> Yeah I read that link previously but how do i get all those test cases?
> 
> On Mon, Apr 27, 2015 at 5:27 PM, Simon Slavin  wrote:
> 
>> 
>> On 27 Apr 2015, at 12:54pm, Sairam Gaddam  wrote:
>> 
>>> How SQLite is tested and can I get those test cases?
>> 
>> <http://lmgtfy.com/?q=How+SQLite+is+tested+and+can+I+get+those+test+cases%3F>
>> 
>> Simon.



[sqlite] Lawyers, encryption, and RC4

2015-08-11 Thread Rob Willett
Eric,

The humorous side of me would argue that RC4 isn't encryption any more than
ROT13 is these days.

The serious side of me says that exportation of encryption stuff has changed
significantly and is full of weirdness and vagueness. Since you don't state
your country of origin it's difficult to know what to suggest looking for; my
assumption is that you are US based, in which case you should probably look at
the US Export restrictions available from the Dept of Commerce (?). From
memory, encryption got controlled more when the bit length got longer and
longer. I recall that it used to be that 40 bit RC4 was OK and I *think* that
the bit length is now longer (128bit?) as it has been shown that 40 bit RC4 is
as much use as a chocolate fireguard.

It also depends on where you are exporting to. The US has a list of countries
that are forbidden to trade with (with various exceptions), e.g. North Korea
and Sudan can't even get ROT13 legally.

I know that RC4 used to have US and export versions which had different key 
lengths. I *think* that has now changed. 

This link talks about the relaxation of export regulations on encryption but as
a normal piece of govt literature it's rather opaque (to say the least). Give
it to the lawyers though to keep them busy.

http://www.gpo.gov/fdsys/pkg/FR-2010-06-25/pdf/2010-15072.pdf 


This is also a good site 

http://www.cryptolaw.org 

One paragraph sticks out

"On 7 January 2011 <>, a minor amendment was made to the EAR (Federal Register? 
Vol. 76, No. 5, 
p. 1059). Publicly available mass-market encryption object code software (with 
symmetric key length exceeding 64 bits), and publicly available encryption 
object code of which the corresponding source code falls under License 
Exception TSU (i.e., when the source code ies publicly available), are no 
longer subject to the EAR. The amendment includes some minor specific 
revisions.?

I read this as: if your use of RC4 is 64 bit (or less) and it's publicly
available then you may have a License Exception anyway.

My view is that you need to work with the lawyers on this and "educate" them as
to what encryption is and what generating random numbers is.

You can also demonstrate that most browsers have RC4 with what appears to be 
128 bit encryption by going to 

https://cc.dcsec.uni-hannover.de 

It tells you the encryption supported by your browser. Here's mine; I have
128 bit RC4 encryption (it's near the bottom).

Cipher Suites Supported by Your Browser (ordered by preference):

Spec     Cipher Suite Name              Key Size  Description
(00,ff)  EMPTY-RENEGOTIATION-INFO-SCSV  0 Bit     Used for secure renegotiation.
(c0,24)  ECDHE-ECDSA-AES256-SHA384      256 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA384.
(c0,23)  ECDHE-ECDSA-AES128-SHA256      128 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA256.
(c0,0a)  ECDHE-ECDSA-AES256-SHA         256 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA1.
(c0,09)  ECDHE-ECDSA-AES128-SHA         128 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA1.
(c0,08)  ECDHE-ECDSA-3DES-EDE-SHA       168 Bit   Key exchange: ECDH, encryption: 3DES, MAC: SHA1.
(c0,28)  ECDHE-RSA-AES256-SHA384        256 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA384.
(c0,27)  ECDHE-RSA-AES128-SHA256        128 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA256.
(c0,14)  ECDHE-RSA-AES256-SHA           256 Bit   Key exchange: ECDH, encryption: AES, MAC: SHA1.
(c0,13)  ECDHE-RSA-AES128-SHA           128 Bit   Key exchange: ECDH, encryption: AES

[sqlite] Database synchronisation

2015-08-13 Thread Rob Willett
Simon,

Your example is a very simple, yet elegant, illustration of why database
synchronisation is a "difficult" problem to solve.

As you say at the bottom of your e-mail, for the database to be correct, it has
to determine the intentions of two sets of users to work out the solution. Even
if somehow "the database" manages to know what the users intended, using your
example what is the correct answer: should town be "Cambridge" or should town
be "Camberley"? One user would say Cambridge and one would say Camberley. So
even knowing the users' intentions would not be enough to work out the correct
answer. A timing log simply shows who was first and second and has to ignore
intentions as the database has no other information to work with.

This problem has been going on for years and I am not aware of a solution to
the problem; however, I do not claim to be an expert (or even a knowledgeable
user). I suspect that the problem has been proven to be insolvable if we make
the assumption that the databases are equal in status. If we have a master
database and a slave database then the master always wins (kind of why they
are the master) and the problem is simple.

As a simple example of a large company that could not solve this, consider IBM
and Lotus Notes. Lotus Notes has been running for around 25 years; it has a
client database on the user's machine and a remote database on a server. IBM
could not guarantee that the client database and the remote server database 
were always synced correctly and would produce replication conflict errors 
(sync errors really) when your databases did not match. It presented the sync 
errors in time order (as Simon has suggested) and allowed the user to choose 
which was the correct version to keep and which to delete. IBM never resolved 
this problem after 25 years and I suspect nobody else has or will. 

I'd be interested if anybody has a counter example to this to show how syncing
can be made to work. As I have said, I do not consider myself an expert here,
but I have sold DB2 to customers and have more than once stood up in
presentations to talk about our DB2 technical solution. It's amazing how much
one can cram in the evening before the client briefing…. :)

Rob

> On 13 Aug 2015, at 17:20, Simon Slavin  wrote:
> 
> 
> On 13 Aug 2015, at 4:03pm, sqlite-mail  wrote:
> 
>> With that would be trivial to log the statements that change the database to
>> replicate elsewhere. 
> 
> As Jean-Christophe wrote, it's not that simple.  There are huge books written 
> on the problems involved in synchronising two copies of a database, and none 
> of them end with a chapter called "Solution".
> 
> Here's an example of one of many problems.
> 
> Two copies of a database are maintained.  In one copy, the following command 
> is issued:
> 
> UPDATE contacts SET town = "Cambridge" WHERE town = "Grantebrycge"
> 
> In the other copy the following command is issued:
> 
> UPDATE contacts SET town = "Camberley" WHERE town = "Cambridge"
> 
> So when the databases are synchronised, one copy has command 1 then command 
> 2, whereas the other copy has command 2 then command 1.  Here's the result 
> after the synchronisation:
> 
> Original      Copy A     Copy B
> ------------  ---------  ---------
> Grantebrycge  Camberley  Cambridge
> 
> As you can see, the two copies are not identical.  And the resolution of the 
> problem (deciding what /should/ happen) depends on the intentions of the 
> users, which is something the software can't know.
> 
> This is normally resolved by logging the time each command was issued and 
> keeping a central copy of the database which remains unchanged when changes 
> are made to the live copies.  Upon synchronisation the change logs for all 
> copies are merged in time order, and the resulting log is applied to the 
> central unchanged copy.  Then copies of this new version of the database are 
> copied to each client.
> 
> So now you need some sort of 'boss' node which does extra work.  And even 
> this still doesn't please users, some of whom will see their changes be 
> ignored or changed again by other people.
> 
> Simon.



[sqlite] SQLite database becomes corrupt on iOS

2015-08-14 Thread Rob Willett
Simon,

Thanks very much for this information.

Rob.

> On 14 Aug 2015, at 17:38, Simon Slavin  wrote:
> 
> 
> On 14 Aug 2015, at 4:17pm, skywind mailing lists  
> wrote:
> 
>> this is the problem. I could never reproduce it by myself and even my 
>> customers have normally no problems. But it happens once a year or so to one 
>> of my customers.
>> Still this is very annoying as it results in data loss.
> 
> I have seen a problem like this only in one context, and it had nothing to do 
> with SQLite.  When an iDevice's power starts running low, it sends out 
> notifications to all running Apps and expects their cooperation in shutting 
> down.  An App is meant to react to the notification and shut down in less 
> than (IIRC) 6 seconds, without depending on network connections and without 
> using unusual amounts of power.
> 
> If one of the running Apps does not do this properly then the operating 
> system is allowed to terminate it at (IIRC) 10 seconds, even if it is still 
> working.  But if the application is really annoying it can block other apps 
> -- like yours -- from shutting down before the power is lost, by hogging 
> resources like CPU or file storage.  So your app gets terminated at 10 
> seconds even though it wasn't given a chance to close its files.  Either by 
> the OS or because the device just ran out of power.
> 
> This could happen because of a bug.  But I saw it happen in a big corporate 
> application where the programmer decided that it just had to do a ton of 
> communication with the server and clean up files every time the app quit.  
> Reasonable on a desktop computer and a disaster on a phone.
> 
> Testing correct quit behaviour is part of Apple's approval process.  But it 
> can't test every state the app may be in when it receives a quit notification.
> 
> So the thing that might be triggering your corruption is that your customer 
> is running another application at the same time -- one which hogs resources 
> so much that your own application doesn't get a chance to shut down cleanly 
> before the power runs out.  You're never going to be able to figure out the 
> problem in your own app, because there isn't one.  You need to look at the 
> environment as a whole.
> 
> One possible way to proceed is to ask the customer what other apps they were 
> likely to have had running when the device ran out of power.  Then google 
> those apps and see if there is any suggestion that any of them might be badly 
> written or badly behaved on shutdown.
> 
> Simon.



[sqlite] Problem with accumulating decimal values

2015-12-11 Thread Rob Willett
I *think* this is due to you creating an integer when you first create
the entries.

Try changing from

INSERT INTO fmtemp VALUES (1, 0);

to

INSERT INTO fmtemp VALUES (1, 0.0);

Just did

macpro:js rwillett$ sqlite3
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE fmtemp (acno INT, balance DECIMAL);
sqlite>  INSERT INTO fmtemp VALUES (1, 0.0);
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> SELECT bal FROM fmtemp;
Error: no such column: bal
sqlite> SELECT balance FROM fmtemp;
123.45
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> SELECT balance FROM fmtemp;
1234.5
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> SELECT balance FROM fmtemp;
1357.95
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> SELECT balance FROM fmtemp;
4567.65
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;
sqlite> UPDATE fmtemp SET balance = balance + 123.45;

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Hi,

We're trying to understand whether or not we have a performance problem with
our SQLite database. It may well be that we are "optimal" for some value of
optimal or it may be that we can improve our system. I was hoping to use the
experience here to help focus our attention or give some guidance based on real
world usage.

It may well be that we don't have a problem and our performance is appropriate.
That's also a good result as we can stop worrying about things :)

This e-mail has quite a lot of information in as we want to give as much
information as appropriate. We don't want to swamp you but we can't simplify it
too much.

Background

Our database takes a five minute feed of traffic information in London. This
data is XML based and has approximately 500-700 items in it. These items cover
everything from a new traffic jam through to three year old roadworks. Yes,
there are roadworks in London that go back three years!

We process this XML feed and pull out chunks of the XML, process it and update 
our database with it. This database is currently 16GB in size. Our concern is 
the time taken to process each file every five minutes. It has been getting 
steadily longer and longer. It started out at around 6 seconds and is now 
around 22-24 seconds. Whilst we expect processing time to get bigger, we are 
getting concerned about the performance and decided to have a look.

Our XML processing system is written in Perl, which has advantages and
disadvantages. Probably the biggest advantage is that it's forgiving for
development and allows us to move quickly when processing significantly large
amounts of text within the XML. The biggest disadvantage to us is the lack of
multithreading, so it's limited by the clock speed of the CPU. The database
(DB) we use is only ever read and written by a single process; we do not
network the database or allow any access apart from processing the XML file
and getting an output JSON file.

As our processing time has got longer, we decided to have a look at where our 
Perl program was spending most time. For the interested we use the Perl module 
Devel::NYTProf. This is a well known and well respected performance profiling 
tool. 

http://search.cpan.org/~timb/Devel-NYTProf-6.01/lib/Devel/NYTProf.pm 


We do not need to adapt our Perl code to use it; you simply invoke Perl with
-d:NYTProf.

The output is a nice HTML web page that allows you to drill down into the
sections of the code that take up the most time. We ran the program over ten
iterations of files to even things out on a 16GB SQLite database and looked at
the output. We were very surprised to see where most of the time was spent:
out of a two minute run, 70% of the time was spent in an innocuous function.
This was odd!

We drilled down even further and found that a single select call was occupying 
90% of that 70%. The performance profiler was indicating that a select on a 
table was taking around 13ms/call and we were doing around 5,000 calls in our 
ten file run. The approx 5,000 calls was about right as there are around 500 
distinct chunks of information per file. So the numbers seemed to add up. The 
issue for us was the 13ms per call.

The actual SQL called 5,000 times is

"select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?"

RAG is a Red/Amber/Green table and Text is a simple sentence that describes
road conditions. See below for examples.

The RAG table schema is 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT,
 "Peak" integer,
 "Calculation" integer NOT NULL DEFAULT 0,
 "Red" integer DEFAULT 0,
 "Amber" integer DEFAULT 0,
 "Green" integer DEFAULT 0,
 "BayesAttributes" TEXT
);

It has four indexes on it 

CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, "Text" 
COLLATE NOCASE ASC);
CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
"BayesAttributes" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Rag_Idx4" ON RAG ("Id" COLLATE NOCASE ASC, "Calculation" 
COLLATE NOCASE ASC, "Peak" COLLATE NOCASE ASC, "Red" COLLATE NOCASE ASC, 
"Amber" COLLATE NOCASE ASC, "Green" COLLATE NOCASE ASC, "Text" COLLATE NOCASE 
ASC);

The table has approximately 43,000 rows in it and doesn't grow very much now.
We don't have complex keys linking things together; we have a simple table.

Our first thought was that the Perl program was in the way.

We made sure we prepared statements in advance, we analysed the indexes and
rebuilt them, we checked the query plan again and did normal simple
housekeeping.

The query plan showed 

SCAN TABLE RAG USING COVERING INDEX Rag_Idx4

which we think is OK.
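
For reference, this is roughly how we look at the plan (the literal string is
just an example phrase rather than real data):

explain query plan
select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'Traffic is normal for the time of day';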

To try and isolate the problem away from the Perl program, we then generated 
the 

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Ward,

Thanks for this. 

We are already doing ANALYZE and sadly it's not made any difference.

Rob

> On 3 Jul 2015, at 00:04, Ward Willats  wrote:
> 
> 
>> On Jul 2, 2015, at 3:16 PM, Rob Willett  
>> wrote:
>> 
>> We?re trying to understand whether or not we have a performance problem with 
>> our Sqlite database.
> 
> It may or may not apply to your situation, but after doing lots of inserts, 
> running ANALYZE can sometimes work wonders.
> 
> -- Ward
> 



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Thanks for the reply

> I can tell you that the SQLite3 command line program uses threading and is 
> extremely fast and optimised, it is no surprise that the query takes much 
> longer in many other programs using a library or even compiled-in SQLite 
> code. If it takes 50 seconds piped into SQLite then 100+ seconds in Perl is 
> not completely weird.
> 
> You say that the SELECT on the Table is taking 13ms/call, but I assume you 
> mean the SELECT statement takes 13ms to run from start to finish and copy all 
> the selected DB content into Perl text arrays... this may be a bottleneck. 
> How many lines (records) are returned by the queries on average? For 10K+ 
> lines, 13ms would be a fantastic result - but on a 43K row table I doubt that 
> very much.

I have no doubt that sqlite3 is fast and optimised; that's why we like it.

My comments before were about the speed difference of 50 secs for doing the
5,000 selects on the database (approx) vs 120 seconds for doing ALL the Perl
work, which includes the 5,000 individual SELECTs. My assumption was that
sqlite3 is very good at its job and that using sqlite3 directly is as good a
performance as we can get. So that's my baseline to work from.

Each select will return either one row or no rows. The text column is unique 
across the table. 

We see the difference of 10ms and 13ms as acceptable in relative terms. If Perl
adds 0.03ms to the select query we can understand that, as that's the overhead
of using Perl as the database manager. What we are trying to understand, and I
apologise for stating it inelegantly or not being clear, is whether 10ms is
acceptable performance to pull a single line out of a database using a select
on a text field in a table that has an index on it.

If 10ms is the very best sqlite3 can do using nothing but a select statement
directly onto the database using sqlite3, with no overhead of Perl, then we are
comfortable with the 13ms it takes in Perl. We understand the 0.03ms extra that
Perl adds; we're trying to see if the 10ms is acceptable. That only works out
at 100 selects per second, but the test pages we have seen suggest we should
get far, far faster.
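
As a sanity check, one can time the bare statement in the sqlite3 shell and
compare it against the 13ms we see from Perl (a sketch; the Text value is just
one example phrase):

sqlite> .timer on
sqlite> select Id,Calculation,Peak,Red,Amber,Green
   ...>   from RAG where text = 'Traffic is normal for the time of day';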

> 
> The most immediate optimisation I can see is making the column collation 
> NOCASE (as opposed to the Index). This will avoid running the collation 
> function on every item. SQLite may actually optimise this already and I don't 
> know if it does, but it's worth testing.

We will try this and report back.

> Another thing I would do is add a column which uses tokens for strings, as it 
> seems these are standard repeating strings which may be reduced to a simple 
> Integer, which will be much faster to look up than long string comparisons.

Sadly the text column is not standard repeating strings. The aim of the table
is to hold all the myriad of ways that Transport for London can say similar
things, e.g. they will state "Traffic is normal for the time of day", "Traffic
is normal", "Normal Traffic", "Traffic is normal this morning", "trafic is
slow" (the spelling mistake is deliberate) and so on and so on.

Add in all the various ways you can describe lane closures, traffic jams,
traffic lights being out, bridge closures and openings. A person writes the
update for each disruption and they often do use the same words but they
misspell something, they change things round, they add new stuff in.

The RAG table is a way to "normalise" that information so that we have a
database of what has been said, what we classified it as (Red/Amber/Green) and
how we calculated that RAG status; we also shorten and stem the words and do
Bayesian analysis on new versions of phrases to try and work out what is meant.
It's a key table for us and a lot of pre-processing work goes into creating it.

We wish there was a standard set of phrases but there isn't. The 43K lines
demonstrate it; we are constantly surprised by the new ways people find to
describe exactly the same situation :) We had looked at your idea previously
and assumed that we could use a similar approach, but the wide range of text we
found stopped that working. Using natural language stemming and Bayesian
analysis got us closer but it's still imperfect.

> If it isn't part of the usual maintenance, try re-pack the DB file using 
> VACUUM.

We'll check to see if we do that and if not we'll try it and report back.
> 
> In normal Code we would of course compute and re-use the prepared statements 
> shaving off another significant bit of time, but not sure if the Perl 
> interface allows for this or if it will reduce significant time for those 
> straight-forward queries.

We already use prepared statements wherever possible. It's pretty much standard
coding practice for us. We also tried removing the prepared statement for this
particular query and found it made little difference, which surprised us.

> Looking at your system specs, it's hard to imagine your results NOT being 
> significantly faster than the quoted 

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Gerd,

Thanks for this. 

We'll add this to the to-do list for the day. We've had a lot of responses back
and we need to put a plan together to work through them one by one and see the
impact. It'll be an interesting day.

Thank you for taking the time to look at our e-mail and replying

Rob

> On 3 Jul 2015, at 07:06, GB  wrote:
> 
> 
> 
> Rob Willett schrieb am 03.07.2015 um 00:16:
>> SCAN TABLE RAG USING COVERING INDEX Rag_Idx4
>> 
>> 
> 
> Since Rag_Idx4 is quite wide and not primarily ordered by "Text", an index 
> scan might not be significantly faster than a table scan. As already 
> mentioned, ANALYZE might help.
> 
> For curiosity you may try to force it to use "Rag_Idx1" by executing
> 
> "select Id,Calculation,Peak,Red,Amber,Green from RAG INDEXED BY Rag_Idx1 
> where text = ?"
> 
> and see if it makes any difference. If it does you may consider rearranging 
> Rag_Idx4.
> 
> 
> hth
> Gerd



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Simon, Clemens,

> Do you ever depend on any indexing on the "Text" column which is not COLLATE 
> NOCASE ASC ?  If not, then you should be able to speed up your search by 
> defining the column the way you think of it.  So in your table definition use
> 
>"Text" TEXT COLLATE NOCASE ASC,

We're just checking this out and we're getting an error on creating the table.
We've just checked the SQLite spec
(https://www.sqlite.org/syntax/column-constraint.html) and can't see how to
add ASC to it unless we create the Text field as a primary key, which is a
significant coding change for us.

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT COLLATE NOCASE ,
 "Peak" integer,
 "Calculation" integer NOT NULL DEFAULT 0,
 "Red" integer DEFAULT 0,
 "Amber" integer DEFAULT 0,
 "Green" integer DEFAULT 0,
 "BayesAttributes" TEXT
);

works

and 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT COLLATE NOCASE ASC , 

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Clemens,

Fine, thank you. We thought we had misunderstood and we were right, we had!

We are just checking the COLLATE NOCASE option on the RAG table and re-running
the tests again as the improvement was so dramatic we don't believe it :)

Rob.

> On 3 Jul 2015, at 09:26, Clemens Ladisch  wrote:
> 
> Rob Willett wrote:
>>> Do you ever depend on any indexing on the "Text" column which is not 
>>> COLLATE NOCASE ASC ?  If not, then you should be able to speed up your 
>>> search by defining the column the way you think of it.  So in your table 
>>> definition use
>>> 
>>>  "Text" TEXT COLLATE NOCASE ASC,
>> 
>> We don't think adding ASC is allowable
> 
> Indeed it isn't; ASC makes sense only for (sorting) an index.
> 
> 
> Regards,
> Clemens



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
It basically has taken no time to run 10,551 selects. This implies that we had
issues on the database before with either fragmentation or indexes being
poorly setup.

When we look at the graphical output for the performance monitoring program we
find that the bottleneck has gone from there and another area now takes up the
most time, which is exactly what we would expect; something has to be the
slowest part of the system. So long as we don't optimise the idle loop
(http://c2.com/cgi/wiki?OptimizingTheIdleLoop) :)

We are delighted with the results and would have taken a 30% decrease as great
news; a 250% decrease (I know, I know, you can't take 250% off something) is
fantastic. We need to add this to our almost-production database and add in the
dumping and rebuilding on a monthly basis.

Hopefully this may help other people with tuning their databases.

We'd like to say thanks to all the people who offered help and suggestions,

Simon, Ward, R Smith, Gerd, Clemens

We really appreciate it and if you ever get to London the beers and pizzas are 
on us. 

All the best,

Rob


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
I forgot to add one thing to my last e-mail. I just checked the function call
time, which was approx 13ms; it is now approx 110 microseconds, over 100x
faster.

This looks like the speed we can get out of SQLite.

Thanks again,

Rob.


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Simon,

We had missed the incorrect defn of Calculation in the index. We've been
changing stuff around and suspect we omitted to check as carefully as we
should when changing :(

We'll update BayesAttributes as well.

We're actually rebuilding the database from scratch anyway; it'll take around
3-4 days now as opposed to 12-16 days before :)

We'll check if we get any improvements from rebuilding using .dump by copying
the database to another server and doing it there. That's one of the great
things about SQLite: pick the database up and copy it somewhere else :)

Is VACUUM the same as doing the .dump and restore or is it different? We like
the .dump as it gives us a nice, easy-to-use backup :)
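
For the record, the two options we mean are roughly these (a sketch; the file
names are made up):

sqlite3 traffic.db ".dump" > traffic_backup.sql
sqlite3 rebuilt.db < traffic_backup.sql

versus rebuilding in place with:

sqlite3 traffic.db "VACUUM;"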

Rob

> On 3 Jul 2015, at 13:15, Simon Slavin  wrote:
> 
> 
> On 3 Jul 2015, at 11:35am, Rob Willett  
> wrote:
> 
>> CREATE TABLE "RAG" (
>>"Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
>>"Count" integer NOT NULL DEFAULT 0,
>>"Text" TEXT COLLATE NOCASE ,
>>"Peak" integer,
>>"Calculation" integer NOT NULL DEFAULT 0,
>>"Red" integer DEFAULT 0,
>>"Amber" integer DEFAULT 0,
>>"Green" integer DEFAULT 0,
>>"BayesAttributes" TEXT
>> );
> 
> [...]
> 
>> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
>> CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, 
>> "Text" COLLATE NOCASE ASC);
>> CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
>> "BayesAttributes" COLLATE NOCASE ASC);
> 
> To explain further, now that you have declared your "Text" column as being 
> COLLATE NOCASE, NOCASE is now the default collation for it, and all sorting 
> and matching on Text will be NOCASE unless you state otherwise.  So it's not 
> necessary to state COLLATE NOCASE when you create indexes on it or mention it 
> elsewhere.  It won't do any harm, though.  Shout out to Clemens for his 
> corrections to my earlier post.
> 
> I notice that "Calculation" is an INTEGER column, and that "BayesAttributes" 
> is a TEXT column but in "Rag_Ids3" you have used "Calculation" COLLATE 
> NOCASE.  I was wondering whether you had meant to use "BayesAttributes" TEXT 
> COLLATE NOCASE in your table definition instead, and rewrite the index 
> accordingly.  There doesn't seem to be any point in having "BayesAttributes" 
> be case-sensitive.
> 
> Since you've significantly re-done your table definition and indexes, now is 
> a good time to run ANALYZE.  It may or may not help, but it will overwrite 
> data which is no longer correct.
> 
>> The second biggest improvement was the 10 secs saved by dumping the database 
>> and rebuilding from the dump file. We may have got a better increase if we 
>> did that first and then the COLLATE NOCASE which would probably then have a 
>> lesser increase :) 
> 
> While the improvement from rewriting the table definition will be permanent, 
> the improvement from defragmentation will gradually reduce as the database 
> gets changed in a fragmented way.  So you may find out !
> 
> However, it should not be necessary to dump and rebuild your database on a 
> monthly basis.  SQLite is unusually efficient at dealing with fragmented 
> database pages, and OS X, of course, has background defragmentation built in. 
>  I doubt that any difference .dump & .read would make is worth doing that 
> amount of clunky data processing on a regular basis.  However, if you do 
> decide to do it, you should do an ANALYZE before using the newly remade 
> database.  Including when you do this to your production database after 
> implementing the changes you've decided on in this thread.
> 
> Delighted we could make such a big difference.
> 
> Simon.



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-04 Thread Rob Willett
Thanks for the comments.

Yes we did look at this before we posted our original plea for help. We have a 
BEGIN/COMMIT around all the relevant code.
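
i.e. the reads are batched along these lines (a simplified sketch of the
wrapping, not the real code):

BEGIN;
-- one select per chunk of the XML file, roughly 500 per file
select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?;
-- ... further selects and processing ...
COMMIT;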

Rob

> On 4 Jul 2015, at 20:28, droedel  wrote:
> 
> Rob Willett  writes:
> 
> [snip]
>> The headline figures are we have gone from 213 secs to process 20 files
> down to 90 secs to process 20 files. We
>> are running approx 2.5x faster. To get this improvement the biggest change
> was simply adding COLLATE
>> NOCASE to the table schema. This saved around 120-130 secs which was
> brilliant. 
>> 
> 
> Did you consider putting all SELECT-statements in an explicit transaction
> (BEGIN TRANSACTION / COMMIT) ? When the statements are in one transaction,
> sqlite only needs to acquire/release the lock once. During some benchmark
> tests I did a few years ago, this reduced query time from 640 ms to 210 ms
> for 10 000 trivial SELECT statements.
> 



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-06 Thread Rob Willett
James,

I didn't say it explicitly but we did understand where the speed up came from.
As you say, changing the collation was the key element in this. This speedup
also meant that a dirty, nasty, vile hack (and that's the nicest thing we can
say about it, and it's our code), a so-called optimisation we put in the Perl
script to try and fix things, stopped being effective.

The way the optimisation worked was simply to reduce the number of SELECT calls
made. This optimisation was done before we updated the COLLATION on the index.
After the COLLATION was added, our optimisation no longer had the same effect.

The reason was that the number of searches of the binary tree had been
dramatically reduced; this meant our 'optimisation' was no longer worth it, as
the 'optimisation' now only saved a few searches as opposed to 40K of them. The
code worked but we were now only saving 13 or so searches as opposed to 40,000.
We've pulled the 'optimisation' out as it made the code untidy and it was a
dirty hack.

Thanks for helping,

Rob

> On 5 Jul 2015, at 21:36, James K. Lowden  wrote:
> 
> On Fri, 3 Jul 2015 11:35:21 +0100
> Rob Willett  wrote:
> 
>> It basically has taken no time to run 10,551 selects. This implies
>> that we had issues on the database before with either fragmentation
>> or indexes being poorly setup.  
> 
> Perhaps you said so and I missed it, but to me it's clear that almost
> all the improvement comes from using RAG_Idx1.  Because it indexes the
> Text column, you're now doing a binary search instead of a linear
> scan.  
> 
> By changing the collation, you went from scanning every row to
> searching a binary tree.  That took you from O(n) to O(log2 n).  If
> each row in your 16 GB database is 100 bytes, allowing for overhead you
> might have 80 million rows?  To satisfy your query, on average that
> would be an improvement from 40,000,000 I/O operations to 13.  
> 
> When something that used to take 122 seconds starts going a million
> times faster, it begins to look pretty much instantaneous.   :-)  
> 
> --jkl



[sqlite] Anybody any experience of ODBC drivers for Mac (Yosemite)/Excel

2015-03-31 Thread Rob Willett
Hi,

I'm trying to access a SQLite database I've written using Excel. In the Windows
world we used to use ODBC drivers which were clunky, cumbersome, sometimes
worked, sometimes didn't, had arcane issues and generally were a PITA.

So I'm trying to do the same on our development Macs and it appears that ODBC
on a Mac is clunky, cumbersome, sometimes works, sometimes doesn't AND has the
advantage of being old as well now :) Some of the ODBC drivers out there were
written for PowerPC, which is around 10 years old.

Whilst we could extract the data in SQL, manipulate it in Perl (stop sniggering
at the back there), we'd prefer to use Excel, bring our data into a Pivot Table
and work with it using the right tools.

We don't actually mind spending money (well we do, but we realise we might have
to), but I wondered if anybody had any recommendations of things that they know
work.

We've seen these open source ones

http://www.ch-werner.de/sqliteodbc/ 

These paid ones

http://www.openlinksw.com 

Actual Technologies 

Simba Technologies 

and various other websites for the Mac ODBC administrator application that was 
removed around five years ago.

If we know that the open source ODBC stuff works we'll put the time in and make
it; however, we've probably all got stories to tell of how we burnt a week
getting some weird open source system that's now been abandoned and will never
work on anything later than a 286 CPU. If we know that the Actual tech drivers
"just work" we'll go and buy that. I'm more concerned about burning time than
money.

Thanks,

Rob.


[sqlite] What software is deployed more than SQLite?

2015-05-03 Thread Rob Willett
Richard,

Every copy of iPhoto, Aperture and Photos as well. Photos is the new 
replacement for iPhoto and Aperture. I've just checked a couple of the 
libraries and they open as SQLite.

I can't speak for the iOS versions of these.  

In reference to the png library I would assume that that is on all of the 
previous as well.

I think it would be a reasonable bet to reckon on there being 2-3 copies of 
SQLite per Mac out there. 

Rob.

> On 3 May 2015, at 19:18, Richard Hipp  wrote:
> 
> I'm trying to update the "Most Deployed Database" page
> (https://www.sqlite.org/mostdeployed.html) in the SQLite documentation
> (which has not been touched in close to a decade) and I began to
> wonder what other software libraries (database or otherwise) might be
> deployed more than SQLite.  SQLite is in a lot of things.  My
> conjecture is that SQLite is in the top-10 most deployed software
> components in the world today.  Check my work, please.  SQLite is in:
> 
>  *  Every Android phone and device
>  *  Every iPhone and iOS device
>  *  Every Mac within the past 10 years
>  *  Every Firefox, Chrome, or Safari browser
>  *  Every copy of Skype
>  *  Every copy of iTunes
>  *  Most Python and PHP installations
>  *  Every Dropbox client
>  *  Every TurboTax and QuickBooks
> 
> And more.  But just from the list above, there are not too many
> computing devices that omit SQLite.  I'm wonder what other software
> components have a greater reach?
> 
> The original Jean-loup Gailly and Mark Adler implementation of the
> zlib compression library might be deployed more.  Anything else?
> 
> What about libjpeg and libpng?  I think there might be multiple
> independent implementations of libjpeg in circulation, but I am
> unclear on that point - perhaps a reader more knowledgable about this
> can correct me.  What about libpng?  Is there just the one original
> libpng library used everywhere, or are there competing
> implementations?
> 
> There appear to be more deployments of SQLite than there are of Linux,
> since SQLite is on every Android device, and Android represents the
> bulk of Linux deployments.  SQLite is also on most Linux desktops by
> virtue of being included with Firefox and Chrome.  And it is on many
> Linux servers by virtue of being included in Python and PHP.  Some
> fraction of Linux machines may omit SQLite, but that fraction seems
> far smaller than (say) the number of iPhones that include SQLite, so
> SQLite still comes out numerically superior.
> 
> There appear to be more deployments of SQLite than all Apple-built
> computing devices, since SQLite seems to be in all Apple products and
> SQLite is in many other products as well.
> 
> SQLite is not in default Windows installations (historically - that is
> about to change with Windows 10 which uses SQLite as a core OS
> component) but many Windows desktops will include secondary software
> such as Firefox or Chrome or iTunes or Skype or Dropbox or something
> else that contains SQLite.  So perhaps most Windows desktops contain
> at least one copy of SQLite.  And in any event, I hear that the total
> number of smartphones now exceeds the total number of desktops (of any
> type, Windows or otherwise) and SQLite is in all of the smartphones.
> 
> There are multiple competing implementation of libc, and (unless I am
> mistaken) Android and MacOS/iOS use completely independent libc
> implementations.  You could argue that various implementations of libc
> are collectively more widely deployed than SQLite.  But there is only
> one implementation of SQLite, so if we talk about single
> implementations rather than competing implementations  of the same
> interface, then SQLite seems to still come out on top.
> 
> What am I overlooking?  Would it be overly brash to claim that SQLite
> is the second most widely deployed software component in the world
> today, after the Gailly/Adler zlib implementation?  Or maybe the
> third-most after zlib and libpng?
> 
> Any input you can provide is appreciated!
> -- 
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Adding the source to XCode

2013-10-22 Thread Rob Willett
Hi,

I used the built-in sqlite library that comes with Mac OS X. The problem I found 
was that compiling it as 64-bit under Xcode seemed very difficult. I'm not 
blaming SQLite, just my poor knowledge of how Xcode works and how 64-bit 
compilation worked. I needed 64-bit as I have very large data sets to work with.

So I have no issues with the built-in libraries, they may be a little older 
than the bleeding edge but work for me. I note that you are on 10.6.8 anyway so 
you may have the most up to date libraries for Snow Leopard.

Anyway, in answer to your specific questions.

1. I'm not a C++ expert as I only code in the One True Language 'C'. If you 
have a C++ (or do you mean Objective-C?) project, check that Xcode allows .c 
files to be added. If your sqlite3.c file is in the same directory as your .h 
files, this should not be a problem. I'd suggest you create a dummy command-line 
project for C and see if you can add the .c file to your project list. I 
had no issues, but I'm C based.

2. Don't move your sqlite db file to your bundle yet. If you have just started 
coding in Xcode, creating a bundle is about the last thing you need to do. Get 
your code working and keep your db file elsewhere. Once you have a working 
system THEN think about where your sqlite file needs to be. Create a #define 
MY_SQLFILE_IS for the file location and change it in your .h file when you need 
to. Xcode is really, really, really nasty and difficult about where it creates 
all your work in progress. It creates temporary files, so for the moment focus 
on learning Xcode and getting your code working rather than worrying about file 
locations in the future.  If this is your first time with Xcode, start reading 
lots and lots as it's difficult. I come from a UNIX background with 25 years of C 
and Makefile experience and I struggle at times to understand how Xcode works. 

Rob.

On 22 Oct 2013, at 09:17, Igor Korot  wrote:

> Hi, ALL,
> I'm trying to build my C++ project which involves SQLite on MAC.
> I have Snow Leopard (OS X 10.6.8) along with XCode 4.2.
> I successfully made a project in XCode, then moved my cpp files in it.
> 
> Now I am trying to add the "sqlite3.c" but I couldn't.
> Going to "File"->"Add Files to,,," I see only my .cpp, .h and sqlite3.h
> files. I don't see sqlite3.c file in the "Add File" dialog.
> 
> Is there any setting I need to turn in order to be able to do so?
> 
> Also about the db file.
> I need to move it to the bundle. In order to do so I can make an additional
> "Build Phase" with "Copy Files". Problem is which destination to choose.
> 
> This is only second time I'm working with Apple XCode and first one with
> XCode +SQLite pair. ;-)
> 
> Thank you for any hints.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Web application with SQLite

2013-12-27 Thread Rob Willett
Igor,

Suggest you look at 

http://www.sqlite.org/whentouse.html

Your questions are very broad and don’t really have a right or wrong answer. 
You might as well start a flame war by asking the best language for software 
development. Everybody has their own version of the truth; mine happens to be 
that the only true development language is C, but I accept that lesser and 
inferior languages may also be used :)

I think the answer is you can use SQLite just about wherever you want, but 
everybody’s use case is slightly different. I personally wrote my Web 
application using PHP and Postgres but in hindsight could easily have used PHP 
and SQLite. 

I would not assume that Web applications are only written in MySQL or MS SQL at 
all. I have no stats to back this up, but I would expect that people have 
written web apps in just about every possible database (and non database) 
technology possible :) 

Rob.

On 27 Dec 2013, at 21:14, Igor Korot  wrote:

> Hi ALL,
> Does people use SQLite for Web development? Or web apps are written
> only with mySQL/MS SQL?
> 
> What are pros/cons of usinf SQLite for Web app?
> 
> Thank you.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Solving the Sudoku with SQlite 3.8.3 trunk

2014-01-18 Thread Rob Willett
I have to say it's been a long time since I sat back and was amazed by what 
people can do.

My knowledge of SQL is limited; whilst I recognise it is a declarative language, 
I had no idea you could solve Sudoku in it.

I take my hat off to the real experts. The downside of this is I now really, 
really understand how little of the language I know :)

Rob.

On 18 Jan 2014, at 17:50, Dan Kennedy  wrote:

> 
> Beaten by minutes!
> 
> 
> /* The input suduko. */
> WITH RECURSIVE input(sud) AS (
>  VALUES(
>'53  76  195986 8   6   34  8 3  17   2   6 6 28419  5
> 8  79'
>  )
> ),
> 
> /* A table filled with digits 1..9, inclusive. */
> digits(z, lp) AS (
>  VALUES('1', 1)
>  UNION ALL SELECT
>  CAST(lp+1 AS TEXT), lp+1 FROM digits WHERE lp<9
> ),
> 
> /* The tricky bit. */
> x(s, ind) AS (
>  SELECT sud, instr(sud, ' ') FROM input
>  UNION ALL
>  SELECT
>substr(s, 1, ind-1) || z || substr(s, ind+1),
>instr( substr(s, 1, ind-1) || z || substr(s, ind+1), ' ' )
>  FROM x, digits AS z
>  WHERE ind>0
>  AND NOT EXISTS (
>SELECT 1 FROM digits AS lp
>  WHERE z.z = substr(s, ((ind-1)/9)*9 + lp, 1)
> OR z.z = substr(s, ((ind-1)%9) + (lp-1)*9 + 1, 1)
> OR z.z = substr(s, (((ind-1)/3) % 3) * 3
>  + ((ind-1)/27) * 27 + lp
>  + ((lp-1) / 3) * 6
>, 1)
>  )
> )
> 
> SELECT s FROM x WHERE ind=0;
> 
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Setting boundaries in a search

2014-07-16 Thread Rob Willett
Hi,

Can I add my very first piece of advice after listening and reading for the 
last 6-9 months :)

I’ll second what Simon says: I use the very same technique for a table with 4M+ 
records in it, and it’s so fast I thought I had an error and looked for bugs in my 
code.  I assumed (incorrectly) that it would be very slow. It isn’t. 

I’ll also add one other piece of advice to the people clogging up the list on 
hot journals with childish comments. I’m a very old developer, C and UNIX, well 
into my 4th decade of programming. I have learnt over the years that some 
things I know very well and some things (SQL and SQLite are excellent examples) 
I’m a novice and a noob and a newbie and all those other words we use. This 
list is an excellent source of knowledge and very valuable (both in terms of 
time and money). I have learnt an awful lot from reading the mails here, there 
are often very good questions and normally excellent answers from people who 
take a significant amount of time to understand the problem and to write 
comprehensive replies. I thank all the people who write such good replies and 
maintain the high quality of the mailing list. It is very rare indeed to have 
short and curt answers to people who make the effort to write a decent question.

Whilst I cannot add much to any SQL discussion (point one above excepted, of 
which I’m sinfully proud to have contributed something at long last, even if 
it’s only to confirm what somebody else has done), I have come to realise that 
the people who answer here are real experts, I will not embarrass them by 
naming names, and if they say something which disagrees with what I think or 
have done, my first thought now is to challenge what I think and how I do it, 
because there is a very, very, very good chance I am wrong and the people here 
on the list are right. I’m old enough not to be bothered by admitting I got it 
wrong, and that other people know SQLite better than I do. 

There, I’ll now go back to the rock from which I came and lurk for another 9 
months :)

Thanks for all the input and very best wishes,

Rob



On 16 Jul 2014, at 06:48, Simon Slavin  wrote:

> 
>> On 16 Jul 2014, at 3:21am, jose isaias cabrera  wrote:
>> 
>> SELECT * from startcodes where code = 'e';
>> 
>> but I want to search only from id >= 8 and <= 14.  Is there a way to set the 
>> boundary for that SELECT that will only search ids 8-14?  I know I can do a 
>> WHERE id BETWEEN 8 AND 14, but is there another faster way?
> 
> That way is not particularly slow.  You just need to have a good index.  A 
> good index for that search would be
> 
> CREATE INDEX sci ON startcodes (code,id)
> 
> You will find that that SELECT will then be blisteringly fast even with 
> millions of rows in your table.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Setting boundaries in a search

2014-07-16 Thread Rob Willett
All my searches are unique and go across the whole table. The range I select 
from is normally between 500 and 600 rows.  

I benchmarked the select over the growth of the database to circa 4m records 
and the slowdown was negligible. I'm not looking at optimising it as I have far 
better candidates for optimisation (sadly). I'm still building the application 
and still adding data, and may double the test database size over the next 
week. I'm confident (famous last words) it won't be a problem (stop sniggering 
at the back there).  

-- 

Rob


On Wednesday, 16 July 2014 at 15:49, RSmith wrote:

> 
> On 2014/07/16 14:23, jose isaias cabrera wrote:
> > "Simon Slavin" wrote...
> > > 
> > > That way is not particularly slow. You just need to have a good index. A 
> > > good index for that search would be
> > > 
> > > CREATE INDEX sci ON startcodes (code,id)
> > > 
> > > You will find that that SELECT will then be blisteringly fast even with 
> > > millions of rows in your table.
> > 
> > I do have that INDEX for that id and table. Thanks. :-) I was just trying 
> > to be greedy and see if I could become even faster. 
> > Thanks for your help.
> > 
> 
> 
> Greed is good in this regard :)
> 
> Are all the searches unique or do you repeat a lot of searches for a very 
> specific range? If the latter, then partitioning the table 
> (well in SQLite that would really be a second derived table) might speed 
> things up if space is not an issue, but I would only invest 
> the design time for this once the standard query is proven to be slow - which 
> might be the case.
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Handling Timezones

2014-07-31 Thread Rob Willett
The problem with doing this is that many IP addresses are exit IP addresses, 
i.e. the ISP or company brings things into their own network and only has a few 
exit nodes, so regardless of where you connect from, everybody comes out of one 
time zone. I know AOL used to do this and I know that IBM does this. 
Disclaimer: I work for IBM and can't access some parts of the BBC website as it 
thinks I'm calling from Germany when I'm in London.  

My own ISP puts my exit node a few hundred miles from where I am and I'm only 
in the UK. 

These databases are best guesses only, and so long as you know that, then use 
them accordingly. 

Previous suggestions of using the browser or client computer settings seem 
sensible to me.  

-- 
Rob Willett
Sent from my mobile phone


On Thursday, 31 July 2014 at 08:54, Stephen Chrzanowski wrote:

> Looking back at the clarification of what the OP wanted to do, I've got
> this to recommend;
> 
> If your users are talking to your server via the internet and not via a VPN
> connection, instead of relying on what time zone your users browser is
> giving you, look at what IP they're calling in from and do an IP to
> geographical look up to find out where they are. From there you'd be able
> to catalog a 'best time of contact' based on what the Geolocation service
> gives you.
> 
> Doing a quick google search on "ip to geo" I found these two:
> http://www.iplocation.net/ and http://www.geoiptool.com/
> 
> By the looks of it, for a modest yearly fee, you'd be able to download a
> database of IPs to locations and you'd be able to get time zone information
> right from there.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Sample SQL code thats beyond me :(

2015-05-23 Thread Rob Willett
Hi,

I'm trying to do some analysis across a couple of tables and the SQL is beyond 
my meagre skills. I'm struggling to even describe the problem, to be honest.

The high level description is that I have three tables: Users, Devices and 
Perimeter_Notifications. I want to extract a 
list of users from a database to send information to if they are not on 
holiday. However I don't necessarily have the holiday_mode set by the user, and 
so our assumption is that unless the holiday mode is set to 1 (they are on 
holiday) it is assumed to be 0. It's the assumption that's causing the problem. 
If there are no entries in Perimeter_Notifications that's also fine, no rows get 
returned. 

CREATE TABLE "Users" (
"email" TEXT NOT NULL,
"password" TEXT NOT NULL,
"salt" TEXT NOT NULL,
"creation_timestamp" TEXT NOT NULL DEFAULT 
(datetime('now','localtime')),
   PRIMARY KEY("email")
);

CREATE TABLE "Devices" (
"Email" TEXT NOT NULL,
"UUID" text NOT NULL,
"Holiday_Mode" integer NOT NULL
);

CREATE TABLE "Perimeter_Notifications" (
   "Email" text NOT NULL ,
"UUID" text NOT NULL,
"route_id" INTEGER NOT NULL,
"day" integer NOT NULL,
"hour" integer NOT NULL
);

(Please note the UUID is nothing to do with the UUID discussion a few days ago; 
I'm not brave enough to open up that little can of worms, we just happen to 
have chosen that column name a few months ago for mobile devices. Also, in case 
anybody asks, we're not storing passwords in plain text either.) 

I'm not sure if my database design is wrong or I simply cannot work out the SQL 
to make it work. It might actually be both :)

What I want to do is join the tables Users and Perimeter_Notifications together, 
but only if the value of Devices.Holiday_Mode is either non-existent or if 
Devices.Holiday_Mode does exist and is 0. If Devices.Holiday_Mode is 1 it 
means the user is on holiday, so don't send them anything.

I can work out the logic if Devices.Holiday_Mode actually exists and is either 
1 or 0. That's pretty basic SQL. However, if there is no row in Devices with that 
Email and UUID then that's the equivalent of Devices.Holiday_Mode being 0. I've 
looked at IS NULL or NOT EXISTS but I'm struggling to get my head around it 
all. I can do all of this in a higher level language (not sure if Perl is 
higher level than SQL) but I should be able to do this in SQL itself. 
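
A minimal sketch of one way to express that "missing row or explicit 0" 
condition with an outer join, using the schema above (untested, and quite 
possibly not the right approach):

SELECT Users.email, Perimeter_Notifications.route_id,
       Perimeter_Notifications.day, Perimeter_Notifications.hour
  FROM Users
  JOIN Perimeter_Notifications
         ON Perimeter_Notifications.Email = Users.email
  LEFT JOIN Devices
         ON Devices.Email = Perimeter_Notifications.Email
        AND Devices.UUID  = Perimeter_Notifications.UUID
 WHERE Devices.Holiday_Mode IS NULL   -- no Devices row at all
    OR Devices.Holiday_Mode = 0;      -- or explicitly not on holiday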

Any advice or guidance welcomed please.

Thanks for reading,

Rob.


[sqlite] Sample SQL code thats beyond me :(

2015-05-23 Thread Rob Willett
Rich,

Thanks. Simply writing the initial e-mail helped clarify things for me. 

We're trying to work out the logic of setting holiday_mode to an explicit 0 
rather than an assumed 0. It's not quite as simple as setting it in a table, as 
it's linked back to a mobile app and the synchronisation logic is a little 
convoluted. 

If we can force holiday_mode to be set to either 0 or 1 then the problem goes 
away, which comes down to getting the design right. I think that this 'issue' 
we have is indicative of a poor DB design and wrong assumptions (bad pun) and 
we should fix that.

Rob.

> On 23 May 2015, at 17:06, Rich Shepard  wrote:
> 
> On Sat, 23 May 2015, Rob Willett wrote:
> 
>> What I want to do is join the table Users and Perimeter Notifications
>> together but only if the value of Devices.Holiday_Mode is either non
>> existent or if Devices.Holiday_Mode does exist and its 0. If
>> Devices.Holiday_Mode is 1 it means the user is on holiday and don?t send
>> them anything.
> 
> Rob,
> 
>  First, you can set holiday_mode to 0 by default rather than leaving it
> NULL (unknown). As you wrote, unless the user explicitly sets the mode to 1
> the assumption is that its value is 0. After all, it's gotta' be one or the
> other, right?
> 
>  Second, select * from Devices where holiday_mode == 0. Use that as a
> sub-query and join users to the results. Now you have a list of user email
> addresses for only those with holiday_mode of zero.
> 
> HTH,
> 
> Rich
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Sample SQL code thats beyond me :(

2015-05-23 Thread Rob Willett
Rich, Keith, Luuk,

Thank you all for taking the time to write such cogent and helpful replies. 

I've tried out the ideas and they all work fine. We've also been looking at the 
design of our database and our logic to see if that's right. As I mentioned 
previously, I have a feeling that this issue is possibly due to bad DB design on 
our part, so we need to look at that carefully. I'm rather nervous of 
propagating bad design any further so we'll check carefully.

Either way we have a working solution, so I feel optimistic.

Thanks again, and as it's Saturday, have a drink. If any of you are near York in 
England I'll happily buy you a pint.

Best wishes,

Rob.

> On 23 May 2015, at 17:41, Keith Medcalf  wrote:
> 
> 
> You also lastly mention that the UUID fields are also used in the selection, 
> so the problem statement is really:
> 
> Return the projection of Users and Perimeter_Notifications using the common 
> email field as the equijoin key, but return only the results where there is 
> not a Devices record with the email and uuid matching the corresponding 
> fields in Perimeter_Notifications which has Holiday_Mode = 1:
> 
> SELECT *
>  FROM Users, Perimeter_Notifications
> WHERE Users.email = Perimeter_Notifications.email
>   AND NOT EXISTS (SELECT 1
> FROM Devices
>WHERE Devicess.email = Perimeter_Notifications.email
>  AND Devices.UUID = Perimeter_Notifications.UUID
>  AND Holiday_Mode = 1);
> 
> Your index on the Devices table will need to include the UUID as in Devices 
> (email, uuid, holiday_mode ...) (the order within the first three columns of 
> the index are irrelevant for this query's performance.
> 
>> -Original Message-
>> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>> bounces at mailinglists.sqlite.org] On Behalf Of Keith Medcalf
>> Sent: Saturday, 23 May, 2015 10:26
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Sample SQL code thats beyond me :(
>> 
>> To re-phrase your problem statement:
>> 
>> Join the table Users and Perimeter_Notifications using the common email
>> field and return the results as long as there does not exist a Devices
>> record where Holiday_Mode is 1 for that Users email.
>> 
>> Which translates directly to:
>> 
>> SELECT *
>>  FROM Users, Perimeter_Notifications
>> WHERE Users.email = Perimeter_Notifications.email
>>   AND NOT EXISTS (SELECT 1
>> FROM Devices
>>WHERE Devices.email = Users.email
>>  AND Holiday_Mode = 1);
>> 
>> You should have an index on Devices (email, Holiday_Mode ...), and of
>> course you will need an index on Perimeter_Notifications (email ...).  You
>> could also phrase it as an outer join, but that will be far less efficient
>> that the correlated subquery.  Some people are in love with outer joins,
>> however.  You would only need to use an outer join if you also needed some
>> data from the Devices table to be returned.
>> 
>> It also has the advantage that when you read it, it translates directly
>> back into the original (re-phrased) problem statement, so it is self-
>> documenting.
>> 
>> 
>>> -Original Message-
>>> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>>> bounces at mailinglists.sqlite.org] On Behalf Of Rob Willett
>>> Sent: Saturday, 23 May, 2015 09:41
>>> To: General Discussion of SQLite Database
>>> Subject: [sqlite] Sample SQL code thats beyond me :(
>>> 
>>> Hi,
>>> 
>>> I?m trying to do some analysis across a couple of tables and the SQL is
>>> beyond my meagre skills. I?m struggling to even describe the problem to
>> be
>>> honest.
>>> 
>>> The high level description is that I have three tables, Users, Devices
>> and
>>> Perimeter_Notifications. The high level description is that I want to
>>> extract a list of users from a database to send information to if they
>> are
>>> not on holiday. However I don?t necessarily have the holiday_mode set by
>>> the user and so our assumption is that unless the holiday mode is set to
>> 1
>>> (they are on holiday) its is assumed to be 0. Its the assumption thats
>>> causing the problem. If there is no entries in Perimeter_Notifications
>>> thats also fine, no rows get returned.
>>> 
>>> CREATE TABLE "Users" (
>>>"email" TEXT NOT NULL,
>>>"password" TEXT NOT NULL,
>>>  

[sqlite] Sample SQL code thats beyond me :(

2015-05-23 Thread Rob Willett
Rich,

I have lived and worked in New York and DC but not for too many years. I have 
fond memories of the eastern seaboard. 

I think you are correct with first, second and third passes on design. We're 
into our second pass now. As we work through the issues we may redesign bits of 
it. However the chain of data flow is now getting complex so 'small' changes 
can take a long time to move through (and yes we have abstracted our designs 
out).

All the very best.

Rob.

> On 23 May 2015, at 18:56, Rich Shepard  wrote:
> 
> On Sat, 23 May 2015, Rob Willett wrote:
> 
>> Thanks again and as its Saturday have a drink. If any of you are near York
>> in England I?ll happily buy you a pint.
> 
> Rob,
> 
>  I'm in the upper left corner of the US so I'll have to pass on your kind
> offer.
> 
>  Germane to your fundamental concern, over the years I've found that my
> first pass at a database schema is usually sub-optimal. The first design is
> based on initial assumptions, and further deep thinking can bring up issues
> not recognized before.
> 
>  I'm sure you will evolve a schema that works well for your needs and
> avoids hidden problems.
> 
> Rich
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Sample SQL code thats beyond me :(

2015-05-25 Thread Rob Willett
Simon,

The issue was not storing 1 or 0 per se, but rather thinking through the logic 
of what it could be under certain circumstances.

Our app provides highly personalised traffic information for Londoners. One 
aspect of the app is to allow alerts to be sent to people when their app isn't 
even on. The user may indicate they are on holiday, in which case no information 
is sent; however, we allow a highly detached mode of working with the app 
from the database. We were trying to think through the impact of providing a 
default to the system, e.g. they are on or not on holiday, and how that would 
play out when synchronisation eventually occurs. The user may not have put 
anything in for their holiday, so should we make any assumptions? 

We used to work for IBM, which is heavy on Lotus Notes. The normal way of using 
Notes is detached from a main server and syncing every X minutes; the problems 
occur when the back end Notes database gets out of sync with the front end 
client database. You get conflicts where the system doesn't know which is the 
master version of a document. We are trying to think through these scenarios 
and make sure we don't end up in a Save/Replication conflict. This is not the 
biggest problem for us, but we want to make sure we've done the job right.

BTW this is not a slight on SQLite at all; it's the classic problem of data 
potentially being written in two places and then deciding who wins. IBM never 
really resolved the issue with Lotus Notes and we are trying to think things 
through to avoid future issues. We *may* be overthinking the problem but at 
least we've tried.

We are still working our way through the potential issues and trying to make 
sure the schema is right. It's a lot easier to fix the schema now than in a 
year's time :)

Thanks for the helpful reply,

Rob
> On 25 May 2015, at 13:34, Simon Slavin  wrote:
> 
> 
> On 23 May 2015, at 5:21pm, Rob Willett  wrote:
> 
>> If we can force holiday_mode to be set to either 0 or 1 then the problem 
>> goes away, which comes down to getting the design right.
> 
> I don't know how your synchronisation works, but you can definitely make sure 
> only 'legal' values are stored in a column:
> 
> CREATE TABLE myTable (
>   holiday_mode INTEGER
>   DEFAULT 0
>   CHECK (holiday_mode = 0 OR holiday_mode = 1)
> )
> 
> You don't need a DEFAULT constraint in there too, I just put it in to show 
> it's possible.
> 
>> I think that this ?issue? we have is indicative of a poor DB design and 
>> wrong assumptions (bad pun) and we should fix that.
> 
> In previous jobs I have inherited databases which would have required a lot 
> of work to fix in their current forms but would take care of themselves with 
> properly re-written schema.  But sometimes you can't do that because it would 
> require retesting huge amounts of legacy code.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] BEGINNER - Transactions in shell script

2015-09-11 Thread Rob Willett
Petr,

You are making a number of fundamental mistakes with your security.  

1. Security through obscurity is your first mistake. There is no such thing. 

2. Assuming that nobody is writing CGI scripts on Windows servers is your next 
mistake. A lot of systems still do this; a lot of old systems still use this 
technique, and some new ones too. The attack vector is not necessarily through 
your CGI script itself but through the Windows web server. Unless you have patched 
and patched and patched your web server, you will be attacked. 

3. You assume that nobody is interested in your machine. Wrong. A lot of people 
are very interested, as they can add your hacked server to their botnet and sell 
your resources on. Your machine does not have to be publicised at all. As an 
example, I have a private server which I use. It has no DNS entry (a common way 
to search for machines), so is only accessible through an IP address which has 
never been published. It only has a single ssh port open and port 80 for a 
private web server running some software; the rest of the machine is locked 
down as best I can. The lock down took me a day to do. It is not trivial. My 
last weekly report showed over 200,000 attempts to break into the machine via 
ssh, http, and various CGI exploits. That's 200,000 robot attempts; the most 
prevalent was an ssh attempt from a single machine which accounted for 72,000 
goes. A public web server I have has over 1M hacking attempts per week. This is 
for a low usage machine. 

I give your machine less than 24 hours once it is live on the internet if you 
put it on without taking security seriously. You need to get the OS patched up, 
the ports closed down, the web server patched up and correctly configured. Out 
of the box the security on a Windows server (depending on the version) is poor. 
You need to learn what you need to do (and there are loads of guides on the 
internet) otherwise your server will be owned by somebody else very quickly. 

To be blunt, you have misunderstood computer security. Saying 'trust me' doesn't 
work. 

Best of luck,

Rob

> On 11 Sep 2015, at 13:42, Petr L?z?ovsk?  wrote:
> 
> There is a major difference: You are talking about SSH and Linux, this 
> combination running on hundred milions of network devices accross whole 
> internet. Thus develop intruding scripts does make sense. But I am using 
> Windows shell scripts as CGI, which is EXTREMELY rare. Who will study this 
> technique to intrude my (or very few another) systems? No one trust me ;-)
> 
> L.
> 
> BTW: If someone did it anyway, I will give him medal and start experinces 
> sharing to him 
> 
>> You'd be surprised by what is out there trying to get into your system.
> 
>> I had port 22 open on my home router to go to a Linux machine so I could SSH 
>> into my home network from anywhere in the world, even though I rarely ever 
>> leave the 519 area code.  One day I went to look at my messages log file and 
>> noted numerous brute force attempts to get into my machine.  Fortunately, 
>> the machine is setup so that you can't SSH in as root, and the single login 
>> name that has any kind of access root capable access is intentionally camel 
>> cased to thwart name dictionary attacks.  The attacks were automated at 
>> their end, obviously, but if you have a machine exposed, someone is going to 
>> have software that will do anything and everything to gain access through 
>> whatever weakest link you have.
> 
> 
>> I'm on a residential cable line, with an IP that changes periodically, 
>> however, I'm still subject to attacks.  SSH is a common thing, and what you 
>> have written may not be interesting to the hacker space as a whole, however, 
>> there is that one idiot out there that WILL take the time to break into your 
>> system for jollies.
> 
> 
> 
> 
> 
>> On Fri, Sep 11, 2015 at 6:12 AM, Petr L?z?ovsk?  wrote:
> 
> 
>> Never heard about this. Thinked about this a bit, but have no idea how it 
>> could menace my CGI application. But as far I am a beginner, expecting it 
>> could be a menace but rely on Security by obscurity. Some time a go, when I 
>> start writing CGI powered by windows shell scripts, I have serched (almost 
>> whole) internet for some examples or informations, but I found nothing. 
>> That means I am lonely with this technique ;-) No hacker will study such 
>> weird technique to intrude only one system on whole internet ;-)
>> 
>> L.
> 
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] BEGINNER - Transactions in shell script

2015-09-12 Thread Rob Willett
Petr,

Since this is the SQLite mailing list, we are moving away from the intentions 
of the list; however, I think your points need addressing as they may be 
relevant to other people using this mailing list. I apologise to other people 
if this is off topic but I think it's important enough to answer.

> On 11 Sep 2015, at 18:38, Petr L?z?ovsk?  wrote:
> 
>> 1. Security through obscurity is your first mistake. There is no such thing. 
> 
> Interesting It does not exist, but it have article on wikipedia. Sounds 
> like UFO or Yetti?   

Security through obscurity means that you assume that because nobody knows your 
system or knows your code then you are secure. That is what I was referring to. 
The article I assume you refer to

https://en.wikipedia.org/wiki/Security_through_obscurity 


makes this very point.

> 
>> 2. Assuming that nobody is writing CGI scripts on Windows Servers is your 
>> next mistake. A lot of systems still do this, a lot of old systems still use 
>> this technique and some new ones, The attack vector is not necessarily 
>> through your CGI script itself but through the Windows Web server. Unless 
>> you have patched and patched and patched your web server, you will be 
>> attacked. 
> 
> Of course I keep my web server software up-to-date, why do you think I do not 
> did it? I am talking here about my scripts, not about the server SW. But the 
> server SW is relatively rare too... 

I do not know what you do with your server. I would like to think you keep it 
patched up, but since you did not say, all I can do is assume the worst and be 
proven wrong. However, have you locked down every port, have you removed 
everything that is not needed, have you configured your applications that are 
front facing to be as secure as possible, have you put https on the web 
traffic that needs to be transmitted, and have you checked that another machine 
on your local network cannot get access to your machine (a side door approach)? 
The scripts themselves may be secure, but the mechanism needed to run them 
needs to be just as secure. 

> 
>> 3. You assume that nobody is interested in your machine. Wrong. A lot of 
>> people are very interested as they can add your hacked server to their 
>> bonnet and sell your resources on. Your machine does not have to be 
>> publicised at all. As an example, I have a private server which I use. It 
>> has no DNS entry (a common way to search for machines), so is only 
>> accessible through an IP address which has never been published. It only has 
>> a single ssh port open and port 80 for a private web server running some 
>> software there rest of the machine is locked down as best I can. The lock 
>> down took me a day to do. It is not trivial. My last weekly report showed 
>> over 200,000 attempts to break into the machine via ssh, http, and various 
>> CGI exploits. Thats 200,000 robot attempts, the most prevalent was an ssh 
>> attempt from a single machine which accounted for 72,000 goes. A public web 
>> server I have has over 1M hacking attempts per week. This is for a low usage 
>> machine. 
> 
> Script kiddies starting codes writen to attack widely spreaded systems, 
> otherwise it will be not much fun. Some of this codes could be specialized to 
> intrude minor systems, but I have doubts there are number of working scripts 
> to successfuly intrude systems with rare occurance.
> 
> Real hackers, those who are experienced in writing WORKING code targeted to 
> intrude one specific rare system, need a REAL reason to did such job. My 
> system does not offer such reason?.

If you are using a Windows OS then your system is widely available. There may 
be millions of machines running your version of the OS, so you are a target 
for script kiddies. This comes back to Security through Obscurity. Your system 
is not unique, and the resources it offers, an internet connection and 
processing power, make it attractive. The people searching the internet do not 
know the details of your machine; they are looking for machines to add to 
botnets. The fact you run a local database on it is of no interest; they want 
the machines to rent out for a DDOS attack or password cracking or spam 
sending. 

> 
>> I give your machine less than 24 hours once it is live on the internet if 
>> you put it on without taking security seriously. You need to get the OS 
>> patched up, the ports closed down, the web server patched up and correctly 
>> configured. Out of the box the security on a Windows server (depending on 
>> the version) is poor. You need to learn what you need to do (and there are 
>> loads of guides on the internet) otherwise your server will be owned by 
>> somebody else very quickly. 
> 
> As I already wrote, not using IIS. OS is protected by manualy configured 
> firewall. By concept Security through obscurity using this one 
> http://wipfw.sourceforge.net/ Intruding script perform OS 

[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
Hi,

There was a topic on here a few weeks ago which I cannot remember and cannot 
find in the mess that is my e-mail system, and after spending the last few hours 
searching the SQLite archives I still can't find it, so I will ask here if 
anybody can remember or help. I can't even format the question for Google to 
search on :(

The specific question I have is about trying to provide the fastest response 
possible to a select query.  I recall that the e-mail talked about using an 
index to satisfy the query and therefore never having to go out to get the rest 
of the data from the table, so it was a lot quicker. Is there anything that I 
need to do specially to make this happen, e.g. if I put all the fields of the 
table in the index BUT I really only search on the primary key?

The reason for this is I want to look up UK postcodes (Zip codes to our American 
brethren) and get their longitude and latitude. A UK postcode identifies a 
number of houses or commercial buildings. Depending on the area it can be just 
one building (a big one) or, if you are in the country, it can be quite a big 
area. If you sent a letter just to a postcode with no other identifier it 
probably wouldn't get delivered, but if you put a name or a building number on 
it, there's a very good chance the post(wo)man will deliver it. 

The CSV file looks like this

id,postcode,latitude,longitude
1,AB101XG,57.14416516000,-2.11484776800
2,AB106RN,57.13787976000,-2.12148668800
3,AB107JB,57.12427377000,-2.12718964400
4,AB115QN,57.14270109000,-2.09301461900
5,AB116UL,57.13754663000,-2.11269588600
....
Couple of million more lines

The entire database schema looks like this. I know it's complicated but bear 
with me :)

CREATE TABLE "postcode" (
 "postcode" text NOT NULL,
 "long" TEXT NOT NULL,
 "lat" TEXT NOT NULL,
PRIMARY KEY("postcode")
);

The only query that will ever run will be 

select long,lat from postcode where postcode = ??

Note I drop off the id field (column 0 in the CSV file) as it's of no interest 
to me. I also store the long and lat as strings as I don't want any number 
formatting changes at all. Rounding on a GPS number could cause the wrong 
location to be used.

The database will do nothing but return long and lat based on doing a postcode 
lookup. There will never be any updates or changes. If there are, the whole 
database will be regenerated. 

I need this to be as fast as possible and if necessary I'll put it all in RAM. 
The database is currently 120MB so it would easily fit in RAM. As it never 
changes (perhaps 4 times per year), it could stay there. 

Is there anything else from the database schema side that would make things 
quicker? E.g. if I created an index with postcode, long and lat in it, would that 
be quicker? Or if I changed the long and lat to real (though I'm reluctant to do 
so), would that make a lot of difference? 
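
For concreteness, a sketch of the covering-index idea using the schema above 
(the index name is made up; whether this is the right approach is exactly the 
question):

CREATE INDEX postcode_cover ON postcode (postcode, long, lat);

-- If the index covers the query, the plan reports a covering index and the
-- table rows are never touched:
EXPLAIN QUERY PLAN SELECT long, lat FROM postcode WHERE postcode = 'AB101XG';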

Any suggestions gratefully received and apologies for not being able to find it 
in the archives.

Thanks,

Rob


[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
John,

Thanks for this.

The base OS is Linux on a 64-bit server; the wrapper will be Perl running as a 
web service under Mojolicious. It will eventually be presented as a web service.

We could simply load up the entire database into a Perl assoc array and 
search on that. 

Richard,

WITHOUT ROWID was what I was looking for.  Thanks.

All,

What we'll do is write a quick Perl script to check the speed of an assoc array 
in Perl vs SQLite. It might be instructive to see the difference. 

Rob.

> On 17 Sep 2015, at 13:12, John McKown  wrote:
> 
> Well, this may be a heretical answer. Given what you have said, I wouldn't
> even try to use SQLite. Well, at least not directly. I would use a "hash
> table". SQLite's indexing, if I understand correctly, is a B-Tree. And that
> is the _only_ option. What would likely work better is a "hash index". You
> might get some more concrete answers if you were to post the OS and
> implementation language. Such as, Windows 8.1 using C#. Or Linux 64-bit
> using C++. Also, do you an SQL data base? Perhaps something else would be
> better, if you're not really doing relational queries. But I don't know
> what, given that I don't know your system environment.
> 
> On Thu, Sep 17, 2015 at 6:58 AM, Rob Willett 
> wrote:
> 
>> Hi,
>> 
>> There was a topic on here a few weeks ago which I cannot remember and
>> cannot find in the mess that is my e-mail system and after spending the
>> last hours search the SQLite archives I still can?t find it so will ask
>> here if anybody can remember or help. I can?t even format the question for
>> Google to search on :(
>> 
>> The specific question I have is about trying to provide the fastest
>> response possible to a select query.  I recall that the e-mail talked about
>> using an index to satisfy the query and therefore never having to go out to
>> get the rest of the data from the table, so it was a lot quicker. Is there
>> anything that I need to do specially to make this happen. e.g. if I put all
>> the fields of the table in the index BUT I really only search on the
>> primary key
>> 
>> The reason for this I want to look up UK postcodes (Zip codes to our
>> American brethren) and get their longitude and latitude. A UK postcode
>> identifies a number of houses or commercial buildings. Depending on the
>> area it can be just one building (a big one) or if you are in the country
>> it can be quite a big area. If you sent a letter just to a postcode with no
>> other identifier it probably wouldn?t get delivered, but putting a name on
>> it or a building number, there?s a very good chance the post(wo)?man will
>> deliver it.
>> 
>> The CSV file looks like this
>> 
>> id,postcode,latitude,longitude
>> 1,AB101XG,57.14416516000,-2.11484776800
>> 2,AB106RN,57.13787976000,-2.12148668800
>> 3,AB107JB,57.12427377000,-2.12718964400
>> 4,AB115QN,57.14270109000,-2.09301461900
>> 5,AB116UL,57.13754663000,-2.11269588600
>> ?.
>> Couple of million more lines
>> 
>> The entire database schema looks like this. I know its complicated but
>> bear with me :)
>> 
>> CREATE TABLE "postcode" (
>> "postcode" text NOT NULL,
>> "long" TEXT NOT NULL,
>> "lat" TEXT NOT NULL,
>>PRIMARY KEY("postcode")
>> );
>> 
>> The only query that will ever run will be
>> 
>> select long,lat from postcode where postcode = ??
>> 
>> Note I drop off the id field (column 0 in the CSV file) as its of no
>> interest to me. I also store the long and lat as strings as I don?t want
>> any number formatting changes at all. Rounding on a GPS number could cause
>> the wrong location to be used.
>> 
>> The database will do nothing but return long and lat based on doing a
>> postcode lookup. There will never be any updates or changes. If there are,
>> the whole database will be regenerated.
>> 
>> I need this to be as fast as possible and if necessary I?ll put it all in
>> RAM. The database is currently 120MB so it would easily fit in RAM. As it
>> never changes (perhaps 4 times per year), it could stay there.
>> 
>> Is there anything else from the database schema side that would make
>> things quicker? e.g. If I created an index with postcode, long, lat in,
>> would that be quicker? or if i changed the long, lat to real (though I?m
>> reluctant to do so), would that make a lot of difference?
>> 
>> Any suggestions gratefully received and apologies for not being able to
>> find it

[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
Simon,

Fair point and agreed. We'll update it.

Rob
> On 17 Sep 2015, at 13:56, Simon Slavin  wrote:
> 
> 
> On 17 Sep 2015, at 12:58pm, Rob Willett  
> wrote:
> 
>> CREATE TABLE "postcode" (
>>   "postcode" text NOT NULL,
> 
> As an incidental tip not related to your question, I would recommend that you 
> don't have a column name which is the same as a table name.  Call the table 
> 'locations' instead or something.
> 
> It's not illegal in SQL, but it can be confusing if you come back to it in a 
> few years.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
John, Richard,

A quick test shows the following:

1. Creating the database 'normally' without using WITHOUT ROWID.

This has 1703538 rows. 

Doing 1,000 randomish accesses via the web service takes 10 secs.

2. Using a perl assoc array and doing the same access via a web service takes 
10 secs. 

This also has 1703538 rows.

3. We cannot get WITHOUT ROWID working using Perl DBI. We get the following 
error:

DBD::SQLite::db prepare failed: malformed database schema (postcode) - near 
"WITHOUT"

This appears to be due to a mismatched SQLite version, but we are running 

macpro:postcode rwillett$ sqlite3 --version
3.8.5 2014-08-15 22:37:57 c8ade949d4a2eb3bba4702a4a0e17b405e9b6ace

The SQLite web page on WITHOUT ROWID indicates we need 3.8.2 or higher, so 
we're confused as to what the problem is. We've never actually had a problem 
with SQLite and Perl DBI so we need to look into it a little more. 
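
For reference, the declaration involved is the WITHOUT ROWID form of the table 
from the original post (a sketch; the real file may differ slightly). An older 
SQLite compiled into the driver cannot parse that clause, which would produce 
exactly this 'malformed database schema' error:

CREATE TABLE postcode (
    postcode TEXT NOT NULL,
    long     TEXT NOT NULL,
    lat      TEXT NOT NULL,
    PRIMARY KEY (postcode)
) WITHOUT ROWID;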

4. We then ran the same test using nothing more than select statements directly 
into sqlite3

e.g. 

select * from postcode where id = 'BT635SW';
select * from postcode where id = 'BT350PD';
select * from postcode where id = 'BT667TG';
select * from postcode where id = 'BT750PE';
select * from postcode where id = 'BT621AE';
select * from postcode where id = 'BT623PJ';
select * from postcode where id = 'BT670BX';
select * from postcode where id = 'BT623EG';
select * from postcode where id = 'BT670DS';
select * from postcode where id = 'BT655BU';
...

We ran 1,000 tests in 

real0m0.025s
user0m0.013s
sys 0m0.013s

So the 'direct' connection runs approx 400x quicker. Now, this direct 
connection doesn't have to open and close the database each time as ours 
currently does, and doesn't have the overhead of the web service around it. So 
it's not a realistic comparison, but it does show how quick SQLite can be :)

In conclusion 

1. I can optimise the actual SQL as much as I like, but the bottleneck isn't 
SQLite at the moment.

2. I need to work out what is happening using WITHOUT ROWID.

We'll dig further but suspect that we are now optimising the wrong thing in our 
system :)

Rob

If we ignore the WITHOUT ROWID issue, then 
> On 17 Sep 2015, at 14:21, John McKown  wrote:
> 
> On Thu, Sep 17, 2015 at 7:21 AM, Rob Willett 
> wrote:
> 
>> John,
>> 
>> Thanks for this.
>> 
>> The base OS is Linux on a a 64bit server, the wrapper will be Perl running
>> as a web service under Mojolicious. It will eventually be presented as a
>> web service.
>> 
>> We could simply load up the entire database into Perl into an Assoc array
>> and search on that.
>> 
> 
> ?This is probably would I would do, in this particular case.?
> 
> 
> 
>> 
>> Richard,
>> 
>> WITHOUT ROWID was what I was looking for.  Thanks
> 
> 
> ?I'm going to have to review this option as well. I'm not familiar with
> what it accomplishes, performance wise.?
> 
> 
> 
>> .
>> 
>> All,
>> 
>> What we?ll do is write a quick Perl script to check the speed of an assoc
>> array in Perl vs SQLite. It might be instructive to see the difference.
>> 
> 
> ?Yes, it would. Please post your results. I'm really curious about it.?
> 
> 
> 
>> 
>> Rob.
>> 
>> 
> 
> -- 
> 
> Schrodinger's backup: The condition of any backup is unknown until a
> restore is attempted.
> 
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
> 
> He's about as useful as a wax frying pan.
> 
> 10 to the 12th power microphones = 1 Megaphone
> 
> Maranatha! <><
> John McKown
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
Simon,

We understand how SQLite is compiled into programs. We weren't sure if the 
issues were Navicat, which we sometimes use for a front end, or the Perl DBD 
stuff. My money was on Navicat being the problem. I was wrong; it was the DBD 
version. 

The sqlite version is 3.7.17 on Ubuntu 14.04 server. 

Rob

> On 17 Sep 2015, at 15:00, Simon Slavin  wrote:
> 
> 
>> On 17 Sep 2015, at 2:47pm, Rob Willett  
>> wrote:
>> 
>> 3. We cannot get WITHOUT ROWID working using Perl DBI. We get the following 
>> error 
>> 
>> DBD::SQLite::db prepare failed: malformed database schema (postcode) - near 
>> ?WITHOUT"
>> 
>> This appears to be due to mismatched SQLite version but we are running 
>> 
>> macpro:postcode rwillett$ sqlite3 --version
>> 3.8.5 2014-08-15 22:37:57 c8ade949d4a2eb3bba4702a4a0e17b405e9b6ace
>> 
>> The SQLite web page  on WITHOUT ROWID indicates we need 3.8.2 or higher so 
>> we?re confused as to what the problem is.
> 
> SQLite is not 'installed in your system'.  It is compiled separately into 
> each program which uses it.  The version number returned by the command-line 
> tool (sqlite3) is the version of SQLite that is compiled into the 
> command-line tool.
> 
> Your version of Perl will have a different version of SQLite compiled into 
> it.  To find out which version that is, I think you can do this:
> 
>  $dbh->{sqlite_version};
> 
> assuming $dbh is your handle to a SQLite connection.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
John,

The problem was the version of SQLite included in DBD. It's way old…

I am a sysadmin as well as developer as well as tester as well as CFO. I wear 
many hats in this startup :)

Our conclusion is that we need to look at the architecture again to see if the 
web services can be speeded up. There's no point in us shaving 0.01 ms off 
SQLite if the response takes 0.1 ms in the web service. 100 calls per second 
isn't good enough. We'll look at Mojolicious and see how many processes we can 
pre-fork.

Rob

> On 17 Sep 2015, at 15:41, John McKown  wrote:
> 
> The latest PERL DBI for SQLite that I could see is at
> http://www.cpan.org/authors/id/I/IS/ISHIGAKI/DBD-SQLite-1.48.tar.gz . I
> took a look at it and it has sqlite3.c (and others) from the 3.8.10
> amalgamation in it.
> 
> Just "for fun", I copied 3 files (sqlite3.c, sqlite3.h, and sqlite3ext.h)
> from my SQLite (64abb65d4df11e5b3bcc4afc8e7c18e907c6080a 2015-08-28
> 03:48:04 UTC) source. In the DBD-SQLite-1.48 source directory, I then did:
> perl Makefile.PL #create the Makefile
> make #create the SQLite.so shared library
> sudo make install # Install the new DBD for SQLite
> 
> The above was on RedHat Fedora 22 x86_64. All the test ran successfully.
> 
> If it were me, I'd download the latest SQLite almagamation & the above
> mentioned DBD-SQLite. Copy the 3 files I mentioned from the amalgamation
> source to the DBD-SQLite-1.48 directory, then re-install DBD-SQLite as I
> did. But, of course, this will likely need to go though whatever change
> control procedures that Rob's installation has. Hum, I guess that I assumed
> that Rob is a sysadmin on this system. So maybe he will really to push an
> update request through channels to get DBD-SQLite updated.
> 
> 
> On Thu, Sep 17, 2015 at 9:00 AM, Simon Slavin  wrote:
> 
>> 
>>> On 17 Sep 2015, at 2:47pm, Rob Willett 
>> wrote:
>>> 
>>> 3. We cannot get WITHOUT ROWID working using Perl DBI. We get the
>> following error
>>> 
>>> DBD::SQLite::db prepare failed: malformed database schema (postcode) -
>> near ?WITHOUT"
>>> 
>>> This appears to be due to mismatched SQLite version but we are running
>>> 
>>> macpro:postcode rwillett$ sqlite3 --version
>>> 3.8.5 2014-08-15 22:37:57 c8ade949d4a2eb3bba4702a4a0e17b405e9b6ace
>>> 
>>> The SQLite web page  on WITHOUT ROWID indicates we need 3.8.2 or higher
>> so we?re confused as to what the problem is.
>> 
>> SQLite is not 'installed in your system'.  It is compiled separately into
>> each program which uses it.  The version number returned by the
>> command-line tool (sqlite3) is the version of SQLite that is compiled into
>> the command-line tool.
>> 
>> Your version of Perl will have a different version of SQLite compiled into
>> it.  To find out which version that is, I think you can do this:
>> 
>>  $dbh->{sqlite_version};
>> 
>> assuming $dbh is your handle to a SQLite connection.
>> 
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> 
> 
> 
> 
> -- 
> 
> Schrodinger's backup: The condition of any backup is unknown until a
> restore is attempted.
> 
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
> 
> He's about as useful as a wax frying pan.
> 
> 10 to the 12th power microphones = 1 Megaphone
> 
> Maranatha! <><
> John McKown
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-17 Thread Rob Willett
Nicolas,

Speed of development, ease of development, maintenance, available frameworks, 
available Perl modules.

I actually know C the best of any language and will very happily code in C, but 
for some things it's quicker to develop in Perl. Certainly for 
anything involving 'interesting' text manipulations Perl is far, far quicker. 
We use the DBI module for database integration, which we use with SQLite, BUT we 
could change that for PostgreSQL or MySQL in a few hours.

As an example I built a very simple RESTful server in Perl using Mojolicious in 
around two hours this afternoon. The other two hours was coming to the 
conclusion that SQLite was not the bottleneck anyway in the web service :) Now 
if I had to build that in C I'd still be reading the manual on sockets, 
programming them and trying to remember various flags. I have no doubt I could 
write something that worked but it would take me an awful lot longer. It would 
also be an awful lot bigger.

A lot of Perl is very, very, very fast. People write modules in C that 
integrate into Perl and simply slot in. You get the code speed of C with the 
ease of maintenance and development of Perl. Code speed close 
to C is not uncommon.

It's also very mature as a language, though I dip in and out of object-oriented 
Perl and back to traditional Perl as I see fit.

There are disadvantages to Perl:

1. It's single-threaded, so a high clock speed is good or you need to break the 
problem down into different processes.
2. People (myself included) can write the most awful code imaginable.
3. The CPAN modules can be waaay out of date and unmaintained.
4. Memory management consists of stopping the program to free the memory. Mmmm…
5. The syntax can be obtuse. I was a beta tester for Perl about a zillion years 
ago and it was a little archaic then.
6. Writing regular expressions is great but it can lead to code that looks like 
noise on a modem line.
7. You can end up with the same dependency hell as in any other language.
8. Any other Perl coder is as old as me.

However, as an example of what you can get, we do a lot of natural language 
processing and Bayesian maths in our system. We needed a language stemmer. 
Writing this would have taken us weeks, if not months, but all of these modules 
were available on CPAN. We installed them and got up and running in hours (well, 
a few days). Try that in C or JavaScript or PHP or C++ or Java. I suspect Lisp 
would work though.

Hope that helps,

Rob


> On 17 Sep 2015, at 17:30, Nicolas Jäger  wrote:
> 
> hi Rob (Willet),
> 
> May I ask why are you using Perl instead of C/C++ ? I'm not going to start a 
> debate, I'm just
> curious.
> 
> 
> regards,
> Nicolas
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-18 Thread Rob Willett
Chris,

The postcode is unique in this data collection. I *should* have stated that as 
it is important. My apologies.

The UK postcode is defined to cover several houses; in London my postcode 
covers around 20 houses, all of them on my street. This means the area it 
covers is pretty small. Outside London, a postcode can cover a far, far wider 
area, especially in rural or sparsely populated areas. I would imagine Australian 
postcodes to be similar, but that's based on a guess rather than any actual 
knowledge.

What we wanted was a postcode to GPS lookup. It was not to pinpoint a house 
(you would need a little more information to be that specific, e.g. a house 
number or a name) but to get to a general area that has a central GPS point.

For our use case, what we have is sufficient as all the postcodes are unique in 
our system. If the user has more information, such as a full address and 
postcode, there are other techniques we can use to be more accurate, but that's 
a different use case.
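
In case it is useful to anyone following along, the shape of the lookup is 
roughly this (a minimal sketch using Perl DBI; the database name and example 
postcode are made up, and WITHOUT ROWID needs a DBD::SQLite built against 
SQLite 3.8.2 or later, which is what the earlier part of this thread was 
about):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=postcode.db", "", "",
                       { RaiseError => 1 });

# Unique postcode as the key, central GPS point as the payload.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS postcode (
        postcode TEXT PRIMARY KEY,
        long     TEXT,
        lat      TEXT
    ) WITHOUT ROWID
});

# Look up the central GPS point for one postcode.
my $sth = $dbh->prepare("SELECT long, lat FROM postcode WHERE postcode = ?");
$sth->execute('SW1A 1AA');    # example postcode, purely illustrative
my ($long, $lat) = $sth->fetchrow_array;
print defined $long ? "$long, $lat\n" : "unknown postcode\n";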

Rob.

> On 18 Sep 2015, at 01:44, Chris Waters  wrote:
> 
> 
>> On Thu, 17 Sep 2015 13:55:51 +0100 Simon Slavin wrote
>> 
>>> At the end of the CREATE TABLE statement add keywords: "WITHOUT
>>> ROWID".  Like this:
>>> 
>>> CREATE TABLE postcode(
>>> postcode TEXT,
>>> long TEXT,
>>> lat TEXT
>>> ) WITHOUT ROWID;
>> 
>> The documentation suggests that you need to explicitly declare a PRIMARY 
>> KEY, so you would need something more like
>> 
>> CREATE TABLE postcode(
>> postcode TEXT PRIMARY KEY,
>> long TEXT,
>> lat TEXT
>> ) WITHOUT ROWID;
>> 
>> but perhaps I misunderstood.
> 
> 
> 
> Elsewhere in that email (on another topic) a correspondent urged that the 
> outcome should be stated, the requirements understood so the proposed 
> solution could address that, not how to improve code snippets. I think that 
> applies here. In this particular thread I think an important element is that 
> postcode will not be unique, if Australia Post's version of this is typical. 
> One postcode covers several suburbs. I guess this will fundamentally alter 
> the approach. A blisteringly fast lookup retrieving the wrong result may not 
> be all that useful.
> 
> Chris Waters
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] [OT] Handling the whole select query from an index

2015-09-18 Thread Rob Willett
Rowan,

It wasn't a troll. My trolls are normally far, far less subtle than that. My 
'guess' was that since Australia is such a large country in comparison to the 
UK and has a very small population in comparison to the UK, it is relatively 
sparsely populated. I know that some cities are quite heavily populated, indeed 
one of our friends left to work in Sydney for three years only last week, but 
the population density in the interior of the country is almost 
non-existent. As I said, it was a guess rather than any informed knowledge.

In London postcodes are down to very small areas; you can easily check this 
through a number of postcode lookup systems used by banks or online shoppers. 
Clearly I was wrong about Australia. I apologise for any offence caused. I 
wasn't aware that population density was such a sore subject in Australia.

However, now that I know we have an Aussie on the list and the Ashes are over, I 
will feel completely happy to troll about that, confident that, at least for the 
next N months, there's nothing you can do about it :) It was a great series, 
full of drama and excitement. It was weird not knowing which English or which 
Aussie team was going to turn up for each of the Tests.

Just realised that the RWC starts tonight here and Australia are in our Pool of 
Death. Mmm… might retract my comment above :) We're going round to a Welsh 
friend's to watch the opening ceremony. I know for certain that one of the 
Welsh, Australian or English is going to be really unhappy in a few weeks' 
time.

Rob.

> On 18 Sep 2015, at 07:35, Rowan Worth  wrote:
> 
>> Outside London, a postcode can cover a far, far wider area specially in
> rural or sparsely populated areas. I would imagine Australian postcodes to
> be similar but thats based on a guess rather than actually any knowledge.
> 
> I'm not sure whether to take this claim of ignorance at face value or
> whether its an incredibly subtle troll aimed at Australia's lack of
> population density (in which case well played sir, I guess you deserve the
> ashes after all).
> 
> Postcodes in my city (Perth) cover a *much* wider area than 20 houses -
> generally several suburbs as Chris Waters suggested. But we jokingly refer
> to Perth as "the biggest country town in the world" which is why your
> comparison to rural England is so amusing/cutting :P
> 
> -Rowan
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-18 Thread Rob Willett
Yes, we are aware of the Google API. Our business is GPS and GIS traffic data, so 
we have actually built a lot of the infrastructure ourselves. We have our own 
map server running OpenStreetMap, a routing server and a database allowing us to 
do the same queries as you mention. It's actually a lot quicker than two seconds 
in the UK.

Google is great until you start hitting their limits, which aren't that 
high for us (in our opinion), and then they want that scarce commodity, money, 
hence we build our own stuff. Our business model wants a lot of clients, so it's 
cheaper to do our own.

Rob

> On 18 Sep 2015, at 16:54, R.Smith  wrote:
> 
> 
> 
> On 2015-09-18 08:17 AM, Rob Willett wrote:
>> What we wanted was a postcode to GPS lookup, it was not to pinpoint a house, 
>> you would need a little more information to be that specific, e.,g. house 
>> number or a name, but to get to a general area that has a central GPS point.
> 
> As an aside, you know you can already do this via a Google API right? That 
> is, you can find a GPS location for an address or approximate address (city & 
> postal code will do). Usually, if the address is not specific, the nearest 
> road intersection GPS is given or indeed 4 GPS points that denotes the 
> bounding box encircling the area.
> The country is not needed unless the city and postal code combination is 
> ambiguous... so safer to add.
> 
> You can do the reverse too, supply a GPS and obtain an address or approximate 
> address where ambiguous.
> 
> As long as you hit less than 1,000 requests per day, the service is free.
> 
> Results are returned usually in under 2 seconds in nice full XML or JSON.
> 
> Here's an example using my postal detail - one XML and one JSON:
> https://maps.googleapis.com/maps/api/geocode/xml?address=Midrand+1684+South+Africa=en
> https://maps.googleapis.com/maps/api/geocode/json?address=Midrand+1684+South+Africa=en
> 
> 
> (Might be less effort for you to just piggy-back off this until your system 
> becomes very adopted).
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-19 Thread Rob Willett
Ryan,

Thanks. We've got 100 requests a second, which may be enough. We'll keep 
looking, though, when we have time.

Rob.

> On 18 Sep 2015, at 18:26, R.Smith  wrote:
> 
> >>>Rob: "We want to do postal code ==> GPS..."
> 
> >>Me: "You can use google apis..."
> 
> >Rob: "Our business is GPS and GIS traffic data"
> 
> Oops yes, that's a whole nother kettle of fish then. To return to some of the 
> older parts of the conversation, I think the SQL route is best (whichever 
> engine is chosen) - I am sure you will need to expand in future. I hope you 
> get the PERL latency sorted out.
> 
> Best of luck!
> Ryan
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Handling the whole select query from an index

2015-09-22 Thread Rob Willett
A quick update on handling the select query. We have some speed results 
which are quite illuminating and positive.

We've been playing around with building a few web services, one of which was 
the postcode lookup that people have kindly helped with.

We have one service that is a simple insert into a database. One piece of data 
comes in (some GPS information), and we need to get it into a database table as 
quickly as possible. It's basically a data logger.

The other is the postcode lookup, where a user requests a GPS location based on 
a postcode. The postcode lookup uses a single table in SQLite with a unique 
postcode as a key. 

We had written some rubbish shell scripts to test both services, which was lazy 
and pretty useless. So we decided to do the job properly and do a load test 
using Siege as the weapon of choice (pun intended). We like Siege as it's dead 
simple and works very well. We can argue over the merits of Siege vs roadrunner 
vs whatever later.

Both tests done on a Mac Pro quad core with 32GB of RAM.

Test 1 - Simulating 100 concurrent users doing 100 requests one after the 
other. This uses morbo as a wrapper for the Perl script

Transactions:  10000 hits
Availability: 100.00 %
Elapsed time:  11.25 secs
Data transferred:   0.53 MB
Response time:  0.11 secs
Transaction rate: 888.89 trans/sec
Throughput: 0.05 MB/sec
Concurrency:   99.32
Successful transactions:   10000
Failed transactions:   0
Longest transaction:0.17
Shortest transaction:   0.00

Mmmm… Just under 900 requests per second.

Test 2 - This is simulating writing a single (small) line of data to a table. 
No special set up has been done regarding pragmas or anything like that. It's a 
pretty simple 'standard' setup. Our Macs have SSDs rather than HDDs though. This 
uses morbo as a wrapper script around the Perl.

We simulate 100 concurrent users writing 100 requests again.

Transactions:  10000 hits
Availability: 100.00 %
Elapsed time:  33.15 secs
Data transferred:   0.02 MB
Response time:  0.33 secs
Transaction rate: 301.66 trans/sec
Throughput: 0.00 MB/sec
Concurrency:   99.51
Successful transactions:   10000
Failed transactions:   0
Longest transaction:0.52
Shortest transaction:   0.05

That's 10,000 inserts into the database. We cleared the table out beforehand and 
then checked that the table had 10,000 records in it afterwards. It did.

Test 3: As per Test 1 but using hypnotoad (don't you just love the names). This 
is a more performant wrapper for Perl.

Transactions:  10000 hits
Availability: 100.00 %
Elapsed time:   6.53 secs
Data transferred:   0.53 MB
Response time:  0.06 secs
Transaction rate:1531.39 trans/sec
Throughput: 0.08 MB/sec
Concurrency:   91.40
Successful transactions:   10000
Failed transactions:   0
Longest transaction:0.92
Shortest transaction:   0.00

Performance is almost (ish) doubled to just over 1500 transactions per second. 
No changes to the Perl scripts.

Test 4: As per Test 2 but using hypnotoad. Database tables cleared out before 
starting.

Transactions:  10000 hits
Availability: 100.00 %
Elapsed time:  13.16 secs
Data transferred:   0.02 MB
Response time:  0.13 secs
Transaction rate: 759.88 trans/sec
Throughput: 0.00 MB/sec
Concurrency:   95.96
Successful transactions:   10000
Failed transactions:   0
Longest transaction:0.50
Shortest transaction:   0.00

Double the performance to just over 750 transactions per second. No changes to 
Perl scripts.

Summary

We're rather pleased with the performance of Mojolicious as a web server (or 
rather web service) with SQLite. The Perl script for the postcode lookup is 96 
lines long. The Perl script for the GPS capture is 190 lines and uses a 
slightly different framework for a number of reasons which are too embarrassing 
to go into.

I have no idea how these compare to other systems, but I think 700 web-based 
insertions per second is pretty good. I have no doubt we could improve the 
speed with some careful optimisations but, to be brutally honest, it's not worth 
it for us.
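
For context, the postcode service is essentially one route in front of a 
prepared statement; a stripped-down sketch of the shape of it (file names, 
database name and JSON fields are illustrative rather than our production 
code, and it assumes the postcode table discussed earlier in the thread):

use Mojolicious::Lite;
use DBI;

# One SQLite handle per worker process, opened lazily so that hypnotoad's
# pre-forked workers do not end up sharing a single connection.
helper dbh => sub {
    state $dbh = DBI->connect("dbi:SQLite:dbname=postcode.db", "", "",
                              { RaiseError => 1 });
    return $dbh;
};

get '/postcode/:code' => sub {
    my $c   = shift;
    my $sth = $c->dbh->prepare_cached(
        "SELECT long, lat FROM postcode WHERE postcode = ?");
    $sth->execute($c->stash('code'));
    my ($long, $lat) = $sth->fetchrow_array;
    $sth->finish;
    return $c->render(json => { error => 'unknown postcode' }, status => 404)
        unless defined $long;
    $c->render(json => { long => $long, lat => $lat });
};

# "morbo lookup.pl" gives the development server used in tests 1 and 2,
# "hypnotoad lookup.pl" the pre-forking server used in tests 3 and 4.
app->start;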

Hope this helps somebody; it validates our choice of software, so we're pretty 
pleased.

Rob







[sqlite] Is it possible that dropping a big table takes very long

2016-04-16 Thread Rob Willett
Cecil,

If you have a load average of 15 then that normally means you have a 
massively overloaded Linux box. I don't know your system but I get 
worried around a load average of 3-4 on our boxes. Load average is a 
very crude measurement but a high number tends to be bad.

If your CPU is only running at 15% (how do you know this?) then the 
problem is likely elsewhere. My first thought is swap space: check what 
the disk I/O is. The fact you mention Java would immediately make me look 
at the amount of memory allocated to the JVM. For some reason many 
JVMs allocate a pitifully small amount of memory; Java then runs out 
of memory pretty quickly and spends the next few secs/mins/hours 
fighting with the OS for resources.

Top, netstat, ps etc. are your friends here. You need to look at them and 
see what's kicking your load average into the stratosphere. I'd be 
surprised if it's SQLite. We don't normally drop tables, but from 
memory, dropping a 10GB table took no time. It might have been seconds 
or a minute; we didn't measure it as it wasn't an issue.
Rob

On 16 Apr 2016, at 20:25, Cecil Westerhof wrote:

> 2016-04-16 20:36 GMT+02:00 R Smith :
>
>>
>>
>> On 2016/04/16 3:39 PM, Cecil Westerhof wrote:
>>
>>> 2016-04-16 14:52 GMT+02:00 R Smith :
>>>
>>> Let me try the 100 million rows, this may take some time - I will 
>>> post
 again when it is done.

 ?I am curious.
>>>
>>
>> Well, here it is then, 100-million rows: The INSERT took a lot of 
>> time,
>> near 5 hours, but again, the DROP was only a few seconds.
>>
>
> ?That is what I would expect, but not what I see. :'-(
>
> I am filling the database again, but now with text UUID instead of 
> blob
> UUID. That takes a ?little? more time. When it is filled I try 
> again.
>
> Have another problem also. My CPU is about 15%, but the load average 
> is
> also about 15. (This is on a Linux system.) This results (sometimes) 
> in a
> very sluggish system. Can the load be a SQLite problem, or is it a 
> Java
> problem? (When the program is not running, the load average is a lot 
> lower.)
>
> -- 
> Cecil Westerhof
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Is it possible that dropping a big table takes very long

2016-04-17 Thread Rob Willett
Cecil,

Linux reporting 8 cores is due to hyper-threading on the four cores. 
That's normal.

One of the cores reporting a high usage is normal if you have a 
single-threaded app that simply cannot be moved; e.g. many Perl programs 
exhibit this behaviour. That's fine and to be expected.

I do not have the whole thread for what you reported, but I did read 
somebody ask if you have put your inserts between a BEGIN/END 
transaction. That will make a massive difference to your speed. Also 
I'm unclear as to how Java fits into all of this. Perhaps you gave a 
better indication further up the thread, which I do not have.
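
For what it is worth, the transaction wrapping looks like this from Perl 
DBI; a minimal sketch only, since your program is Java, and the table, 
column and stand-in UUID generator are made up for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=uuids.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do("CREATE TABLE IF NOT EXISTS uuids (uuid TEXT PRIMARY KEY)");

my $sth = $dbh->prepare("INSERT INTO uuids (uuid) VALUES (?)");

$dbh->begin_work;                      # one transaction per batch of inserts
for my $i (1 .. 100_000) {
    $sth->execute(fake_uuid());
    # Commit in chunks so one failure does not lose the whole run and the
    # journal for each transaction stays a manageable size.
    if ($i % 10_000 == 0) {
        $dbh->commit;
        $dbh->begin_work;
    }
}
$dbh->commit;

sub fake_uuid {
    # Stand-in for the real UUID source; good enough for the sketch.
    return sprintf "%08x-%04x-%04x-%04x-%012x",
        map { int rand 2**$_ } 32, 16, 16, 16, 48;
}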

There are a lot of very, very talented people on the list; I am not 
one of them :). I would strongly urge you to provide more information 
rather than less, e.g. you provide summary information for Java by RSS. 
It would be a lot more useful to have the full raw data so people can 
look for quirks and anomalies rather than you simply interpreting it for 
us. I am NOT the best person to talk about SQLite performance; however, I 
am more familiar with Unix/Linux performance and administration.

Since most people here do not have the same issues as you, and I have no 
doubt they are hitting SQLite far harder than you or I can, I would look 
at what you are doing that is different, and Java keeps jumping out at 
me. I have run (though not programmed, as I'm not a Java developer) a 
number of programs running Java as the backend onto large DB2 databases, 
and the first thing the developers would do would be to increase the 
amount of memory that Java is allowed to allocate and use. They would 
tend to throw 4GB at each JVM, or more if the sysadmins would let them.

I still don't get a feeling for where the problem is, though, as the 
information is fragmentary.

Rob

On 17 Apr 2016, at 8:40, Cecil Westerhof wrote:

> 2016-04-17 1:03 GMT+02:00 Keith Medcalf :
>
>>> Have another problem also. My CPU is about 15%, but the load average 
>>> is
>>> also about 15. (This is on a Linux system.) This results (sometimes) 
>>> in a
>>> very sluggish system. Can the load be a SQLite problem, or is it a 
>>> Java
>>> problem? (When the program is not running, the load average is a lot
>>> lower.)
>>
>> You have slow CPU with multiple cores (15% means you probably have 
>> quad
>> core -- probably with Intel Hyper-Slowness as well (as in not SMT), 
>> or you
>> have dual-core SMT), and the dispatcher is brain-dead and dispatching 
>> the
>> single threaded application on the same CPU as more critical OS 
>> workers
>> resulting in swamping that core with compute use and preventing 
>> competing
>> threads (such as the OS or the gooey) from being other than -- well 
>> --
>> gooey.
>>
>> You claim this is Linux.  There are many brands and flavours (and
>> versions) or Linux, all tuned to behave differently.  You do not 
>> specify
>> the particular version and brand it is.
>>
>
> ?It never hurts to give some extra information. I only expected 
> SQLite
> help. But I do not mind to get other help. :-D
>
> The CPU is: Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz.
> It is a little strange. It has four cores but Linux thinks it has 
> eight.
> None of the eight has a high load: sometimes one 60-70%, but most of 
> the
> time a lot lower.
> The operating system is openSUSE 13.2.
>
> free -m gives:
>  total   used   free sharedbuffers 
> cached
> Mem: 15923  15737185   1300105   
> 8131
> -/+ buffers/cache:   7500   8422
> Swap: 2053   1659394?
>
>
> ?RSSMemory usage for java
> 
> RSSMemory   6 MB by PID=9705
> RSSMemory  54 MB by PID=9731
> RSSMemory  75 MB by PID=28844
> RSSMemory 112 MB by PID=18743
> RSSMemory 121 MB by PID=28880
> RSSMemory 345 MB by PID=1036
> 
> Total used RSSMemory: 713 MB
>
> swap usage for java
> 
> swap   2 MB by PID=28880
> swap   7 MB by PID=28844
> swap 128 MB by PID=9731
> swap 132 MB by PID=9705
> swap 442 MB by PID=1036
> 
> Total used swap: 711 MB
>
> The program has PID 18743 and does not use swap.
>
> ps -l 18743 gives:
> F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTYTIME CMD
> 0 S  1000 18743 28670  4  80   0 - 1718302 futex_ pts/12  45:29 java
> RandomUUID CheckUUID.properties
> ?
> My own psPid 18743 gives:
> USER   PID  PPID TT   START ELAPSED TIME STAT COMMAND
> cecil18743 28670 pts/12   Apr1616:29:42 00:45:27 Sl+  java
> RandomUUID CheckUUID.properties
>
> So it does not have a high CPU usage about 5%.
>
> I think I have to cancel the program, because I have:
> 23:03:12: Inserted6.00e+07 UUID's
> 00:50:33: Inserted6.10e+07 UUID's
> 02:56:31: Inserted6.20e+07 UUID's
> 04:56:06: Inserted6.30e+07 UUID's
> 06:53:03: Inserted6.40e+07 UUID's
> 09:14:08: Inserted

[sqlite] Performance comparison between SQLite and SQL Server?

2016-02-15 Thread Rob Willett
To add to the responses, we use SQLite for our main database, which is 
currently around 60GB in size, so size isn't an issue for SQLite. I am 
aware of other people with far, far larger SQLite databases. We did a 
quick test and ran up to a couple of hundred GB of data in SQLite 
and it seemed fine.

We chose SQLite over other databases as

1. We didn't need concurrency. All our data is written and read by a 
single process.
2. It's very lightweight. We can move the database around by simply 
copying the file, which is fantastic for our use.
3. The speed is very, very fast. We haven't found any significant 
speed differences FOR OUR USE CASE between SQLite and other 'bigger' 
databases. YMMV.
4. The support is top notch. I have bought and paid for government-scale 
databases for governments and, to be honest, the support for SQLite is 
just as good; I would say better than Big Red or Big 
Blue (and I used to work for Big Blue).

That's not to say that SQLite is perfect; it's not. However, no database is 
perfect as everyone's usage is different. I certainly wouldn't state 
that SQLite is the answer to everyone's problems, and I can state with 
absolute certainty that DB2 is also not the answer for every use.

We are moving into a different phase of our development and we are 
investigating PostGIS for some of our work now, as that *MAY* be a 
better tool for some of our more exotic spatial queries and analysis. No 
reflection on SQLite; it's a case of finding the right tool for the 
right job. Having said that, we *may* end up using SQLite for this area 
as well.

Rob

On 15 Feb 2016, at 8:20, R Smith wrote:

> On 2016/02/15 5:21 AM, admin at shuling.net wrote:
>> Hi,
>>
>> I am just curious whether there is a performance comparison between 
>> SQLite
>> and SQL Server? Surely SQL Server will perform better on huge 
>> database with
>> thousands of tables(more than 10GB size). But whether SQLite will 
>> perform
>> better on smaller database such as one database with one table that 
>> is less
>> than 1GB?
>>
>> Thanks
>
> Some other replies have hinted at this already, but to expand: The one 
> isn't faster than the other in all cases. There are ways in which 
> SQLite is much faster even on a 100GB database - the "Lite" in SQLite 
> doesn't mean database size - it pertains more to the available 
> function-set. It can handle the very large DB's just as well as any 
> other.
>
> Where SQLite lacks (due to the "Lite"-ness) is in not having 
> user-access control, not having programmability (stored procedures and 
> functions, but then you can add custom functions to SQLite in C even, 
> which you can't easily do with the others, especially not with MSSQL). 
> The largest difference however, is that an SQLite connection operates 
> on (talks-to) a file, and the others usually talk to a server.
>
> The main advantage of SQLite is that it can be used for an application 
> file format to your program, complete as if it was a database in 
> itself (which, actually, it is), and moreover, you can embed the 
> entire DB in your application and on your hardware etc - like the 
> billions of handheld devices, phones, tablets, etc. that uses SQLite 
> daily.
>
> For more information, see:
> http://www.sqlite.org/whentouse.html
>
> To answer your speed question - it depends on data shape, size, IO 
> access speeds, Memory on the generation machine etc. Import big 
> datasets in both MSSQL and SQlite, run some queries, chances are some 
> queries be slightly faster in SQLite, and some are slightly faster in 
> MSSQL.  Single query performance is not really the driver of that 
> decision.
>
> Cheers,
> Ryan
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Can't understand why I need this cast

2016-02-18 Thread Rob Willett
Hi,

I'm sure this is a really dumb question but I'm struggling to 
understand why the following SQL is needed for what should be a trivial 
SQL expression.

I've minimised the example down to (hopefully) make it simpler.

I have a table with an ISO date StartTime in it held as a string.


sqlite> .schema test1
CREATE TABLE "Test1" (
 "Id" INTEGER NOT NULL,
 "StartTime" TEXT NOT NULL,
PRIMARY KEY("Id")
);



I add in some test ISO date data that *looks* OK


sqlite> select * from test1;
1|2011-05-03T05:00:00Z


I run the following to check that the ISO date is between a range of 
seconds, i.e. it's greater than three hours ago and less than three hours 
in the future. I know this is contrived, but this is the minimum test 
case from a far larger query.

select
strftime('%s' , starttime) ,
strftime('%s' , starttime) - (180 * 60) ,
 strftime('%s' , starttime) + (180 * 60) ,
strftime('%s' , starttime) - (strftime('%s' , starttime) - (180 * 60)) 
,
(strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , starttime) 
,
strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60)) 
,
strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))
from
test1;

What I actually see is

1304398800|1304388000|1304409600|10800|10800|1|0

1. the ISO start time as secs - Correct
2. The ISO start time in secs minus 10800 - Correct
3. The ISO start time in secs plus 10800 - Correct
4. The value 10800 to check that the maths is correct - Correct
5. The value 10800 to check that the maths is correct - Correct
6. A check that the ISO value in secs is greater than the current time 
minus 10800 - Correct
7. A check that the ISO value in secs is less than the current time plus 
10800 secs - INCORRECT

If I CAST the value of the last column:

select
strftime('%s' , starttime) ,
strftime('%s' , starttime) - (180 * 60) ,
 strftime('%s' , starttime) + (180 * 60) ,
strftime('%s' , starttime) - (strftime('%s' , starttime) - (180 * 60)) 
,
(strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , starttime) 
,
strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60)) 
,
strftime('%s' , starttime) <= cast((strftime('%s' , starttime) + (180 * 
60)) as integer)
from
test1;

I get the right answer

1304398800|1304388000|1304409600|10800|10800|1|1

I'm puzzled as to why this should be. Why would I need to cast an 
integer to an integer? I'm sure there's an obvious answer but I 
can't find it.

Any suggestions please?

Thanks,

Rob


[sqlite] Can't understand why I need this cast

2016-02-18 Thread Rob Willett
Quan,

Thanks, that almost makes sense. What still confuses us is that the 
other maths expressions work OK without a cast.

e.g.

strftime('%s' , starttime) - (180 * 60)

and

(strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , starttime)

and

strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60)) 
,

all work but

strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))

doesn't.



So why does

strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60))

work and

strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))

not work.

Thanks,

Rob

On 18 Feb 2016, at 11:25, Quan Yong Zhai wrote:

> It looks like strftime('%s',...) should return an integer and 
> strftime('%J', ...) should return a double value. But it always return 
>  text value.
> ________
> From: Rob Willett<mailto:rob.sqlite at robertwillett.com>
> Sent: 2016/2/18 18:20
> To: SQLite mailing list<mailto:sqlite-users at mailinglists.sqlite.org>
> Subject: [sqlite] Can't understand why I need this cast
>
> Hi,
>
> I?m sure this is a really dumb question but I?m struggling to
> understand why the following SQL is needed for what should be a 
> trivial
> SQL expression.
>
> I?ve minimised the example down to (hopefully) make it simpler.
>
> I have a table with an ISO date StartTime in it held as a string.
>
> 
> sqlite> .schema test1
> CREATE TABLE "Test1" (
>   "Id" INTEGER NOT NULL,
>   "StartTime" TEXT NOT NULL,
>  PRIMARY KEY("Id")
> );
>
> 
>
> I add in some test ISO date data that *looks* OK
>
> 
> sqlite> select * from test1;
> 1|2011-05-03T05:00:00Z
> 
>
> I run the following to check that the ISO date is between a range of
> seconds, i.e. its greater than three hours ago and less than three 
> hours
> in the future. I know this is contrived but this is the minimum test
> case from a far larger query.
>
> select
>  strftime('%s' , starttime) ,
>  strftime('%s' , starttime) - (180 * 60) ,
>   strftime('%s' , starttime) + (180 * 60) ,
>  strftime('%s' , starttime) - (strftime('%s' , starttime) - (180 * 
> 60))
> ,
>  (strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , starttime)
> ,
>  strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 
> * 60))
> ,
>  strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 
> * 60))
> from
>  test1;
>
> What I expect to see is
>
> 1304398800|1304388000|1304409600|10800|10800|1|0
>
> 1. the ISO start time as secs - Correct
> 2. The ISO start time in secs minus 10800 - Correct
> 3, The ISO start time in secs plus 10800 - Correct
> 4. The value 10800 to check that the maths is correct - Correct
> 5. The value 10800 to check that the maths is correct - Correct
> 6. A check that the ISO value in secs is greater than the current time
> minus 10800 - Correct
> 7. A check that the ISO value in secs is less than the current time 
> plus
> 10800 secs - INCORRECT
>
> if I CAST the value of the last column
>
> select
>  strftime('%s' , starttime) ,
>  strftime('%s' , starttime) - (180 * 60) ,
>   strftime('%s' , starttime) + (180 * 60) ,
>  strftime('%s' , starttime) - (strftime('%s' , starttime) - (180 * 
> 60))
> ,
>  (strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , 
> starttime)
> ,
>  strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 
> * 60))
> ,
>  strftime('%s' , starttime) <= cast((strftime('%s' , starttime) + 
> (180 *
> 60)) as integer)
> from
>  test1;
>
> I get the right answer
>
> 1304398800|1304388000|1304409600|10800|10800|1|1
>
> I?m puzzled as to why this should be. Why would I need to cast an
> integer to an integer. I??m sure there?s an obvious answer but I
> can?t find it.
>
> Any suggestions please?
>
> Thanks,
>
> Rob
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Can't understand why I need this cast

2016-02-18 Thread Rob Willett
Tim,

We actually do store the timestamps as epoch seconds, but we wrote a 
quick piece of SQL to test something out and wanted to use the ISO date. 
That's when the SQL failed and we couldn't understand why. We hate it 
when we don't understand why things don't work the way we expect. 
Our OCD kicks in and annoys us :)

We've investigated it further and it still makes no sense, though Quan 
Yong Zhai has helped. It appears that

strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60))

works BUT

strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))

does not work.

The only difference is the boolean operand.

We know how to solve the problem; we're puzzled, though, as our 
understanding is clearly wrong.

Thanks for replying,

Rob

On 18 Feb 2016, at 12:14, Tim Streater wrote:

> On 18 Feb 2016 at 10:20, Rob Willett  
> wrote:
>
>> I?m sure this is a really dumb question but I?m struggling to
>> understand why the following SQL is needed for what should be a 
>> trivial
>> SQL expression.
>>
>> I?ve minimised the example down to (hopefully) make it simpler.
>>
>> I have a table with an ISO date StartTime in it held as a string.
>
> I'd be inclined to store your dates as seconds since the epoch. That 
> way arithmetic and comparisons become easy, and your SQL looks 
> simpler. Convert to a string for display. But perhaps your application 
> prevents that for some reason.
>
> --
> Cheers  --  Tim
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Can't understand why I need this cast

2016-02-18 Thread Rob Willett
Richard,

The example is highly contrived. The actual SQL we are/were 
interested in is

strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60)),

AND

strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))

The rest of the SQL was to test what was happening with strftime and 
whether integers were being produced from the expressions used to test the 
two SQL statements above. We thought the contrived test expressions worked 
OK, which was why we were puzzled as to why

strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 60)),

worked and

strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 60))

failed, though the only difference was the change in the boolean 
operand.

As Tim Streater pointed out, we would normally use epoch seconds as this 
is more natural and we have that information in another database.

However, we threw/hacked the above SQL together to test an idea we had, 
specifically to find traffic incidents three hours before and three 
hours after other traffic incidents. The test database table we tested it 
with didn't have the epoch seconds in, so we used strftime instead to 
check our idea. That's when we got results that puzzled us. We know we 
have alternate and better solutions to actually doing the work; what's 
worrying us is that our understanding of SQLite, strftime and implicit 
cast conversions is incorrect. The fact that strftime returns a text 
string is fine and fits in with what we expected. We simply got the 
rest of the conversions wrong. We appear to have created a boolean 
expression that worked (a false positive) and that threw us, as we 
misunderstood (quite badly) what was going on.

We now know we have to be more careful about how we use strftime in our 
logic, and we will be.
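
For anyone who finds this thread later, the behaviour comes down to 
SQLite's cross-type comparison rules: strftime() returns TEXT, the 
arithmetic produces a number, and when neither side has column affinity a 
number always sorts below any text, so 'text >= number' is always true and 
'text <= number' is always false. A trivial demonstration of ours via Perl 
and DBD::SQLite:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

my @row = $dbh->selectrow_array(q{
    SELECT '100' >= 200,                  -- 1: TEXT compares above any number
           '100' <= 200,                  -- 0: so this can never be true
           CAST('100' AS INTEGER) <= 200  -- 1: the cast forces a numeric compare
});
print "@row\n";    # prints: 1 0 1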

Thanks to everybody for the help; we've learnt something new today, 
which is always good.

Best wishes,

Rob

On 18 Feb 2016, at 12:45, Richard Hipp wrote:

> On 2/18/16, Rob Willett  wrote:
>>
>> select
>>  strftime('%s' , starttime),
>>  strftime('%s' , starttime) - (180 * 60),
>>strftime('%s' , starttime) + (180 * 60),
>>  strftime('%s' , starttime) - (strftime('%s' , starttime) - (180 * 
>> 60)),
>>  (strftime('%s' , starttime) + (180 * 60)) - strftime('%s' , 
>> starttime),
>>  strftime('%s' , starttime) >= (strftime('%s' , starttime) - (180 * 
>> 60)),
>>  strftime('%s' , starttime) <= (strftime('%s' , starttime) + (180 * 
>> 60))
>> from
>>  test1;
>>
>
> I don't exactly understand why you are doing the query above, but for
> whatever your purpose is, the following formulation seems simpler to
> me:
>
> SELECT
>  strftime('%s', starttime),
>  strftime('%s', starttime, '-180 seconds'),
>  strftime('%s', starttime, '+180 seconds'),
>  (julianday(starttime) - julianday(starttime','-180 seconds'))*86400,
>  (julianday(starttime, '+180 seconds') - julianday(starttime))*86400,
>  julianday(starttime) >= julianday(starttime, '-180 seconds'),
>  julianday(starttime) <= julianday(starttime, '+180 seconds')
> FROM
> test1;
>
>
>
>
> -- 
> D. Richard Hipp
> drh at sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Hi,

We think we know the answer to this, but we'll ask the question 
anyway.

We're trying to back up a 10GB live running database 
as fast as we possibly can without stopping updates coming in. The 
updates come every 2-3 mins and write a chunk of data in. We can't 
really stop the database updates; well, we can, but we don't want to.

1. We had a quick look to see if we could copy the SQLite file over in 
the short interval between updates, but sadly cp simply wasn't fast 
enough. We get around 3GB copied before an update happens, which 
basically renders the cp useless.

2. If we use the command line sqlite  .dump >  
it works, but it's very slow.

3. Using the Sqlite C API works but is also very slow.

4. We don't have the option of an LVM snapshot as the file system is 
in a container.

So is there any other method of doing a quick snapshot? Failing that, 
our solution will be to stop any updates for the duration of the cp 
command and then restart the process afterwards. It's not the end of the 
world, but it would have to be done out of normal working hours.

This is going to become a bigger problem for us as the database will 
only get bigger so any advice welcomed.

Thanks

Rob


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Thanks for the reply,

Yes Example 2 in https://www.sqlite.org/backup.html is what we are 
talking about. It was very slow to run for us.

Rob

On 4 May 2016, at 12:08, Dominique Devienne wrote:

> On Wed, May 4, 2016 at 12:44 PM, Rob Willett
>  wrote:
>> We?re trying to backup a 10GB live running database
> [...]
>
>> 2. If we use the command line sqlite  .dump > 
>>  it
>> works, but its very slow.
>
> That's going to SQL text.
> While .backup is page-based, and binary. But not incremental in the
> Shell I believe.
>
>> 3. Using the Sqlite C API works but is also very slow.
>
> Are you talking about https://www.sqlite.org/backup.html ?
> Because Example 2 is exactly your use case.
>
>> So is there any other method of doing a quick snapshot?
>
> Well, https://www.sqlite.org/backup.html is the only supported way 
> IMHO. --DD
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Dominique,

We put together a quick C program to try out the C API a few weeks ago. 
It worked, but it was very slow; from memory, not much different to the 
sqlite command line backup system. We put it on the back burner as it 
wasn't anywhere near quick enough.

We hadn't seen or found or even looked for RBU. When I read this I 
thought Realtime Backup Unit; I wasn't even close :) Rats

I think that unless we can get the C API to back up in a time close to 
that of a cp, the easiest solution is to suspend updates for 10-15 mins 
out-of-hours and do a simple cp from there. Sometimes a change in 
workflow might be the easiest and simplest solution.
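
For completeness, the online-backup API is also reachable straight from 
Perl rather than via our C shim; DBD::SQLite exposes it as a driver-private 
method (a minimal sketch, assuming a reasonably recent DBD::SQLite and a 
made-up database name; it is the same backup API underneath, so it is no 
faster than the C attempt and restarts if another connection writes to the 
source mid-copy):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=traffic.db", "", "",
                       { RaiseError => 1 });

# Copies the open database to the named file using sqlite3_backup_*()
# underneath.
$dbh->sqlite_backup_to_file('traffic-backup.db');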

I know you have mentioned VFS shims further down the email trail, but 
that's certainly a step too far for us.

Thanks

Rob

On 4 May 2016, at 12:22, Dominique Devienne wrote:

> On Wed, May 4, 2016 at 1:13 PM, Rob Willett
>  wrote:
>> Thanks for the reply,
>>
>> Yes Example 2 in https://www.sqlite.org/backup.html is what we are 
>> talking
>> about. It was very slow to run for us.
>
> Then maybe https://www.sqlite.org/rbu.html is your last change.
> Although I don't see how it could be faster than the Backup API.
>
> I think you should share more details of how you use the Backup API,
> so experts can provide advice on what to try to make it scale better. 
> --DD
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Clemens,

We have 8GB of memory, which is the most our VPS provider allows. We'd 
like 32GB but it's not an option for us. Our desktops have more than 
that, but the VPS provider is reasonably priced :)

We hadn't considered WAL mode. My conclusion is that a simple 
change to our workflow is actually simpler: we stop the database updates 
for 15 mins out of hours, cp and then restart. It's not ideal, but we're 
not running a nuclear power station or a 24/7 medical facility. Users 
*may* not get traffic updates for 15 mins at 03:00 in the morning. The 
world will keep spinning.
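
(If we do revisit WAL later it is a one-line change from Perl; a minimal 
sketch with a made-up database name:)

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=traffic.db", "", "",
                       { RaiseError => 1 });

# Switch to write-ahead logging so the writer no longer blocks readers.
# The setting is persistent, so it only has to succeed once per database file.
my ($mode) = $dbh->selectrow_array("PRAGMA journal_mode=WAL");
die "journal_mode is '$mode', expected wal" unless lc($mode) eq 'wal';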

Rob

On 4 May 2016, at 12:58, Clemens Ladisch wrote:

> Rob Willett wrote:
>> We?re trying to backup a 10GB live running database 
>> ?as-fast-as-we-
>> possibly-can? without stopping updates coming in.
>
> How much memory do you have?  I guess you can't simply read the entire
> database file to force it into the file cache?
>
> In WAL mode, a writer does not block readers.  You have to decide
> whether you can live with its restrictions:
> http://www.sqlite.org/wal.html
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Simon,

Thanks for the reply; we were a little surprised at the UNIX cp speed as 
well. We investigated it with the provider in the past over other file 
system speed issues, and it turned out that they rate-limit the IO ops 
so you can't consume them all. Our provider runs many servers out of 
their data centre and they want to make sure that one company (us!) 
can't monopolise all the resources. I can see their point of view, and 
since we are not a critical company we choose to pay an appropriate rate 
for this. All systems have limitations one way or another. This is one 
of theirs and is acceptable given the price bracket. Now if we were 
paying £1,000 ($1,500)/week we would have had a very different 
conversation :)

We'll investigate the BEGIN IMMEDIATE and see what happens.

As you say though, a simple script that suspends a message queue whilst 
a cp takes place and then simply turns the tap back on is very simple 
to code and to see working. I like easy, simple solutions because I'm 
simple at heart.

Best wishes,

Rob

On 4 May 2016, at 14:24, Simon Slavin wrote:

> On 4 May 2016, at 1:35pm, Rob Willett  
> wrote:
>
>> I think that unless we can get the C API to back up in a time close 
>> to that of a cp, the easiest solution is to suspend updates for 10-15 
>> mins out-of-hours and do a simple cp from there. Sometimes a change 
>> in workflow might be the easiest and simplest solution.
>
> This solution may be what you eventually come up with.  But I would 
> like to comment that I used a Unix platform to copy 15 Gigabytes 
> through FireWire from an external hard disk yesterday and that took 
> less than 10 minutes.  Simply duplicating a 10 Gigabyte file onto your 
> boot drive should take considerably less time.
>
> You may be able to use built-in SQLite mechanisms to suspend updates.  
> It might be worth using the SQLite shell tool to execute "BEGIN 
> IMMEDIATE" and see whether that does suspend operations.
>
> On the other hand there's a lot to be said for running a script at 3am 
> which quits the update program, takes the snapshot, then starts them 
> up again.  Not only will this perform the task needed but it would 
> also serve to 'reset' those updating programs in case they have a slow 
> resource leak or some other long-term bug.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Ryan,

Ah! The penny drops; we didn't realise that about the backup API. 
That explains a great deal. We must have missed that in the docs. Blast.

We've looked around for other providers in Europe and the cost 
differences are very high. We need to be in the EU for various data 
protection reasons. Until now we haven't had any issues, as we don't 
move a significant amount of data around in a very short period of time, 
so the rate-limited IO has not been a problem.

One of our options is to do what you suggest with a second database 
server and run them hot/warm. We had already thought of that but not got 
around to it, as the setting-up time is quite high (we need a bank of 
servers, feeding things from one server to another), but our immediate 
issue is simply copying the 10GB database. The downside of the second 
server is moving 10GB data files around the internet afterwards, back to 
the failed server. Rebuilding from scratch is a pain as it takes around 
2-3 weeks: we have to process every file again (circa 200,000) in 
order, and each file takes around 4-8 secs to run.

I think the backup solution is the tried and tested Keep-It-Simple shell 
script. We pause the queue upstream which stops the update process, do a 
cp and then restart the queue again. All of this is doable in shell 
script.

Rob

On 4 May 2016, at 18:22, R Smith wrote:

> On 2016/05/04 2:35 PM, Rob Willett wrote:
>> Dominque,
>>
>> We put together a quick C program to try out the C API a few weeks 
>> ago, it worked but it was very slow, from memory not much different 
>> to the sqlite command line backup system. We put it on the back 
>> burner as it wasn?t anywhere near quick enough.
>
> You do realize that the backup API restarts the backup once the 
> database content changes, right? I'm sure at the rates you describe 
> and update frequency, that backup would never finish. The backup API 
> is quite fast if your destination file is on a not-too-slow drive, but 
> you will have to stop the incoming data to allow it to finish.
>
> As an aside - you need a better provider, but that said, and if it was 
> me, I would get two sites up from two different providers, one live, 
> one stand-by, both the cheap sort so costs stay minimal (usually two 
> cheap ones are much cheaper than the next level beefy one). Feed all 
> updates/inserts to both sites - one then is the backup of the other, 
> not only data-wise, but also can easily be switched to by simple DNS 
> redirect should the first site/provider go down for any reason.  The 
> second site can easily be interfered with / copied from / backed up / 
> whatever without affecting the service to the public.
>
> I only do this with somewhat critical sites, but your use-case sounds 
> like it might benefit from it. My second choice would be to simply 
> stop operations at a best-case time-slot while the backup / copy 
> completes.
>
> Cheers,
> Ryan
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Scott,

That's an interesting idea. Is there an option in SQLite to do this for 
us, or do we have to write a small shim in our app?

I like the idea of this as its simple and elegant.

Rob

On 4 May 2016, at 16:51, Scott Robison wrote:

>>> This is going to become a bigger problem for us as the database will
>>> only get bigger so any advice welcomed.
>
> Perhaps, rather than backing up the live data, you create an append 
> only
> log of each and every query you send to the database. Should you need 
> to
> restore, you replay the log of statements. Or at the appointed backup 
> time,
> you replay the day's log of statements into another database. No need 
> to
> ever take the live database offline at the cost of slightly longer 
> running
> commands during the day to handle the append operation.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Scott,

OK, we can see how to do this (I think). Our app is written in Perl and 
we'd just need to capture the commands we write out. The only issue I 
can think of is the prepare statement and making sure we capture the 
right SQL command.
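
Something along these lines is what we have in mind; a rough sketch only, 
where log_and_do(), the log file name and the example table are all made up 
for illustration and are not a DBD::SQLite feature:

use strict;
use warnings;
use DBI;
use IO::Handle;

my $dbh = DBI->connect("dbi:SQLite:dbname=traffic.db", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE IF NOT EXISTS gps_log "
       . "(lat REAL, long REAL, recorded_at INTEGER)");

open my $log, '>>', 'statements.log' or die "Cannot open log: $!";
$log->autoflush(1);

sub log_and_do {
    my ($sql, @bind) = @_;
    # Record the statement and its bind values before touching the database,
    # so the log can be replayed later to rebuild or catch up a copy.
    print {$log} join("\t", time(), $sql, @bind), "\n";
    my $sth = $dbh->prepare_cached($sql);
    return $sth->execute(@bind);
}

# Example use; in practice schema changes would need logging as well.
log_and_do("INSERT INTO gps_log (lat, long, recorded_at) VALUES (?, ?, ?)",
           51.5074, -0.1278, time());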

We'll dig into it and have a look.

Thanks for taking the time to reply.

Rob

On 4 May 2016, at 18:52, Scott Robison wrote:

> On Wed, May 4, 2016 at 11:47 AM, Rob Willett 
> 
> wrote:
>
>> Scott,
>>
>> Thats an interesting idea. Is there an option in SQLite to do this 
>> for us,
>> or do we have to write a small shim in our app?
>>
>> I like the idea of this as its simple and elegant.
>
>
> It would require a little extra work on your part. Nothing built into 
> the
> system that would accomplish this directly. However, I've done similar
> things and they don't involve a ton of overhead. You could use another
> SQLite database as the append only log, or a simple text file.
>
> I'm not aware of a free lunch solution, sadly.
>
>
>>
>>
>> Rob
>>
>>
>> On 4 May 2016, at 16:51, Scott Robison wrote:
>>
>> This is going to become a bigger problem for us as the database will
>>>>> only get bigger so any advice welcomed.
>>>>>
>>>>
>>> Perhaps, rather than backing up the live data, you create an append 
>>> only
>>> log of each and every query you send to the database. Should you 
>>> need to
>>> restore, you replay the log of statements. Or at the appointed 
>>> backup
>>> time,
>>> you replay the day's log of statements into another database. No 
>>> need to
>>> ever take the live database offline at the cost of slightly longer 
>>> running
>>> commands during the day to handle the append operation.
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> -- 
> Scott Robison
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-04 Thread Rob Willett
Dan,

That's NOT the case for us, so that explains why things are slow. Mmm… 
as I recall we never did get a backup to finish… Now we know why :)

Rob

On 4 May 2016, at 18:53, Dan Kennedy wrote:

> On 05/05/2016 12:45 AM, Rob Willett wrote:
>> Ryan,
>>
>> Ah! The penny drops, we didn't realise that with the backup 
>> API. That explains a great deal. We must have missed that in the 
>> docs. Blast.
>
>
> There is one exception to this:
>
> If the database is written to via the same database handle that is 
> being used as the source db by the backup API, then the backup is not 
> restarted. In this case if any pages that have already been 
> transferred to the backup db are modified the new versions are written 
> into the backup db at the same time as the source is updated.
>
> Dan.
>
>
>
>
>
>>
>> We've looked around for other providers in Europe and the cost 
>> differences are very high. We need to be in the EU for various data 
>> protection reasons. Until now we haven't had any issues as we 
>> don't move a significant amount of data around in a very short 
>> period of time, so the rate limited IO has not been a problem.
>>
>> One of our options is to do what you suggest with a second database 
>> server and run them hot/warm. We had already thought of that but not 
>> got around to it as the setting up time is quite high (we need a bank 
>> of servers, feeding things from one server to another), but our 
>> immediate issue is simply copying the 10GB database. The downside of 
>> the second server is moving 10GB data files around the internet 
>> afterwards back to the failed server. Rebuilding from scratch is a 
>> pain as it takes around 2-3 weeks to rebuild the database from 
>> scratch as we have to process every file again (circa 200,000) in 
>> order and each file takes around 4-8 secs to run.
>>
>> I think the backup solution is the tried and tested Keep-It-Simple 
>> shell script. We pause the queue upstream which stops the update 
>> process, do a cp and then restart the queue again. All of this is 
>> doable in shell script.
>>
>> Rob
>>
>> On 4 May 2016, at 18:22, R Smith wrote:
>>
>>> On 2016/05/04 2:35 PM, Rob Willett wrote:
>>>> Dominque,
>>>>
>>>> We put together a quick C program to try out the C API a few weeks 
>>>> ago, it worked but it was very slow, from memory not much different 
>>>> to the sqlite command line backup system. We put it on the back 
>>>> burner as it wasn't anywhere near quick enough.
>>>
>>> You do realize that the backup API restarts the backup once the 
>>> database content changes, right? I'm sure at the rates you describe 
>>> and update frequency, that backup would never finish. The backup API 
>>> is quite fast if your destination file is on a not-too-slow drive, 
>>> but you will have to stop the incoming data to allow it to finish.
>>>
>>> As an aside - you need a better provider, but that said, and if it 
>>> was me, I would get two sites up from two different providers, one 
>>> live, one stand-by, both the cheap sort so costs stay minimal 
>>> (usually two cheap ones are much cheaper than the next level beefy 
>>> one). Feed all updates/inserts to both sites - one then is the 
>>> backup of the other, not only data-wise, but also can easily be 
>>> switched to by simple DNS redirect should the first site/provider go 
>>> down for any reason.  The second site can easily be interfered with 
>>> / copied from / backed up / whatever without affecting the service 
>>> to the public.
>>>
>>> I only do this with somewhat critical sites, but your use-case 
>>> sounds like it might benefit from it. My second choice would be to 
>>> simply stop operations at a best-case time-slot while the backup / 
>>> copy completes.
>>>
>>> Cheers,
>>> Ryan
>>>
>>>
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
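For reference, a hedged sketch of the online backup API being discussed, copying the "main" database in chunks; `backup_db`, the chunk size and the sleep interval are illustrative choices, not code from the thread. Writes made through other connections restart the copy, as Ryan describes, while writes made through `src` itself are folded into the destination, as Dan notes:

```c
#include <sqlite3.h>

static int backup_db(sqlite3 *src, const char *dest_path)
{
    sqlite3 *dest = NULL;
    int rc = sqlite3_open(dest_path, &dest);
    if (rc != SQLITE_OK) { sqlite3_close(dest); return rc; }

    sqlite3_backup *b = sqlite3_backup_init(dest, "main", src, "main");
    if (b != NULL) {
        do {
            rc = sqlite3_backup_step(b, 100);          /* copy 100 pages at a time */
            if (rc == SQLITE_BUSY || rc == SQLITE_LOCKED)
                sqlite3_sleep(250);                    /* give readers/writers a chance */
        } while (rc == SQLITE_OK || rc == SQLITE_BUSY || rc == SQLITE_LOCKED);
        sqlite3_backup_finish(b);                      /* releases locks, frees b */
    }
    rc = sqlite3_errcode(dest);
    sqlite3_close(dest);
    return rc;
}
```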


[sqlite] Fastest way to backup/copy database?

2016-05-05 Thread Rob Willett
Hi,

We did look at this before and discarded the idea, but I can't 
remember why.

I've just looked again and seen the --inplace option, which I 
wasn't aware of. That *might* help and be an interesting solution. We 
know we can make cp work, though with a little downtime. We'll 
investigate rsync --inplace on a closed (and definitely not working) 
database, see what happens and report back. It should be easy to test.

Thanks for the information

Rob

On 5 May 2016, at 16:42, J Decker wrote:

> Instead of cp, rsync might help it is able to send delta changes.
>
> On Wed, May 4, 2016 at 10:55 AM, Rob Willett
>  wrote:
>> Scott,
>>
>> OK, We can see how to do this (I think). Our app is written in Perl 
>> and we?d
>> just need to capture the command we write down. The only issue I can 
>> think
>> of is the prepare statement and making sure we capture the right SQL
>> command. W
>>
>> We?ll dig into it and have a look,
>>
>> Thanks for taking the time to reply.
>>
>> Rob
>>
>>
>> On 4 May 2016, at 18:52, Scott Robison wrote:
>>
>>> On Wed, May 4, 2016 at 11:47 AM, Rob Willett
>>> 
>>> wrote:
>>>
>>>> Scott,
>>>>
>>>> Thats an interesting idea. Is there an option in SQLite to do this 
>>>> for
>>>> us,
>>>> or do we have to write a small shim in our app?
>>>>
>>>> I like the idea of this as its simple and elegant.
>>>
>>>
>>>
>>> It would require a little extra work on your part. Nothing built 
>>> into the
>>> system that would accomplish this directly. However, I've done 
>>> similar
>>> things and they don't involve a ton of overhead. You could use 
>>> another
>>> SQLite database as the append only log, or a simple text file.
>>>
>>> I'm not aware of a free lunch solution, sadly.
>>>
>>>
>>>>
>>>>
>>>> Rob
>>>>
>>>>
>>>> On 4 May 2016, at 16:51, Scott Robison wrote:
>>>>
>>>> This is going to become a bigger problem for us as the database 
>>>> will
>>>>>>>
>>>>>>> only get bigger so any advice welcomed.
>>>>>>>
>>>>>>
>>>>> Perhaps, rather than backing up the live data, you create an 
>>>>> append only
>>>>> log of each and every query you send to the database. Should you 
>>>>> need to
>>>>> restore, you replay the log of statements. Or at the appointed 
>>>>> backup
>>>>> time,
>>>>> you replay the day's log of statements into another database. No 
>>>>> need to
>>>>> ever take the live database offline at the cost of slightly longer
>>>>> running
>>>>> commands during the day to handle the append operation.
>>>>> ___
>>>>> sqlite-users mailing list
>>>>> sqlite-users at mailinglists.sqlite.org
>>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>>>
>>>> ___
>>>> sqlite-users mailing list
>>>> sqlite-users at mailinglists.sqlite.org
>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>>
>>>
>>>
>>>
>>> --
>>> Scott Robison
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fastest way to backup/copy database?

2016-05-05 Thread Rob Willett
Mmmm… Initial tests are not conclusive: it does look as if using 
rsync --inplace does speed things up, but nowhere near as much as we 
anticipated.

Testing consisted of a 6GB test database which is a backup copy from a 
few months ago.

We timed copying the database over using cp

# time cp tfl.sqlite.backup t1

real2m30.528s
user0m0.052s
sys 0m10.403s

We then edited the database and deleted the contents of a table that 
would have changed over the lifetime of the database. We will freely 
admit we have no idea where this table is in the database file and have 
no intention of finding out. That's SQLite's problem :) The table had 
65,000 rows or so and would have been updated regularly at the start of 
the database, and over the last few months would have had small daily 
updates, but they would be getting fewer and fewer.

We then did

# time rsync -av --inplace --no-whole-file tfl.sqlite.backup test_db
sending incremental file list
tfl.sqlite.backup



sent 564,081,269 bytes  received 623,255 bytes  4,805,995.95 bytes/sec
total size is 6,067,933,184  speedup is 10.75

real1m57.689s
user2m4.947s
sys 0m8.952s

The --no-whole-file option is apparently needed as well as --inplace.

We checked the md5sums at the end

# md5sum tfl.sqlite.backup test_db t1
b5bd91cc9b49ee1f54a8a2d013005586  tfl.sqlite.backup
b5bd91cc9b49ee1f54a8a2d013005586  test_db
b5bd91cc9b49ee1f54a8a2d013005586  t1

So we can see that the file integrity has been preserved, which is what 
we expect, but it's always good to check. Yes, we know that md5sum is not 
perfect, but for this purpose it's fine. However, the rsync vs cp time is 
not as big a difference as we expected. It's still taking 80% of the time 
of the cp.

Our first thoughts are that 80% is still better than 100% so that's good, 
our second thoughts are that we need to test this on a more 
representative set of changes to the database. Deleting the contents of 
a table that has been changed over the lifetime of the database may be 
the worst-case scenario as it might well touch many, many pages in the 
database. We are certainly not expert enough to comment on this 
assumption of how rows are distributed in the database and if anybody 
would like to chip in, please do so.

Our intention now is to take a more recent and representative database, 
run a day's and a week's set of database transactions through it, which is 
easy enough though it takes time, and see how that compares.

Thanks for the suggestion,

Rob,

On 5 May 2016, at 16:42, J Decker wrote:

> Instead of cp, rsync might help it is able to send delta changes.
>
> On Wed, May 4, 2016 at 10:55 AM, Rob Willett
>  wrote:
>> Scott,
>>
>> OK, We can see how to do this (I think). Our app is written in Perl 
>> and we?d
>> just need to capture the command we write down. The only issue I can 
>> think
>> of is the prepare statement and making sure we capture the right SQL
>> command. W
>>
>> We?ll dig into it and have a look,
>>
>> Thanks for taking the time to reply.
>>
>> Rob
>>
>>
>> On 4 May 2016, at 18:52, Scott Robison wrote:
>>
>>> On Wed, May 4, 2016 at 11:47 AM, Rob Willett
>>> 
>>> wrote:
>>>
>>>> Scott,
>>>>
>>>> Thats an interesting idea. Is there an option in SQLite to do this 
>>>> for
>>>> us,
>>>> or do we have to write a small shim in our app?
>>>>
>>>> I like the idea of this as its simple and elegant.
>>>
>>>
>>>
>>> It would require a little extra work on your part. Nothing built 
>>> into the
>>> system that would accomplish this directly. However, I've done 
>>> similar
>>> things and they don't involve a ton of overhead. You could use 
>>> another
>>> SQLite database as the append only log, or a simple text file.
>>>
>>> I'm not aware of a free lunch solution, sadly.
>>>
>>>
>>>>
>>>>
>>>> Rob
>>>>
>>>>
>>>> On 4 May 2016, at 16:51, Scott Robison wrote:
>>>>
>>>> This is going to become a bigger problem for us as the database 
>>>> will
>>>>>>>
>>>>>>> only get bigger so any advice welcomed.
>>>>>>>
>>>>>>
>>>>> Perhaps, rather than backing up the live data, you create an 
>>>>> append only
>>>>> log of each and every query you send to the database. Should you 
>>>>> need to
>>>>> restore, you replay the log of statements. Or at the appointed 
>>>>> backup
>>>>> time,
>>>>>

[sqlite] Fastest way to backup/copy database?

2016-05-05 Thread Rob Willett
Yes, I realised after I sent the e-mail that I hadn't said we had 
created the destination file before running the command. You cannot 
actually run rsync --inplace if the destination file doesn't exist; we 
found that out earlier ;) So it was a reasonable test, though only one 
test. We need to do more to see what's going on.

Top marks for spotting our deliberate omission :)

Rob

On 5 May 2016, at 17:42, J Decker wrote:

> On Thu, May 5, 2016 at 9:38 AM, Rob Willett
>  wrote:
>> Mmmm?. Initial tests are not conclusive, it does look as if using 
>> rsync
>> ?-inplace does speed things up but nowhere near as much as we 
>> anticipated.
>>
>> Testing consisted of a 6GB test database which is a backup copy from 
>> a few
>> months ago.
>>
>> We timed copying the database over using cp
>>
>> # time cp tfl.sqlite.backup t1
>>
>> real2m30.528s
>> user0m0.052s
>> sys 0m10.403s
>>
>> We then edited the database and deleted the contents of a table that 
>> would
>> have changed over the lifetime of the database. We will freely admit 
>> we have
>> no idea where this table is in the database file and have no 
>> intention of
>> finding out. Thats SQLites problem :) The file had 65,000 lines or so 
>> and
>> would have been updated regularly at the start of the database and 
>> over the
>> last few months would have had small daily updates but they would be 
>> getting
>> fewer and fewer.
>>
>> We then did
>>
>> # time rsync -av --inplace --no-whole-file tfl.sqlite.backup test_db
>> sending incremental file list
>> tfl.sqlite.backup
>>
>>
>
> was the file already in place with sending changes?  Or is it a full
> copy of the whole thing all the time?   if the later... well cp is
> gonna really be as good as it gets
>
>>
>> sent 564,081,269 bytes  received 623,255 bytes  4,805,995.95 
>> bytes/sec
>> total size is 6,067,933,184  speedup is 10.75
>>
>> real1m57.689s
>> user2m4.947s
>> sys 0m8.952s
>>
>> The ?no-while-file is apparently needed as well as ?inplace.
>>
>> We checked the md5sums at the end
>>
>> # md5sum tfl.sqlite.backup test_db t1
>> b5bd91cc9b49ee1f54a8a2d013005586  tfl.sqlite.backup
>> b5bd91cc9b49ee1f54a8a2d013005586  test_db
>> b5bd91cc9b49ee1f54a8a2d013005586  t1
>>
>> So we can see that the file integrity has been preserved which is 
>> what we
>> expect but its always good to check. Yes we know that md5sum is not 
>> perfect
>> but for this purpose its fine. However the rsync vs cp time is not as 
>> big a
>> difference as we expected. Its still taking 80% of the time of the 
>> cp.
>>
>> Our first thoughts are that 80% is still better than 100% so thats 
>> good, our
>> second thoughts are that we need to test this on a more 
>> representative set
>> of changes to the database. Deleting the contents of a table that has 
>> been
>> changed over the lifetime of the database may be the worst-case 
>> scenario as
>> it might well touch many, many pages in the database. We are 
>> certainly not
>> expert enough to comment on this assumption of how rows are 
>> distributed in
>> the database and if anybody would like to chip in, please do so.
>>
>> Our intention now is to take a more recent and representative 
>> database, run
>> a days and a weeks set of database transactions through it which is 
>> easy
>> enough, though takes time, and see how that compares.
>>
>> Thanks for the suggestion,
>>
>> Rob,
>>
>> On 5 May 2016, at 16:42, J Decker wrote:
>>
>>> Instead of cp, rsync might help it is able to send delta changes.
>>>
>>> On Wed, May 4, 2016 at 10:55 AM, Rob Willett
>>>  wrote:
>>>>
>>>> Scott,
>>>>
>>>> OK, We can see how to do this (I think). Our app is written in Perl 
>>>> and
>>>> we?d
>>>> just need to capture the command we write down. The only issue I 
>>>> can
>>>> think
>>>> of is the prepare statement and making sure we capture the right 
>>>> SQL
>>>> command. W
>>>>
>>>> We?ll dig into it and have a look,
>>>>
>>>> Thanks for taking the time to reply.
>>>>
>>>> Rob
>>>>
>>>>
>>>> On 4 May 2016, at 18:52, Scott Robison wrote:
>>>>
>>>>> On Wed, May 4, 2016 a

[sqlite] General question on 'style'

2013-06-19 Thread Rob Willett
Hi,

My first question as a lurker and reader of these groups. Hopefully it's not too 
stupid :)

I've been working through using SQLite on my Mac and have a question on style 
and the way to use SQLite databases.

My application has two parts, a client/PHP side to collect requests from a user 
and a C server-side program that manipulates large quantities of data, anything 
from tens of bytes to tens of terabytes. When it's manipulating TBs of data (none 
of which is in a SQL database) it will run for hours, which is absolutely fine 
and expected. Just to be clear I do not have a performance issue with SQLite at 
all.

Since the application has to work across numerous operating systems, I'm 
treating SQLite as the main way to handle information; I'm not using lock 
files, semaphores or any other mechanisms as these are all too OS-specific. I hold 
requests to do work in a queue table in SQLite, and I pass information back to the 
client side via a SQL database. None of this is too difficult and I take my 
inspiration from the manual which says that you can use SQLite almost as a 
replacement for a file in some cases. I like the idea of a simple interface 
like this and to be honest it seems to work well.

Now to the hub (excuse the really bad pun) of my question: within my service 
side application, I'm finding that I'm constantly opening and closing the same 
SQLite database, inserting data, reading data and updating data over many hours. 
The total number of interactions is quite small, perhaps a few hundred over the 
course of an application run, which could take hours.

I'm not bothered about the file I/O per se, nor about the amount of data written 
to the database as it's very small, but I wanted to validate that this approach of 
opening and closing the database as close to the necessary transaction as 
possible is an appropriate 'style'. As I do this I'll open and close the SQLite 
database probably a few hundred times in the course of a run.

My feeling is that keeping the SQLite database open for any longer than 
necessary is bad, since my PHP side might want to update things.

Comments welcomed.

Thanks,

Rob



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
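A minimal sketch of the open-late/close-early style being asked about, using the C API; the queue database path, table and column names are invented for illustration:

```c
#include <sqlite3.h>

/* Open the shared queue database, write one row, close again, so the
   connection is held only for as long as the work takes. */
static int record_result(const char *queue_path, const char *payload)
{
    sqlite3 *db = NULL;
    int rc = sqlite3_open(queue_path, &db);
    if (rc != SQLITE_OK) { sqlite3_close(db); return rc; }

    sqlite3_busy_timeout(db, 5000);   /* wait up to 5s if the PHP side is writing */

    sqlite3_stmt *stmt = NULL;
    rc = sqlite3_prepare_v2(db,
            "INSERT INTO results(payload) VALUES (?1)", -1, &stmt, NULL);
    if (rc == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, payload, -1, SQLITE_TRANSIENT);
        rc = sqlite3_step(stmt);
        if (rc == SQLITE_DONE) rc = SQLITE_OK;
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);                /* connection released as soon as the work is done */
    return rc;
}
```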


Re: [sqlite] General question on 'style' Simon/Stephen.

2013-06-19 Thread Rob Willett
Stephen, Simon,

I'm not worried about the CPU load at all, it was purely a question of style. 
Stephen has talked about keeping the connections open for the lifetime of a 
desktop connection, Simon has talked about closing and opening connections 
many, many times during the course of a transaction. My model is more towards 
Simon's. 

Part of this model is due to the backend process taking hours to run. I wanted 
the web user to kick this off and come back when it's finished (possibly 
tomorrow). It didn't feel right to keep the database connection open as other 
users would/could/might want to add stuff to the queue to be run. 

The fact that I have had two replies (and thank you both for writing back) with 
very different viewpoints and experiences makes me think that my model isn't 
bad for my particular use case. Clearly other people will have different 
experiences, and in different situations I would do it differently.

On another note, I have been impressed with the quality of replies on this 
list. I listen into other lists on other technologies and the quality and 
standard of 'stuff' here is very high. Not seen very many "My database don't 
work, fix it for me" type mails here which is great.  Some of the detail people 
go into is wonderful and I've learnt an awful lot just reading replies.

Best wishes,

Rob.

On 19 Jun 2013, at 15:35, Simon Slavin <slav...@bigfraud.org> wrote:

> 
> On 19 Jun 2013, at 3:15pm, Rob Willett <rob.sql...@robertwillett.com> wrote:
> 
>> Now to the hub (excuse the really bad pun) of my questions, within my 
>> service side application, I'm finding that I'm constantly opening and 
>> closing the same SQLite database, inserting data, reading data, updating 
>> date over many hours. The total number of interactions is quite small, 
>> perhaps a few hundred over the course of the application run which could 
>> take hours to run. 
>> 
>> I'm not bothered over the file i/o per se. nor about the amount of data 
>> written to the database as its very small but wanted to validate that this 
>> approach of opening and closing the database as close to the necessary 
>> transaction as possible is an appropriate 'style'. As I do this I'll open 
>> and close the SQLite database probably a few hundred times in the course of 
>> a run.
>> 
>> My feeling is that opening and keeping open the SQLite database for any time 
>> longer than necessary at all is bad, since my PHP might want to update 
>> things.
> 
> Nice description.
> 
> In some of my uses I have a setup similar to yours in many ways.  The back 
> end to my web-facing services is a small PHP file on my web server which acts 
> as a SQLite3 shim.  Requests are passed to it in JSON and it answers in JSON. 
>  Some web-facing systems (in JavaScript) and other applications (which 
> generate their own HTTP POST requests) pass it individual SQLite commands and 
> it opens the database, does the command, then closes the database again.  
> Because ... well, that’s all it can really do given the structure of what PHP 
> on a browser can do.
> 
> So yes, my logs can show my PHP shim being called hundreds of times a minute. 
>  But it does the job fine: I’ve never traced down any fault down to it being 
> called so often.  And, of course, the server has Apache, PHP, the shim file 
> and many of my SQLite databases permanently in cache, so the whole thing runs 
> pretty quickly.
> 
> As you write, it does seem to be terribly inefficient.  But we’re bound by 
> the nature of how HTTP requests work.  You can’t really improve on it without 
> implementing something that serves SQLite over TCP/IP.  Or maybe implement it 
> as an Apache module.  And it’s so much more convenient to be able to write my 
> code in PHP and know it will run on any Apache platform and can be maintained 
> (should I die) by anyone familiar with PHP programming and HTTP requests.
> 
> I’ll worry about it more when my server CPU starts heading towards 100%.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] General question on 'style' Simon/Stephen.

2013-06-19 Thread Rob Willett
Hi,

I wasn't overly bothered about performance; since my backend jobs take hours 
to run and process TBs of data, the overhead of opening and closing a few 
hundred database connections is, in the grand scheme of things, pretty low. It 
was a general style and usage question. I'm a C hacker by preference and have 
my own style of how I use memory, how I break things down into functions and 
use pointers and structures. I'm not a SQLite expert (pretty much a novice in 
fact) and wanted to get good practices in from the beginning rather than trying to 
embed them later.

My backend process might run for an hour or two and only write out a few 
hundred bits of data, so I'll carry on opening and closing it as needed unless 
someone can say there's good practice not to do it. I like the idea of 
localised database transactions within the code; it's a bit like malloc and 
free. Keep it short and sweet if you can.

Thanks again for the sensible and constructive replies.

Rob.

On 19 Jun 2013, at 20:15, RSmith <rsm...@rsweb.co.za> wrote:

> Actually, having benched this specific thing before, I can tell you that 
> there is a performance penalty opening and closing DBs (as opposed to just 
> keeping a connection open) - but this is not news, what might be news is that 
> the performance penalty is much much smaller than I anticipated and in fact 
> probably negligible for any sort of web application - so my advice would be 
> to go with what suits your feelings best and the rest of your app best.
> One comment: I see you write "...keep the database connection open as other 
> users would/could/might want to add stuff to the queue..." - AFAIK there is 
> zero impact on who else wants to write to your database, any other connection 
> can still write to it (just not at the exact same time, but even then SQLite 
> will handle the timing/locking/writing very well). Only worry about this if 
> your connection locks up a table for a long time - but my guess is that 
> either it doesn't, or there's nothing you can do about it, the data MUST be 
> updated etc.
> 
> Btw, to all, I concur very much with Rob as to the quality of replies on this 
> forum - many thanks to all!
> 
> 
> On 2013/06/19 16:45, Rob Willett wrote:
>> Stephen, Simon,
>> 
>> I'm not worried about the CPU load at all, it was purely a question of 
>> style. Stephan has talked about keeping the connections open for the 
>> lifetime of a desktop connection, Simon has talked about closing and opening 
>> connections many, many times during the course of a transaction. My model is 
>> more towards Simon's.
>> 
>> Part of this model is due to the backend process taking hours to run. I 
>> wanted the web user to kick this off and come back when it's finished 
>> (possible tomorrow). It didn't feel write to keep the database connection 
>> open as other users would/could/might want to add stuff to the queue to be 
>> run.
>> 
>> The fact I have had two replies, and thank you both for writing back, but 
>> have very different viewpoints and experiences makes me think that my model 
>> isn't bad for my particular use case. Clearly other people will have 
>> different experiences and in different situations I would do it differently.
>> 
>> On another note, I have been impressed with the quality of replies on this 
>> list. I listen into other lists on other technologies and the quality and 
>> standard of 'stuff' here is very high. Not seen very many "My database don't 
>> work, fix it for me" type mails here which is great.  Some of the detail 
>> people go into is wonderful and I've learnt an awful lot just reading 
>> replies.
>> 
>> Best wishes,
>> 
>> Rob.
>> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] GUI for SQLite

2013-06-26 Thread Rob Willett
Well I just simply downloaded my free copy from the Apple App store. I entered 
absolutely zero information (apart from Apple knowing everything about me 
anyway). Can't say I've been plagued by spam from other vendors who have 
provided free downloads but there's a first time for everything.

It looks a very interesting product, it seems to do an awful lot (to my wholly 
untutored ignorant eye) and hasn't crashed in the last ten minutes. I like the 
ER diagrams you can make with it.

I'm not complaining and if I like it enough I might even buy the Pro version if 
I need the functionality. 

BTW just for forms sake, I have zero connections with this company and until I 
saw this e-mail didn't even know this product existed.

Rob.

On 26 Jun 2013, at 16:42, RSmith  wrote:

> Well you need to "buy" a $0.00 registration key, for which you need to fill 
> in an inordinate amount of personal and contact information (just like buying 
> other things online). The sort of event that promises Inbox spam aplenty and 
> is too much hassle for most casual onlookers - but it does promise some sort 
> of interoperability between SQLite and MySQL etc.
> (Haven't used it, so just mentioning what it claims).
> 
> 
> On 2013/06/26 17:34, Paolo Bolzoni wrote:
>> So... no? It is gratis, but not open. thanks.
>> 
>> On Wed, Jun 26, 2013 at 5:28 PM, Michael Black  wrote:
>>> Free doesn't necessarily mean open source
>>> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite on ESXI hypervisor

2013-07-23 Thread Rob Willett
Hi,

Apologies for jumping in here.

I was interested in your question as I have some development going on which 
will eventually go into production on a VMWare ESXI server, which could be 
either Linux or Windows.

When you say you are running sqlite3 on esxi5 do you mean that you are running 
sqlite3 on a host system on top of Esxi 5 (e.g. Linux or Windows Server) or 
actually in the hypervisor itself? 

I know you say "esxi 5.0 (VMware hypervisor)" in your original email but wanted 
to check the details. 

Thanks,

Rob.

On 23 Jul 2013, at 15:04, Clemens Ladisch  wrote:

> 1 1 wrote:
>> I've tried to run the latest version of sqlite3 on esxi 5.0 (VMware
>> hypervisor), but unsuccessfully. Strace shows "fcntl function not
>> implemended".
> 
> Apparently, VMFS does not implement file locking.
> 
> Try using the unix-dotfile VFS, or unix-none if you can guarantee that
> the database will never be accessed concurrently.
> 
> 
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
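A small sketch of the workaround Clemens suggests, assuming a unix build of SQLite: open the database with the unix-dotfile VFS so locking does not rely on the fcntl() call that VMFS apparently lacks; unix-none would also work, but only if nothing else ever touches the file:

```c
#include <sqlite3.h>

static int open_on_vmfs(const char *path, sqlite3 **db)
{
    /* The last argument names the VFS; "unix-dotfile" uses a lock file
       next to the database instead of POSIX advisory locks. */
    return sqlite3_open_v2(path, db,
                           SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                           "unix-dotfile");
}
```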


Re: [sqlite] Multiple autoinc columns?

2013-07-24 Thread Rob Willett




On 24 July 2013 05:34:43 Kai Peters  wrote:

Is it possible to have two (or more) autoincrement columns per table?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] logical to use sqlite to replace Berkeley DB ?

2013-09-14 Thread Rob Willett
Hi,

I cannot comment on the speed of Berkeley DB, but I am somewhat familiar with 
Oracle's approach to licensing terms, having spent somewhat more than £10M with 
them over the last five years :) You can accuse Ellison of many things (and I 
have, all of them unrepeatable) but being cheap isn't one of them; anyway, I 
digress.

Point 1. 

Oracle can only change the licensing of Berkeley DB from a certain point 
forward. Previous versions of the licence should still be perfectly applicable and 
usable. I am not a lawyer, so check it out. This may work for you.

Point 2.

You don't say what your use case is for Berkeley or for SQLite. If you are 
after pure performance then it appears that, on the surface, Berkeley may be 
appropriate, but do you actually need the speed? e.g. I have an application that 
uses SQLite3, I have no issue if the team make it 10x quicker, but to be honest 
for my needs, they could make it 100x slower and I would not notice a great 
deal of degradation in my application as I use SQLite for simple use, basically 
a cross platform simple filing system. 99.99% (and I've measured it) lies in 
the data processing rather than the database access and management.

Point 3.

As Howard has said look around and check out his stuff as well. It never hurts 
to have a different view on things and OpenLDAP LMDB may be a perfect fit for 
you. 

Just my 2p worth.

Rob.

On 14 Sep 2013, at 03:16, Howard Chu  wrote:

> Patrick wrote:
>> Hi Everyone
>> 
>> After Oracle acquired Berkeley DB they changed the license. If people
>> don't pay a significant licensing fee, it can now only be used for GPL code.
>> 
>> I don't know Berkeley DB very well but I do know a moderate amount of
>> Sqlite.
>> 
>> I want to tinker with a compiler that uses DB, I was thinking about
>> ripping it out and replacing it with Sqlite. Does this make sense?
>> 
>> I know they are both zero configuration embedded DBs but DB is a
>> key-value based one and I am assuming lighter, is this true? Any idea of
>> how close they would be in terms of memory use and execution speed?
> 
> BDB is much faster than SQLite, yes. In fact Oracle supplies a port of SQLite 
> that uses BDB as the underlying Btree engine instead of SQLite's native code, 
> and there's a significant performance gain.
> 
> If you have an app that is comfortably using the key-value API of BDB it 
> would introduce major inefficiencies to convert it to using SQL. So no, this 
> doesn't seem like a logical action to take.
> 
> If you're using BDB and want to switch off it because of the license issue, 
> try OpenLDAP LMDB instead. No license hassles, and also several times smaller 
> and faster than BDB.
> 
> -- 
>  -- Howard Chu
>  CTO, Symas Corp.   http://www.symas.com
>  Director, Highland Sun http://highlandsun.com/hyc/
>  Chief Architect, OpenLDAP  http://www.openldap.org/project/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Pattern for using sqlite3_busy_timeout()

2013-10-15 Thread Rob Willett
Hi,

As I've learnt more about how to use SQLite (mainly from looking at this 
mailing list), I've realised I've coded some things in a less than 
optimal manner.

One of the things I've realised is that SQLite has a timing system built into 
accessing the database, so that if things are busy it will back off for a while, 
retry for a certain number of milliseconds (microseconds?) and then 
fail with SQLITE_IOERR_BLOCKED or SQLITE_BUSY.

Of course I realised this after I'd written my own system to do exactly the 
same thing. 

I'm not a big fan of reinventing the wheel and, to be honest, suspect (know?) 
that the writers of SQLite understand their system far, far better than I can, 
so want to do the right thing and use sqlite3_busy_timeout as it's supposed to 
be used rather than mimicking it badly myself.

I've read through the on-line documentation and can see more or less how it 
works, but I want to check the usage pattern to see if it follows best 
practice, or failing that, have somebody comment on whether this looks appropriate. 
Just to be clear, I'm not asking anybody to write my code for me, and I have 
looked through the manual and as many examples as I can find of this function 
to see how it's used.

Here's how I used to do things using the code I used to close a database. This 
is a simple code example, the code I have for executing SQL is far longer but I 
don't feel it offers anything more to the discussion.

int DB_CloseDatabase(DB_p db)
{
    int rc;
    int do_loop = 1;
    int timer_interval = SQL_TIMER_RETRY_INTERVAL;
    int no_attempts = SQL_MAX_NO_ATTEMPTS;

    while (do_loop && no_attempts > 0)
    {
        rc = sqlite3_close(db);

        switch (rc)
        {
            case SQLITE_OK:
                do_loop = 0;
                break;
            case SQLITE_BUSY:
                if (debug) printf("1. Sleeping for %d\n" , timer_interval);

                USleep(timer_interval);
                timer_interval += timer_interval;
                no_attempts--;
                break;
            default:
            {
                char error_string[EXCEPTION_MAX_STRING];
                Exception_t exception;

                SetException(exception , rc ,
                    DB_ConstructErrorString(error_string , EXCEPTION_MAX_STRING ,
                        "Close Database%s:'" , (char*) sqlite3_errmsg(db) ,
                        "Close Database1"));
                Throw(exception);
                break;
            }
        }
    }

    if (no_attempts <= 0)
    {
        char error_string[EXCEPTION_MAX_STRING];
        Exception_t exception;

        SetException(exception , rc ,
            DB_ConstructErrorString(error_string , EXCEPTION_MAX_STRING ,
                "Close Database%s:'" , (char*) sqlite3_errmsg(db) ,
                "Close Database2"));
        Throw(exception);
    }

    return 0;
}

My intention with this is to try to close the database; if it fails the first 
time because SQLITE_BUSY is returned, I sleep for an amount of time determined 
by timer_interval (in uSecs), then double timer_interval so that the next 
attempt backs off for even longer, retry up to SQL_MAX_NO_ATTEMPTS times, and 
if that all fails I eventually drop into an exception-handling routine. This 
seems to work OK: I have managed to simulate a locked database and can see 
timer_interval being incremented.

The logic in this is based on Ethernet packet handling (or how I remember it 
used to work) when packets collided (without the randomisation of the back-off).

Here's my rewrite of the simple function above:

int DB_CloseDatabase(DB_p db)
{
    int rc;
    int do_loop = 1;
    int no_attempts = SQL_MAX_NO_ATTEMPTS;

    sqlite3_busy_timeout(db , SQL_TIMER_RETRY_INTERVAL);

    while (do_loop && no_attempts > 0)
    {
        rc = sqlite3_close(db);

        switch (rc)
        {
            case SQLITE_OK:
                do_loop = 0;
                break;
            case SQLITE_IOERR_BLOCKED:
            case SQLITE_BUSY:
                no_attempts--;
                break;
            default:
            {
                char error_string[EXCEPTION_MAX_STRING];
                Exception_t exception;

                // Reset the database timeout to 0
                sqlite3_busy_timeout(db , 0);
                SetException(exception , rc ,
                    DB_ConstructErrorString(error_string , EXCEPTION_MAX_STRING ,
                        "Close Database%s:'" , (char*) sqlite3_errmsg(db) ,
                        "Close Database"));
                Throw(exception);
                break;
            }
        }
    }

    // Reset the database timeout to 0
    sqlite3_busy_timeout(db , 0);

    if (no_attempts <= 0)
    {
        char error_string[EXCEPTION_MAX_STRING];
        Exception_t exception;

        SetException(exception , rc ,
            DB_ConstructErrorString(error_string , EXCEPTION_MAX_STRING ,
                "Close Database%s:'" , (char*) sqlite3_errmsg(db) ,
                "Close Database"));
        Throw(exception);
    }

    return 0;
}
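For comparison, a hedged sketch of the usual pattern this question is circling around: set the busy timeout once, straight after opening, and let sqlite3_step()/sqlite3_exec() do the waiting; the 30-second figure is just an illustration. Note that sqlite3_close() returning SQLITE_BUSY normally means there are still unfinalised statements or unfinished backups, which a busy handler does not help with:

```c
#include <sqlite3.h>

static int open_with_timeout(const char *path, sqlite3 **db)
{
    int rc = sqlite3_open(path, db);
    if (rc != SQLITE_OK) return rc;

    /* Statements on this connection now retry for up to 30 seconds
       before failing with SQLITE_BUSY. */
    return sqlite3_busy_timeout(*db, 30000);
}
```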

Re: [sqlite] Locking databases - Possibly (probably?) a dumb question

2016-08-07 Thread Rob Willett

Richard, Ryan,

Thanks for this. We were dimly aware of WAL but until now hadn’t 
needed to use it.


We’ve done a quick check with it and it *seems* to work on a test 
database. We’ve all read the docs again and paid attention to 
https://www.sqlite.org/wal.html#bigwal


To test if it works we started our long-running analytics query; on our 
test machine it takes around 8 mins. We then sped up the rate of 
updating our database with external data. In the real world an update 
comes along every 3-5 mins; in our test system we queue them up so we 
have them every 6-10 secs, so they are around 60x quicker. The updates 
are real data, around 3-5MB in size.


We monitored the -wal and the -shm files created as we threw data into the 
database.


The -wal file gets larger and larger until it hits 224MB and then stays 
constant; the -shm file is only 1.8MB and seems to stay at that size. We 
can also see that the main SQLite database is NOT updated (or at least 
the timestamp isn’t) whilst we are running the updates in WAL mode. 
This appears to be correct as the updates would be in the -wal file.


The time taken for each update seems a little slower (10% possibly, but 
that could just be because we are looking at it), but since the data is 
real and variable in size, it might be just our subjective impression.


Once the long running read-only analytics query completes, the main 
sqlite database appears to get updated (or at least the timestamp on the 
file is updated) as we are still updating with our test data and the 
-wal files are still being used.


Once we stop updating with our test data, the -wal files and .shm files 
disappear (as expected).


A quick check of the database appears to show it's correct.

One question though, the size of the -wal file worries us. 
https://www.sqlite.org/wal.html#bigwal states


```
Avoiding Excessively Large WAL Files

In normal cases, new content is appended to the WAL file until the WAL 
file accumulates about 1000 pages (and is
thus about 4MB in size)  at which point a checkpoint is automatically 
run and the WAL file is recycled.
The checkpoint does not normally truncate the WAL file (unless the 
journal_size_limit pragma is set).
Instead, it merely causes SQLite to start overwriting the WAL file from 
the beginning. This is done because
it is normally faster to overwrite an existing file than to append. When 
the last connection to a database
closes, that connection does one last checkpoint and then deletes the 
WAL and its associated shared-memory

file, to clean up the disk.
```

We have not set the journal_size_limit and we have a -wal file which is 
224MB in size, somewhat larger than 4MB. We are running


3.8.2 2013-12-06 14:53:30 27392118af4c38c5203a04b8013e1afdb1cebd0d

which does not appear to have the code in 3.11.0 so that the WAL file is 
proportional to the size of the transaction. From the same page of the 
manual:


```
Very large write transactions.

A checkpoint can only complete when no other transactions are running, 
which
means the WAL file cannot be reset in the middle of a write transaction. 
So a large change to a large database
might result in a large WAL file. The WAL file will be checkpointed once 
the write transaction completes
(assuming there are no other readers blocking it) but in the meantime, 
the file can grow very big.


As of SQLite version 3.11.0, the WAL file for a single transaction 
should be proportional in size to
the transaction itself. Pages that are changed by the transaction should 
only be written into the WAL
file once. However, with older versions of SQLite, the same page might 
be written into the WAL file multiple

times if the transaction grows larger than the page cache.
```

We think that using WAL mode works for us, indeed inspection seems to 
indicate it does, but the size of the -wal file appears to be far larger 
than would be expected. Is there a problem here? It doesn’t appear to 
be a problem but would welcome any comments.


Thanks for taking the time to reply.

Rob


On 6 Aug 2016, at 22:35, R Smith wrote:


On 2016/08/06 10:50 PM, Rob Willett wrote:


Our understanding of this is that many processes can READ the 
database at the same time but NO process can INSERT/UPDATE if another 
is reading. We had thought that one process can write and multiple 
processes can read. Our reading (no pun intended) now of this 
paragraph from the manual is that you cannot write if one or more 
processes is reading. Have we understood this correctly? If so is 
there an easy way to get around this?


The Write-Ahead-Log (WAL) journal mode will help you. It basically 
allows a writer to write to the WAL Log in stead of the main database 
so that any amount of readers can still do their thing reading the 
database (and the parts of the WAL journal that is already committed, 
or even parts still in progress if you use "read_uncommitted" mode). 
SQLite then pushes committed data into the DB file based on 
Checkpo

[sqlite] Locking databases - Possibly (probably?) a dumb question

2016-08-06 Thread Rob Willett

Hi,

We’ve been using Sqlite though Perl for some time now and have started 
to get more adventurous.


Our SQLite database is around 32GB in size, is created and manipulated 
by a single Perl process and is working well.


What we now want to do is mine the database using a very long running 
query to generate analytics. We have the SQL written to query the 
database, we have requested help on this list from last year and we are 
happy with the SQL query itself. It takes a long time (between 5 and 60 
mins) as its a complex query and collates an awful lot of data. Whilst 
we would love to have the query execute in 10 secs, thats not going to 
happen due to the size of the data and the queries we need to execute.


What we have now found is that when we are running the analytics query 
in one Perl process, we can no longer UPDATE the main database through 
another Perl process. We are getting “database is locked” errors.


We don’t need any help with our SQL but we are trying to understand 
how the locking works (at a high level) in SQLite. Reading the docs 
(https://www.sqlite.org/lockingv3.html) for locking seems to indicate 
the problem


```
SHARED	The database may be read but not written. Any number of processes 
can hold SHARED locks at the same time, hence there can be many 
simultaneous readers. But no other thread or process is allowed to write 
to the database file while one or more SHARED locks are active.


```

Our understanding of this is that many processes can READ the database 
at the same time but NO process can INSERT/UPDATE if another is reading. 
We had thought that one process can write and multiple processes can 
read. Our reading (no pun intended) now of this paragraph from the 
manual is that you cannot write if one or more processes is reading. 
Have we understood this correctly? If so is there an easy way to get 
around this?


Further down the page we find

```
5.1 Writer starvation

In SQLite version 2, if many processes are reading from the database, it 
might be the case that there is never a time when there are no active 
readers. And if there is always at least one read lock on the database, 
no process would ever be able to make changes to the database because it 
would be impossible to acquire a write lock. This situation is called 
writer starvation.


SQLite version 3 seeks to avoid writer starvation through the use of the 
PENDING lock. The PENDING lock allows existing readers to continue but 
prevents new readers from connecting to the database. So when a process 
wants to write a busy database, it can set a PENDING lock which will 
prevent new readers from coming in. Assuming existing readers do 
eventually complete, all SHARED locks will eventually clear and the 
writer will be given a chance to make its changes.



```

We cannot do this as we cannot set the PENDING lock as we do not know 
when we need to do a write.


If we have a single reader and therefore no writers we do have a Plan B 
(and a plan C) so whilst there is some work for us (1-2 days), we can 
put a workflow into the system to get around it.


Thanks,

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Locking databases - Possibly (probably?) a dumb question

2016-08-07 Thread Rob Willett

Ryan,

Thanks for the update.

We have done a few more tests during the day and not had any issues to 
date. This is still on a test version but we are getting a warm, cuddly 
feeling about using WAL mode.


The -wal file grows as you describe and you have explained it very well. 
We were groping in the twilight to get to where we wanted to go, your 
explanation brought a bright beacon of light onto the proceedings. (I 
have been watching too many speeches from the various US political 
conventions, though I am British).


We will investigate changing the page size. We would need to work out 
the row size.


I will note in future your OCD and ensure that I am accurate in 
reporting numbers rather than have self inflicted rounding errors, 60x 
is a nicer number than 50x as it maps to mins and secs more easily :)


Thanks again for the help.

Rob

On 7 Aug 2016, at 12:11, R Smith wrote:


On 2016/08/07 8:55 AM, Rob Willett wrote:

Richard, Ryan,

Thanks for this. We were dimly aware of WAL but until now hadn’t 
needed to use it.


We’ve done a quick check with it and it *seems* to work on a test 
database. We’ve all read the docs again and paid attention to 
https://www.sqlite.org/wal.html#bigwal


To test if it works we started our long running analytics query, on 
our test machine it takes around 8 mins. We then speed up the rate of 
updating our database with external data. In the real world an update 
comes along every 3-5 mins, in our test system we queue them up so we 
have them every 6-10 secs so they are around 60x quicker. The updates 
are real data around 3-5MB in size.


We monitored the -wal and the .shm files created as we throw data in 
the database.


The .wal file gets larger and larger until it hits 224MB and then 
stays constant, the .shm file is only 1.8MB and seems to stay at that 
size. We can also see that the main sqlite database is NOT updated 
(or at least the timestamp isn’t) whilst we are running the updates 
in WAL mode. This appears to be correct as the updates would be in 
the -wal file.


I'm truncating this post for brevity - but basically your concern 
about the size (voiced later in the post) is not a concern. What 
happens is the stated 4MB is simply 1000 pages x 4KB default page size 
- your page size might be a lot bigger (and should be set higher 
looking at your DB size and data entry sizes - I think it is "nicer" 
if, at a minimum, a complete row can fit on a page). Further, the WAL 
for your version of SQLite will grow with copies of data and multiple 
inserts in it because of the long-running query not allowing push-back 
check points for the time - and looking at your insert frequency and 
size, your WAL size seems pretty normal. (If you manage it wrong, it 
will fill up Terrabytes - this is the situation you want to avoid, but 
I think you've got it sorted).


The Documentation simply describes the normal situation, which yours 
isn't.


Also, on a point of satisfying my OCD... going on your quoted averages 
- 5 minutes vs. 10 secs is a 30:1 ratio (5 mins = 300 seconds vs. 10 
secs) so the speed is only 30x faster, not 60) - And before anyone 
asks, yes the low end of the ranges given is 3mins (180s) vs. 6 secs 
which is also a 30:1 ratio.  Even if I take the opposite range 
extremes (5 mins vs. 6s) I still only get 50x speedup.  LoL - Sorry, 
I'll shush now :)


As an aside, I think Richard posted a small study of testing multiple 
DB ops with varying page sizes and varying hardware page sizes, and 
basically, IIRC, the Jury was out on best size in the general case 
with 8192 seeming to be a good standard and the idea that the page 
size should try match the underlying OS page size for best performance 
turned out to be a bit of a "sometimes maybe", but the point was made 
that every implementation should experiment to find the optimum size. 
That said, my memory cannot be trusted - could someone re-post that or 
point us to an on-line page somewhere? Thanks!


Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Locking databases - Possibly (probably?) a dumb question

2016-08-07 Thread Rob Willett

Jean-Christophe

Thanks for the update on WAL mode. Your explanation is clear and makes 
sense to us. We can see why we would have a 224MB -wal file; we 
experimented with killing processes whilst updating and generally 
messing around, and SQLite did what it was supposed to do. I wouldn’t 
say we were exhaustively testing it and, to be honest, I know we can’t 
push SQLite to its limits with the little things we use it for.


We did understand the differences between 3.11.0 and 3.8 re the size of the 
-wal file, it's just that I communicated it poorly. Too little sleep and 
far too much coffee.


We are going to do some more tests, more about familiarising ourselves 
with WAL rather than expecting it to break to be honest. WAL seems to 
work well enough for us and assuming our last conversion tests work OK, 
we’ll shine it in tomorrow night when we get some downtime.


Thanks for your help and elegant description.

Rob

On 7 Aug 2016, at 9:59, Jean-Christophe Deschamps wrote:


Rob,

At 08:55 07/08/2016, you wrote:
We think that using WAL mode works for us, indeed inspection seems to 
indicate it does, but the size of the -wal file appears to be far 
larger than would be expected. Is there a problem here? It doesn't 
appear to be a problem but would welcome any comments.


After reading your post I'd like to clear up a few points about WAL 
mode.


We can also see that the main sqlite database is NOT updated (or at 
least the timestamp isn't) whilst we are running the updates in WAL 
mode. This appears to be correct as the updates would be in the -wal 
file.


The WAL mode is persistent and consistent. That means that once 
successfully put in this mode the DB itself will remain in WAL mode for 
every (new) connection. Thus your updates and the long-running query 
are both running under WAL mode. That is, provided the WAL mode was 
set prior to the start of the long-running query, but that detail 
doesn't matter for reads in this case.


It doesn't matter whether your query is a single query statement 
(hence in auto-commit mode) or a huge transaction (extracting and 
massaging data in multiple temp tables and myriads of read/write 
statements, all inside an explicit transaction), ACID properties 
guarantee that once your query is started, it will see the DB in the 
state prior to any updates that could occur during its run. Else you 
would obtain potentially dangerously inconsistent data of course.


We have not set the journal_size_limit and we have a -wal file which 
is 224MB in size, somewhat larger than 4MB. We are running


3.8.2 2013-12-06 14:53:30 27392118af4c38c5203a04b8013e1afdb1cebd0d

which does not appear to have the code in 3.11.0 so that the WAL file 
is proportional to the size of the transaction. From the same page of 
the manual:


```
Very large write transactions.

A checkpoint can only complete when no other transactions are 
running, which
means the WAL file cannot be reset in the middle of a write 
transaction. So a large change to a large database
might result in a large WAL file. The WAL file will be checkpointed 
once the write transaction completes
(assuming there are no other readers blocking it) but in the 
meantime, the file can grow very big.


As of SQLite version 3.11.0, the WAL file for a single transaction 
should be proportional in size to
the transaction itself. Pages that are changed by the transaction 
should only be written into the WAL
file once. However, with older versions of SQLite, the same page 
might be written into the WAL file multiple

times if the transaction grows larger than the page cache.
```


Reread the quoted part again: only with SQLite versions 3.11.0 and 
above will a given page be written only once in the WAL file when 
initiated within a given transaction.


Since you're running a prior version, it's pretty logical to expect 
that your updates will cause writes of multiple distinct copies of the 
same pages in the WAL file. You should update your SQLite version to 
see a possible improvement there. That, or refer only to the old 3.8.2 
documentation, but this is an inferior option because there have been 
significant improvements meanwhile.


Also the ratio in the help file (1000 pages or about 4Mb) applies to 
the default page size (4Kb).


Finally, while the long-running query is running, no checkpoint can 
run to completion. Doc states under "Checkpoint starvation.":


However, if a database has many concurrent overlapping readers and 
there is always at least one active reader, then no checkpoints will 
be able to complete and hence the WAL file will grow without bound.


Since you clearly can't introduce a read-gap inside your read query, 
the .wal file will grow as large as it needs until completion of the 
query. You mentioned that you tested with much more frequent updates 
than the real-world case (120x actually), so .wal file size shouldn't 
be an issue in your actual use case.


HTH

--
Jean-Christophe
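For reference, a hedged sketch of the WAL housekeeping discussed in this thread: switch the connection to WAL, cap the size the recycled -wal file is allowed to keep after a checkpoint, and checkpoint explicitly at a quiet moment. The 256MB limit is an illustrative figure, not something from the thread:

```c
#include <sqlite3.h>

static int enable_wal(sqlite3 *db)
{
    int rc = sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
    if (rc != SQLITE_OK) return rc;

    /* After a successful checkpoint, truncate the -wal file if it has
       grown beyond roughly 256MB. */
    rc = sqlite3_exec(db, "PRAGMA journal_size_limit=268435456;", NULL, NULL, NULL);
    if (rc != SQLITE_OK) return rc;

    /* A FULL checkpoint only completes when no reader is still using the
       WAL, so run this while the long analytics query is not active. */
    return sqlite3_exec(db, "PRAGMA wal_checkpoint(FULL);", NULL, NULL, NULL);
}
```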

Re: [sqlite] New tool for PUTTY logging [Windows]

2017-02-02 Thread Rob Willett
I've been following this thread with interest. I have used Putty for 
years as it's the de facto standard for decent ssh terminals on Windows 
boxes.


A slightly more radical suggestion for the log files. Since Putty is 
open source, have a look at the code and see if you can easily add in a 
timestamp per line for the log file section.


That gives you two features:

1. You now have a verifiable traceable source as you have downloaded and 
compiled it. I have worked in environments where we need to trace every 
bit of code that comes into the data centre. We need to know where we 
downloaded it from, what the license was, authority from legal to use 
etc etc. Your situation might not warrant it :)


2. You now have unique lines with a timestamp, a hostname and a TTY 
session (I assume). I think that guarantees uniqueness.


I have no idea if Putty can be modified in this way, but it wouldn't 
hurt to have a look, see if the change is easy, do the change and then 
send some diffs back to the Putty team. If they accept the changes 
you're sorted. If they don't, well, Putty doesn't change that much over 
time so you could probably use your version for years to come.


Rob

On 2 Feb 2017, at 19:53, Stephen Chrzanowski wrote:

I can only get to our customer machines by jumping into a server that 
has access to both sides of the network.  Our side, and the customer 
side.  I can't get to a customer's machine directly.  The  is out, but 
I'm already doing the rest.

The image in my head of what my program is going to do is that I feed it 
a date range, a server I'm interested in, and optionally provide text 
that further filters the information I'm looking for.  Once I have the 
filtered data, I'd have a list of days that I'd been on that exact 
server, and/or entries that mention my subject server, and I can see the 
text only pertaining to that machine and date range.  I'd be able to 
read the full set of activities on that machine for that day, and not 
have to hop around to multiple log files.  This would get rid of the 
concept of many log files as well, since all files are now one.  Kind of 
Borg-ish?
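
For what it's worth, a minimal sketch of how that consolidated log could 
look as a single SQLite table and query. The table, column and value names 
are all assumptions for illustration, not anything from this thread:

```
-- One table instead of many log files.
CREATE TABLE session_log (
    id        INTEGER PRIMARY KEY,
    hostname  TEXT NOT NULL,        -- which server the session was on
    logged_at TEXT NOT NULL,        -- ISO-8601 timestamp per line
    line      TEXT NOT NULL         -- the raw log line
);
CREATE INDEX session_log_idx ON session_log (hostname, logged_at);

-- "Feed it a date range, a server, and optional filter text":
SELECT logged_at, line
  FROM session_log
 WHERE hostname = 'customer-server-01'
   AND logged_at >= '2017-01-01'
   AND logged_at <  '2017-02-01'
   AND line LIKE '%restart%'
 ORDER BY logged_at;
```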


On Thu, Feb 2, 2017 at 11:54 AM, Donald Griggs  
wrote:



Maybe another method to consider:

This guy shows that Putty appears to support creating separate log files 
for each session including a timestamp in the file name.

https://www.viktorious.nl/2013/01/14/putty-log-all-session-output/

Could your script import any new log files it sees, then move them to an 
archive?

That way, you'd never have to read through a huge log file to find what 
should be imported.


== From the page linked above:

I am using some putty parameters which will make every session unique, 
in this case “”, which means:

   - &H = hostname for the session
   - &Y = year
   - &M = month
   - &D = day
   - &T = time

==
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




Re: [sqlite] Database malformed after 6000 Inserts?

2016-10-04 Thread Rob Willett

We don't use Windows Server or System.Data.SQLite so can't comment.

I'd be astonished if it's SQLite itself that's at fault here.

Rob

On 4 Oct 2016, at 13:24, Werner Kleiner wrote:


Thanks for the help.
Hopefully I'm giving you the correct answer, because a colleague has
written the C# program.
We had no problems with inserts in the past, but now once we get to
around 6000 inserted records we get the errors.

The OS is Server 2012, there are no pragma settings (but my colleague has
also tried it with pragma settings).
It is a single thread with "synchronous" set to FULL.
We use the System.Data.SQLite.dll version 1.0.89.0 from 
12.12.2013


hope this is what you mean

2016-10-04 13:48 GMT+02:00 Jim Borden :
I had a problem similar to this before. What is the threading model 
for access to the database and how is the native library compiled and 
configured?


Jim Borden
(Sent from a mobile device)

On 4 Oct 2016, at 19:12, Werner Kleiner  
wrote:


Hello,
a program written in C# makes inserts from an SQL script into a 
sqlite db.

We have now seen that the database becomes malformed after 6000 records.

Is there a limitation with huge inserts?
What could be the problem?

regards
Werner
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Database malformed after 6000 Inserts?

2016-10-04 Thread Rob Willett
We've done inserts of tens of thousands at a time; we may well have done 
hundreds of thousands in one single transaction. I've no doubt other 
people do even larger transactions.


I would assume the problem lies elsewhere.

What error message are you getting? What's the OS, the environment, disk, 
pragma settings?
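
For what it's worth, a quick first check on the file itself is the built-in 
integrity pragmas. This is a generic suggestion rather than something from 
the thread:

```
-- Full structural check of the database file; prints "ok" if nothing is wrong.
PRAGMA integrity_check;

-- Faster variant that skips some of the slower checks.
PRAGMA quick_check;
```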


Rob

On 4 Oct 2016, at 11:11, Werner Kleiner wrote:


Hello,
a program written in C# makes inserts from an SQL script into a sqlite 
db.

We have now seen that the database becomes malformed after 6000 records.

Is there a limitation with huge inserts?
What could be the problem?

regards
Werner
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Using too much memory to execute an update query

2016-10-02 Thread Rob Willett
We have production databases with 30-50GB and have no issues with 
managing them. Mind you, we have more memory, though we only have 8GB as 
that's all our VM provider allows us.


Having said that, we have never seen any performance issues that weren't 
due to our schemas or our bad design. Our working assumption is that if 
there's a problem, it's our fault before we blame SQLite. So far that 
working assumption has held true :)


We did some tests (just for the hell of it) to 200GB with dummy data and 
had no issues I can recall.


I know that other people on this list have far, far bigger production 
databases than us.


I'm not a SQLite expert but I look on machines with 2GB of main memory 
as rather small for this sort of thing.
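
If memory really is the constraint, one workaround (a sketch only, and 
purely my own suggestion rather than anything tested in this thread) is to 
break the big UPDATE into rowid-ranged batches so that no single 
transaction has to hold 980M changed rows in the page cache or WAL at 
once. It assumes the rowids in bolsas_familia are reasonably dense:

```
-- Update one slice of the table per transaction; the caller repeats this
-- with the next rowid range (1000001-2000000, and so on) until done.
BEGIN;
UPDATE bolsas_familia
   SET favorecido_id = (SELECT id
                          FROM favorecidos
                         WHERE nis = bolsas_familia.favorecido_id)
 WHERE rowid BETWEEN 1 AND 1000000;
COMMIT;
```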


Rob

On 2 Oct 2016, at 21:35, Domingo Alvarez Duarte wrote:


Hello !

I'm still fighting to use sqlite with a 20GB database and now I'm 
trying to update a table with 980M rows, but sqlite is eating all my 
memory (2GB) and blocking my computer (too much swap).


I'm in doubt whether sqlite can really manage databases above 1GB.

Does anyone have any experience with big databases ?

Cheers !

The query:

--explain query plan
update bolsas_familia set favorecido_id=(select id from favorecidos 
where nis=favorecido_id);


Query plan:

selectid|order|from|detail
0|0|0|SCAN TABLE bolsas_familia
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 0
0|0|0|SEARCH TABLE favorecidos USING COVERING INDEX 
favorecidos_nis_idx (nis=?)


The schema is:

CREATE TABLE bolsas_familia(
id integer primary key,
favorecido_id integer not null,
valor_parcela numeric,
mes_competencia integer
); //980M rows

CREATE TABLE "favorecidos"(
id integer primary key,
nis integer,
nome varchar collate nocase_slna,
municipio_id integer,
programa_id integer not null,
fonte_finalidade_id integer not null,
first_ano_mes integer
); //14M rows

CREATE UNIQUE INDEX favorecidos_nis_idx ON favorecidos(nis);

Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Using too much memory to execute an update query

2016-10-02 Thread Rob Willett
We tend to use append only. We delete some small transient data as we 
use it; this transient data maintains a bit of state between appends. 
We're talking 300-500 lines of a table.


It's not read only; we add a lot of data every 3-5 minutes, but it's all 
driven from a single Perl process that captures XML data, processes it 
as only Perl can, and then updates the database. Nothing updates the 
database between those time intervals. No foreign keys at all.


The schema is quite long but rather simplistic. There are lots of tables 
all referenced by a big central table. Since a lot of our data is common 
and referenced thousands of times by different rows, we needed to reduce 
the amount of data or else we would have tripled the size of the main 
database to over 100GB.


It's 1,200 lines long so not really suitable for posting here, but the 
main table is


-- 
--  Table structure for Disruptions
-- 
DROP TABLE IF EXISTS "Disruptions";
CREATE TABLE "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions", 
'0');


All the integer fields are really indexes into other tables. We made a 
deliberate decision to move some of the more complex logic out of SQL 
into Perl as it sat better there. This was no reflection on SQLite, we 
would have made the same decision with PostGres or MySQL or Oracle. 
Right tool for the right job etc etc.


Rob

On 2 Oct 2016, at 22:47, Domingo Alvarez Duarte wrote:


Hello Rob !

How do you use those big databases ?

Append only ? Read only ? Constant read write ? Foreign keys ?

Could you show the schema to talk about it ?

Cheers !

On 02/10/16 17:44, Rob Willett wrote:
We have production databases with 30-50GB and have no issues with 
managing them. Mind you, we have more memory, though we only have 8GB 
as that's all our VM provider allows us.


Having said that, we have never seen any performance issues that 
weren't due to our schemas or our bad design. Our working assumption 
is that if there's a problem, it's our fault before we blame SQLite. 
So far that working assumption has held true :)


We did some tests (just for the hell of it) to 200GB with dummy data 
and had no issues I can recall.


I know that other people on this list have far, far bigger production 
databases than us.


I'm not a SQLite expert but I look on machines with 2GB of main 
memory as rather small for this sort of thing.


Rob

On 2 Oct 2016, at 21:35, Domingo Alvarez Duarte wrote:


Hello !

I'm still fighting to use sqlite with a 20GB database and now I'm 
trying to update a table with 980M rows, but sqlite is eating all my 
memory (2GB) and blocking my computer (too much swap).


I'm in doubt whether sqlite can really manage databases above 1GB.

Does anyone have any experience with big databases ?

Cheers !

The query:

--explain query plan
update bolsas_familia set favorecido_id=(select id from favorecidos 
where nis=favorecido_id);


Query plan:

selectid|order|from|detail
0|0|0|SCAN TABLE bolsas_familia
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 0
0|0|0|SEARCH TABLE favorecidos USING COVERING INDEX 
favorecidos_nis_idx (nis=?)


The schema is:

CREATE TABLE bolsas_familia(
id integer primary key,
favorecido_id integer not null,
valor_parcela numeric,
mes_competencia integer
); //980M rows

CREATE TABLE "favorecidos"(
id integer primary key,
nis integer,
nome varchar collate nocase_slna,
municipio_id integer,
programa_id integer not null,
fonte_finalidade_id integer not null,
first_ano_mes integer
); //14M rows

CREATE UNIQUE INDEX favorecidos_nis_idx ON favorecidos(nis);

Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating a table

2016-11-21 Thread Rob Willett

John,

There is a lot of documentation on the SQLite website.

Here's the 'official' docs on creating a table 
https://www.sqlite.org/lang_createtable.html


An SQLite database consists of many tables. I am unsure if there is an 
upper limit; if there is, it's more tables than I have ever created. You 
probably need to read up a bit more on SQL in general and SQLite in 
particular. It's a great SQL database for many uses, though not for every 
use. Here's a summary: https://sqlite.org/whentouse.html


SQLite does not set limits on text fields. That's a great strength (some 
people may disagree), see here https://www.sqlite.org/datatype3.html


You can create a table and specify the text length, but it's there for 
compatibility and is ignored. You want to drop 20, 200 or 2000 chars in 
your field, go ahead and do it.
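
A small illustration of that point (my own example, with a made-up table 
name, not anything from John's schema):

```
-- The declared length is parsed and accepted but never enforced.
CREATE TABLE log16 (entry TEXT(20));

-- A value well over 20 characters is stored intact.
INSERT INTO log16 VALUES ('this line is quite a bit longer than twenty characters');

-- Returns the full stored length, not 20.
SELECT length(entry) FROM log16;
```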


SQLite is very flexible and very fast, there's a ton of help and docs 
out there, the support is direct from the people who write it.


Rob


On 21 Nov 2016, at 17:29, John R. Sowden wrote:

First of all, I come from the dBASE/Foxpro world. There is no 
distinction between a table and a database.  I understand that with 
Sqlite a database includes tables and other items.  The scenario that 
I do not understand, is: say I have a log file with about 7 fields 
totaling about 80 characters per record.  How do I name the database 
and table.  Currently I say log16 for the year 2016.


Secondly, I have 2 "front ends" for Sqlite on my Ubuntu 16.04 
computer.  Neither one allows me to set the length of the text fields 
in the table creation process.  How does the Sqlite know how long each 
record should be, same with integers.


No help found in the documentation on the Sqlite web site.

John

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Segmentation fault on OpenBSD

2016-11-01 Thread Rob Willett

Simon,

We use the Perl DBD module all the time. What I would expect to see is 
(IGNORE THE LINE BREAKS)


my $sth = $dbh->prepare("INSERT INTO
func_begin_change(
author,
author_contact,
author_contact_method,
author_shortname,
id,
lang,
mtime,
mtimetz,
parent_id
)
VALUES
(
?,
?,
?,
?,
?,
?,
?,
?,
?
)");

followed by

$sth->execute('Mark Lawrence',
  'em...@address.net',
  'email',
  'ML',
  13,
  'en',
  '1478010282201',
  360,
  undef);

The prepare sets the statement up and the execute fills the variables 
in.


Now Mark has probably printed out, using Dumper, a Perl array holding the 
variables. That's shown by the use of $VAR1. Anybody who uses Perl uses 
Dumper and is very familiar with the output format :)


Here's what the Perl array could/would/should look like

	['Mark Lawrence','em...@address.net','email','ML',13,'en','1478010282201',360,undef]


Now he could call $sth->execute_array(\%attr, ...) rather than 
$sth->execute($var1, $var2, ...), see 
http://search.cpan.org/~timb/DBI-1.636/DBI.pm#execute


We've just checked our Perl libraries and we have never used 
$sth->execute_array as we prefer to be very specific about what we pass 
in.


It would be useful to see the actual Perl itself rather than the 
abstraction in the email. I can't comment on BSD as we don't use BSD 
at all, but the coding pattern in Perl is pretty standard and very, very 
common. I'd be surprised if this is a problem but you never know.


Rob


On 1 Nov 2016, at 14:57, Simon Slavin wrote:




On 1 Nov 2016, at 2:38pm, mark  wrote:

   VALUES
   (
   ?,
   ?,
   ?,
   ?,
   ?,
   ?,
   ?,
   ?,
   ?
   )
   ;

At least that is what I am passing to Perl's DBD::SQLite prepare()
method. I am then binding the following values before running 
execute:


   $VAR1 = [
 'Mark Lawrence',
 'em...@address.net',
 'email',
 'ML',
 13,
 'en',
 '1478010282201',
 360,
 undef
   ];


Excuse me.  I don't know Perl, or how it uses its DBD module.  Are you 
binding each of those values (e.g. 'ML') separately, or are you 
binding the whole of $VAR1 as one operation ?


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Thanks to everybody for their help earlier today.

As promised, here are the results of our various tests. Hopefully they may 
be of use to somebody...


We decided to start from a known position and so recreated the original 
index with the collation in it. We know this was sub-optimal but it's our 
reference point. We have the bytecode output if anybody wants to see it.


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE 
ASC);


We ran the following SQL twice

echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


and the two runs totalled 46 mins. Each was actually 23 mins.

We then dropped the old index, built the new one

echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite


We ran

echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


twice and each run was 12 mins. So we were twice as quick, which is nice.

We then ran

echo "explain select * from Disruptions where status = 2 UNION ALL 
select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite


twice. Each run was around 11.5 mins. We're not going to get into 
differences of less than a minute on a run of this size, so we'll say 
they are about the same speed.


Interesting results, clearly the collation does make a big difference. 
We are now going to go through the schema and check if we have made the 
same mistake elsewhere.


Thanks for your help, we can post the bytecode if people are interested.

Rob

On 17 Mar 2017, at 11:41, Rob Willett wrote:


Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to the very first position with the 'bad' collation 
index so we can do proper timings as we change things, so we understand 
the speed up (we hope there is a speed up).


We've written a quick script to check each version. Once we've put the 
original index back in, we've added a step to generate the SQLite 
bytecode for you. It's the least we can do...


We'll post this when it's completed but we suspect it may take most of 
the day now :)


echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions 
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 
6;" | sqlite3 tfl.sqlite

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

Nerds with chronic byte code affinity like myself would like to see 
the output of "explain" (without "query plan"), i.e. the SQLite 
bytecode produced. I guess the query with OR will have a subprogram 
called once for each status value, whereas I expect the query with 
UNION ALL to have 2 copies of the search (which would not affect the 
run time) and maybe even a temporary table of results (which would 
take longer and use more memory).


-----Original Message-----
From: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of 
Rob Willett

Sent: Friday, 17 March 2017 12:19
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index 
question ...


Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the 
collation sequence as a possible issue. We now have a new index and 
we have just run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * 

Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Simon,

We're still benchmarking the various changes we've made during the day. 
I'm sitting here watching a tail of a log file waiting for it to finish. 
Ah the joys of the internet startup :)


Our plan of action is

1. Have a baseline to work from. We now have that.

2. Create the correct sub-query index and measure the (hopefully) 
resulting increase in speed.


3. Work through all the remaining indexes and check that we have not 
made the same mistake. I know we actually have :( Keep rerunning our 
benchmark so we know if we are actually making a difference.


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with all 
the information necessary in it, we can do all the work in joins rather 
than paging out to disk. Is this what you are referring to?


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Rob

On 17 Mar 2017, at 18:15, Simon Slavin wrote:

On 17 Mar 2017, at 5:30pm, Rob Willett <rob.sql...@robertwillett.com> 
wrote:


echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null


twice and each run was 12 mins. So we were twice as quick, which is 
nice.


Do you actually need all columns ?  If not, then specifying the 
columns you need can lead to a further speedup.  It might be enough 
just to specify the columns you need, but you can achieve further 
increases in speed by making a covering index.  If speed for this 
SELECT is sufficiently important to you, and you don’t actually need 
all columns, post again and we’ll explain further.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. That's 32 mins to 16 mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly selected 
those. So we are now down to 10 mins or 1/3 of when we started for, to 
be honest, virtually no extra work, merely being smarter, or rather you 
being smarter.


3. We have looked through all our indexes and can see that every index 
has a COLLATE against it, even if the column is an integer. We have 
raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However it's 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but I'll 
read up on covering indexes to see how to use them.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett <rob.sql...@robertwillett.com> 
wrote:


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with 
all the information necessary in it, we can do all the work in joins 
rather than paging out to disk. Is this what you are referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table at 
all.  This can lead to something like a doubling of speed.  Of course, 
you sacrifice filespace, and making changes to the table takes a 
little longer.
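
To make that concrete, here is a rough sketch of what such a covering 
index could look like for the status query in this thread. The exact 
column list Rob ended up using isn't shown here, so the columns after 
status are just an assumed illustration, picked from the Disruptions 
schema posted earlier:

```
-- WHERE column first, then the columns the SELECT actually reads.
CREATE INDEX Disruptions_covering_idx ON Disruptions (
    status, Disruption_id, severity, category, startTime, endTime
);

-- With the index above, a query that only touches those columns can be
-- answered from the index alone, without reading the table rows:
SELECT Disruption_id, severity, category, startTime, endTime
  FROM Disruptions
 WHERE status = 2 OR status = 6;
```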


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett
We've just implemented a covering index for the last step (is it 
really?) in our quest to get the query execution time down.


To summarise, we have gone from 32 mins to 16 mins by updating an index 
so it doesn't use collate; we took another six minutes off by removing 
extra fields in the select we didn't need.


We have just created a new index which 'covers' all the fields we use in 
the select, this means (and I paraphrase) that we use the index to get 
all the data and there is no need to read from the database.


Well, that was a bit of a surprise. The index creation took 45 mins; we 
ran the program again and thought, rats, we've cocked it up, it only 
took 54 secs, we must have got something wrong. So we checked it and checked 
again and we hadn't got anything wrong. Our query has moved from 32 mins to 54 
secs.


We're quite happy with that performance increase. In fact we're 
delighted, so thanks for all the help in getting us to this stage.


We have kept copies of the query planner bytecode output if anybody is 
interested. Gunter has had copies, but if anybody else would like them, 
please ask.


Many thanks again for all the help,

Rob

On 17 Mar 2017, at 22:12, Rob Willett wrote:


Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. That's 32 mins to 16 mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly 
selected those. So we are now down to 10 mins or 1/3 of when we 
started for, to be honest, virtually no extra work, merely being 
smarter, or rather you being smarter.


3. We have looked through all our indexes and can see that every index 
has a COLLATE against it, even if the column is an integer. We have 
raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However it's 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but I'll 
read up on covering indexes to see how to use them.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett <rob.sql...@robertwillett.com> 
wrote:


4. Work through returning just the columns we actually need from our 
queries. We have a recollection that if we can build an index with 
all the information necessary in it, we can do all the work in joins 
rather than paging out to disk. Is this what you are referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table at 
all.  This can lead to something like a doubling of speed.  Of 
course, you sacrifice filespace, and making changes to the table 
takes a little longer.


5. Sleep (not exactly sure when) and watch three international rugby 
games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-20 Thread Rob Willett

David,

We're English and masters of the understatement ;)

Rob

On 20 Mar 2017, at 15:04, David Raymond wrote:


"... Our query has moved from 32 mins to 54 secs.

We're quite happy with that performance increase."

I have to admit that the hearty chuckle which that statement produced 
from me hurt my sore throat. Totally worth it though.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-20 Thread Rob Willett

Just to close this thread down completely.

We've implemented the changes on our production server and the actual 
performance increase is even better than we predicted.


Our production server is a containerised Ubuntu Server. Its hardware 
configuration is significantly different to our Macs on which we 
develop. Our Macs have more RAM and SSD disks. The production server has 
other advantages though :)


The query took 90 mins on our production server; this is mainly due to 
slower disks than we have locally, so this was not unexpected.


After doing the performance changes (removing unneeded fields and adding 
in a covering index), we went down to 38 secs.


This has meant we are no longer looking at a second database just for 
analytics; we just needed to learn to use the database we already had :)


Rob

On 18 Mar 2017, at 5:48, Rob Willett wrote:

We've just implemented a covering index for the last step (is it 
really?) in our quest to get the query execution time down.


To summarise, we have gone from 32 mins to 16 mins by updating an index 
so it doesn't use collate; we took another six minutes off by removing 
extra fields in the select we didn't need.


We have just created a new index which 'covers' all the fields we use 
in the select, this means (and I paraphrase) that we use the index to 
get all the data and there is no need to read from the database.


Well, that was a bit of a surprise. The index creation took 45 mins; we 
ran the program again and thought, rats, we've cocked it up, it only 
took 54 secs, we must have got something wrong. So we checked it and 
checked again and we hadn't got anything wrong. Our query has moved from 32 
mins to 54 secs.


We're quite happy with that performance increase. In fact we're 
delighted, so thanks for all the help in getting us to this stage.


We have kept copies of the query planner bytecode output if anybody is 
interested. Gunter has had copies, but if anybody else would like 
them, please ask.


Many thanks again for all the help,

Rob

On 17 Mar 2017, at 22:12, Rob Willett wrote:


Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated 
index that doesn't use COLLATE. That's 32 mins to 16 mins.


2. We then shaved a further six minutes off the execution time by 
removing extraneous fields in the select statement, so instead of 
"select * ...", we identified which fields we used and directly 
selected those. So we are now down to 10 mins or 1/3 of when we 
started for, to be honest, virtually no extra work, merely being 
smarter, or rather you being smarter.


3. We have looked through all our indexes and can see that every 
index has a COLLATE against it, even if the column is an integer. We 
have raised a support call with Navicat.


4. The next step is to create a "covering index" to try and get the 
whole of the query into the index. However it's 22:11 in London and I 
need to get home.


Thanks very much for the help so far. Tomorrow is more tricky but 
I'll read up on covering indexes to see how to use them.


Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

On 17 Mar 2017, at 6:22pm, Rob Willett 
<rob.sql...@robertwillett.com> wrote:


4. Work through returning just the columns we actually need from 
our queries. We have a recollection that if we can build an index 
with all the information necessary in it, we can do all the work in 
joins rather than paging out to disk. Is this what you are 
referring to?


It works only where all the columns you need to read are in the same 
table.  The ideal form of a covering index is to have the columns 
listed in this order:


1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is 
available from the index, so it doesn’t bother to read the table 
at all.  This can lead to something like a doubling of speed.  Of 
course, you sacrifice filespace, and making changes to the table 
takes a little longer.


5. Sleep (not exactly sure when) and watch three international 
rugby games tomorrow.


Sleep while waiting for indexes to be created and ANALYZE to work.  
May you see skilled players, creative moves and dramatic play.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Gunter, Simon,

Thanks for the replies. You both seem to be drilling into the collation 
sequence as a possible issue. We now have a new index and we have just 
run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR 
status = 6;

selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is 
good.


If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2 
UNION ALL select * from Disruptions where status = 6;

selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and 
see.


Your last suggestion of "select * from Disruptions where status = 2 
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically 
equivalent to "explain query plan select * from Disruptions where status 
= 2 OR status = 6;" now that we have removed the collation from the index.


sqlite> explain query plan select * from Disruptions  where status =2 
COLLATE NOCASE or status = 6 COLLATE NOCASE;

selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields; we require many (which I agree 
is not all) of them. Following this logic through, does this mean 
that it will do more file access bringing the records in from the file 
system?


The collation issue seems to be an artifact of the way Navicat for SQLite 
works. I suspect we need to be more careful about how we use the tool.


We'll now time the results of each query and run them twice to see the 
effect. No idea how long this will take but suspect a few hours :) I 
will post back the results as other people may (or may not) find this 
helpful.


Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

On 17 Mar 2017, at 10:20am, Rob Willett 
<rob.sql...@robertwillett.com> wrote:


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE 
NOCASE ASC);


[…]

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;


The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  
I don’t see that it is obviously wrong, but it does look a little 
weird.


Try creating another index which is just on "status", without the 
COLLATE clause.

Then do another ANALYZE, then try the SELECT again.

Simon.


If the index is deemed unsuitable by SQLite due to its collation 
sequence, then I expect it would also be ignored in "select ... 
status=1" (without the second ORed value)


If not, then (select ... where status =2 UNION ALL select where status 
= 6) should do the trick


Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are 
no capital or lowercase numbers unless you are using roman numerals ;) ); 
for text affinity, it should render the comparison operators 
case-blind, just like "like".



___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use 
of the intended recipient(s) only and may contain information that is 
confidential, privileged or legally protected. Any unauthorized use or 
dissemination of this communication is strictly prohibited. If you 
have received this communication in error, please immediately notify 
the sender by return e-mail message and delete all copies of the 
original communication. Thank you for your cooperation.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to the very first position with the 'bad' collation 
index so we can do proper timings as we change things, so we understand 
the speed up (we hope there is a speed up).


We've written a quick script to check each version. Once we've put the 
original index back in, we've added a step to generate the SQLite 
bytecode for you. It's the least we can do...


We'll post this when it's completed but we suspect it may take most of 
the day now :)


echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions 
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | 
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions where status = 2 OR status = 6;" | 
sqlite3 tfl.sqlite > /dev/null

date

echo "---"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 6;" 
| sqlite3 tfl.sqlite

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date
echo "select * from Disruptions  where status = 2 UNION ALL select * 
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null

date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

Nerds with chronic byte code affinity like myself would like to see 
the output of "explain" (without "query plan"), i.e. the SQLite 
bytecode produced. I guess the query with OR will have a subprogram 
called once for each status value, whereas I expect the query with 
UNION ALL to have 2 copies of the search (which would not affect the 
run time) and maybe even a temporary table of results (which would 
take longer and use more memory).


-----Original Message-----
From: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of 
Rob Willett

Sent: Friday, 17 March 2017 12:19
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index 
question ...


Gunter, Simon,

Thanks for the replies. You both seem to be drilling into the 
collation sequence as a possible issue. We now have a new index and we 
have just run the query again


sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 
OR

status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think 
is good.


If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and 
see.


Your last suggestion of "select * from Disruptions where status = 2 
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically 
equivalent to "explain query plan select * from Disruptions where 
status = 2 OR status = 6;" now that we have removed the collation from the 
index.


sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields; we require many (which I 
agree is not all) of them. Following this logic through, does 
this mean that it will do more file access bringing the records in 
from the file system?


The collation issue seems to be an artifact of the way Navicat for 
SQLite works. I suspect we need to be more careful about how we use 
the tool.



[sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Hi,

We've been struggling with a complex query that we have written. One of 
the elements of this complex query is a select statement that doesn't 
appear to use an index when we *think* it should do. We're not going to 
ask you to debug a large complex SQL query (unless you have nothing else 
to do today) but we're working our way through the query trying to 
understand where we've gone wrong. So we've broken down the query and 
are trying each section to see how it performs (or rather doesn't 
perform).


The sub query is a simple select on a large table, Disruptions. The 
Disruptions table has 180M rows of data.


The schema for it is here. We've pulled it straight from Navicat for 
SQLite.


```
CREATE TABLE "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions", 
'184626834');


-- 
--  Indexes structure for table Disruptions
-- 
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, 
"category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE 
NOCASE ASC);
CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE 
ASC);


PRAGMA foreign_keys = true;
```

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;
```

Status is one of six values, 1 to 6 and is probably not evenly 
distributed across the 180M rows.


If we do

```
sqlite> explain query plan select * from Disruptions where status = 2 OR 
status = 6;

selectid|order|from|detail
0|0|0|SCAN TABLE Disruptions
```

We can see that table scanning 180M records is going to be slow, no 
matter what the rest of the complex query is like.


We have an index Disruptions_idx4 which we *think* should speed it up, 
but the query plan doesn't seem to take this into account.


We think that only having six values of Status means that the speed up 
from the index is not going to be fantastic but every little helps.


We have run analyze on the database and that hasn't helped.

Our gut feeling at this moment is that we have the design structure 
wrong in our DB and we're going to have to take a long hard look at what 
we're doing, do a redesign and a rebuild as we simply got it wrong from 
the beginning. Hindsight is wonderful ;) In the interim (as this is a 
2-3 month job), we need to speed our query up from 90 mins down to 
something in the tens of mins.


Any suggestions very much welcomed,

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

2017-03-17 Thread Rob Willett

Simon,

Thank you for the quick response. We'll do this. We're just waiting for 
another analyze to finish. Our current DB is 45GB; the query works OK on 
smaller databases, but at this size it's very slow. That's why we have a 
bad feeling that we're tinkering at the edges and we need a new DB 
design :(


The reason for the collation is the way Navicat for SQLite works; it's 
difficult to get it without the collate, but we'll do this directly from 
the command line.


We'll post back in an hour or so as it'll probably take that long to run 
:)


Rob

On 17 Mar 2017, at 10:27, Simon Slavin wrote:

On 17 Mar 2017, at 10:20am, Rob Willett <rob.sql...@robertwillett.com> 
wrote:


CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE 
NOCASE ASC);


[…]

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;


The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  
I don’t see that it is obviously wrong, but it does look a little 
weird.


Try creating another index which is just on "status", without the 
COLLATE clause.

Then do another ANALYZE, then try the SELECT again.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] SQLite Port/ODBC Driver Question

2017-07-10 Thread Rob Willett

Vishal,

SQLite isn't a traditional client/server relational database, therefore 
there isn't a port to open up. It runs on a local machine.


Now there are wrappers around SQLite to extend it, I assume this ODBC 
driver is one of them.


I suspect people here *may* know the answer regarding any ports the ODBC 
driver uses, but you may be better off asking the maintainer of the ODBC 
driver.


Rob

On 10 Jul 2017, at 1:31, Shukla, Vishal wrote:


Hi,
Am trying to open a firewall to the machine having sqlite database. 
Does the SQLite database use a specific port number ? If not, then 
does the ODBC connection to SQLite using ODBC driver use a port ?


Any help will be greatly appreciated.

SQLite ODBC Driver:
http://www.ch-werner.de/sqliteodbc/sqliteodbc.exe

Regards,
Vishal Shukla

Confidential communication
Westpac Banking Corporation (ABN 33 007 457 141)
Westpac Institutional Bank is a division of Westpac Banking 
Corporation

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] SQLite Port/ODBC Driver Question

2017-07-10 Thread Rob Willett

It depends on what you mean by remotely.

By itself SQLite doesn't have any networking library built in. It's an 
embedded database.


You can put application wrappers around the database; I believe that 
wrappers exist to make SQLite into a true client/server system, but that's 
additional code. Also there is ODBC, but there's nothing (AFAIK) in the 
actual codebase itself that allows any remote connectivity.


Clearly you can put Apache/Nginx/PHP/SQLite into a software stack and 
make it work; we actually use Nginx/Mojolicious/SQLite as our platform 
stack, but there's nothing in there that allows any remote access to 
SQLite.


If you are talking about hosting the database on a network volume, I 
would recommend that you read this


https://sqlite.org/whentouse.html

The very first paragraph states what Sqlite can do. I would also pay 
close attention to
"If there are many client programs sending SQL to the same database over 
a network, then use a client/server database engine instead of SQLite. 
SQLite will work over a network filesystem, but because of the latency 
associated with most network filesystems, performance will not be great. 
Also, file locking logic is buggy in many network filesystem 
implementations (on both Unix and Windows). If file locking does not 
work correctly, two or more clients might try to modify the same part of 
the same database at the same time, resulting in corruption. Because 
this problem results from bugs in the underlying filesystem 
implementation, there is nothing SQLite can do to prevent it.


A good rule of thumb is to avoid using SQLite in situations where the 
same database will be accessed directly (without an intervening 
application server) and simultaneously from many computers over a 
network."


Just my 2p worth,

Rob

On 10 Jul 2017, at 14:14, Igor Korot wrote:


Rob,

On Mon, Jul 10, 2017 at 7:06 AM, Rob Willett
<rob.sql...@robertwillett.com> wrote:

Vishal,

SQLite isn't a traditional client/server relational database, 
therefore

there isn't a port to open up. It runs on a local machine.


I believe SQLite can successfully be run remotely.

Thank you.



Now there are wrappers around SQLite to extend it, I assume this ODBC 
driver

is one of them.

I suspect people here *may* know the answer regarding any ports the 
ODBC
driver uses, but you may be better off asking the maintainer of the 
ODBC

driver.

Rob


On 10 Jul 2017, at 1:31, Shukla, Vishal wrote:


Hi,
Am trying to open a firewall to the machine having sqlite database. 
Does
the SQLite database use a specific port number ? If not, then does 
the ODBC

connection to SQLite using ODBC driver use a port ?

Any help will be greatly appreciated.

SQLite ODBC Driver:
http://www.ch-werner.de/sqliteodbc/sqliteodbc.exe

Regards,
Vishal Shukla

Confidential communication
Westpac Banking Corporation (ABN 33 007 457 141)
Westpac Institutional Bank is a division of Westpac Banking 
Corporation

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




[sqlite] -shm and -wal files left after trivially simple select statement from command line

2017-05-15 Thread Rob Willett

Hi,

We've encountered what we think is an odd situation and we can't find 
any explanation for why this is. We're also not sure if it's a problem or 
not.


A brief summary is that we are doing a major database upgrade and are 
doing (lots) of testing. The box is a Linux box running Ubuntu 14.04 LTS 
Server. This is a standard box we have used for a few years; we *might* 
move to 16 LTS shortly.


As part of the testing we noticed that -shm and -wal files were being 
left after we used sqlite3 on the command line. This puzzled us as we 
didn't see any errors in our test scripts.


We then narrowed this problem down to doing a simple SELECT statement on 
a table.


e.g. This is our starting state. Note that nothing is touching these 
databases as they are isolated in a directory, so there is no other 
process playing with them.


root@demonotifications:/jambuster/notifications/upgrade_test# ls -lt
total 66892
-rw-r--r-- 1 root root 34217984 May 15 10:38 accountmanagement.sqlite
-rw-r--r-- 1 root root    58567 May 15 10:31 upgradedb.sql
-rw-r--r-- 1 root root 34217984 May 13 20:38 accountmanagement.sqlite.orig


root@demonotifications:/jambuster/notifications/upgrade_test# cp accountmanagement.sqlite.orig accountmanagement.sqlite

root@demonotifications:/jambuster/notifications/upgrade_test# sqlite3 accountmanagement.sqlite

-- Loading resources from /root/.sqliterc

SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from device;
UUID        isValid     Version     WhatAmI     Application
----------  ----------  ----------  ----------  ------------------
<           1           1.1         ios_app     JambusterForLondon

.. Hundreds more lines of device data which looks correct
<           1           1.1         ios_app     JambusterForLondon


sqlite> .exit
root@demonotifications:/jambuster/notifications/upgrade_test# ls -lt
total 66924
-rw-r--r-- 1 root root    32768 May 15 10:48 accountmanagement.sqlite-shm
-rw-r--r-- 1 root root        0 May 15 10:48 accountmanagement.sqlite-wal
-rw-r--r-- 1 root root 34217984 May 15 10:47 accountmanagement.sqlite
-rw-r--r-- 1 root root    58567 May 15 10:31 upgradedb.sql
-rw-r--r-- 1 root root 34217984 May 13 20:38 accountmanagement.sqlite.orig

root@demonotifications:/jambuster/notifications/upgrade_test


As can be seen above, we have accountmanagement.sqlite-shm and 
accountmanagement.sqlite-wal files left, which is odd. All of this from a 
select statement.


If we do a pragma integrity check we get

  integrity_check
  ---
  ok

All the reading we have done seems to indicate that having -shm and -wal 
files left behind indicates an error. If this was our code, we'd be looking 
deep into that, but a simple select statement shouldn't cause this sort of 
error, can it?


The schema behind the select statement has a number of foreign key delete 
cascades, but that's it, and as we are not deleting anything this shouldn't 
be an issue anyway.


Do we even have a problem? Going back into the database and closing it 
again, just with the command line utility sqlite3, doesn't clear the 
files.
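
One quick thing worth checking from the sqlite3 prompt (a generic 
suggestion, not a diagnosis):

```
-- 'wal' here means the extra files are expected while connections exist.
PRAGMA journal_mode;

-- Checkpoint and truncate the -wal file once no reader is using it.
PRAGMA wal_checkpoint(TRUNCATE);
```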


Any help or suggestions welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

