Re: [sqlite] Backup and integrity check questions

2018-07-29 Thread Rowan Worth
On 28 July 2018 at 05:41, Rune Torgersen  wrote:

> > Rowan Worth Thu, 26 Jul 2018 22:02:50 -0700
> >
> > On 26 July 2018 at 05:56, Rune Torgersen  wrote:
> >
> > > The databases have been opened with two connections (one for reads, one
> > > for writes), and use the following options:
> > > sqlite3_busy_timeout(mDbConn, 500);
> > > sqlite3_exec(mDbConn, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
> > >
> >
> > Surely this effectively reduces your number of connections to one?
>
> No, both connections are from within the same application, and have seemed
> to work just fine for about 10 years now...
>

If the write connection is held open I honestly can't see how that's
possible, unless it is never actually used to write.

I tried a quick test with two connections on different threads configured
like this, and as soon as the writer obtains an exclusive lock for the
first time it holds onto it forever (as described in the PRAGMA docs). From
that point on the reader cannot even obtain a SHARED lock to read the
database, and spends the rest of its life in the busy handler.
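
For reference, the test was shaped roughly like this (a condensed sketch rather
than the exact code I ran; the table and statements are placeholders, and I've
collapsed it to one thread for brevity):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void){
      sqlite3 *rd, *wr;
      sqlite3_open("test.db", &rd);       /* "read" connection  */
      sqlite3_open("test.db", &wr);       /* "write" connection */
      sqlite3_busy_timeout(rd, 500);
      sqlite3_busy_timeout(wr, 500);
      sqlite3_exec(rd, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
      sqlite3_exec(wr, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);

      /* The writer's first write takes the exclusive lock and, in
       * locking_mode=EXCLUSIVE, never releases it. */
      int rc = sqlite3_exec(wr,
          "CREATE TABLE IF NOT EXISTS t(x); INSERT INTO t VALUES(1);",
          0, 0, 0);
      printf("writer: %s\n", sqlite3_errstr(rc));

      /* From here on the reader cannot obtain even a SHARED lock; this
       * sits in the busy handler for 500ms and returns SQLITE_BUSY. */
      rc = sqlite3_exec(rd, "SELECT count(*) FROM t;", 0, 0, 0);
      printf("reader: %s\n", sqlite3_errstr(rc));

      sqlite3_close(rd);
      sqlite3_close(wr);
      return 0;
    }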

Note that while the pragma talks about excluding other "processes" from
accessing the DB, "process" and "thread" are interchangeable as far as
sqlite is concerned. That is documented here:
https://www.sqlite.org/lockingv3.html

Last paragraph of section 2.0:

>
> The pager module effectively controls access for separate threads, or
> separate processes, or both. Throughout this document whenever the word
> "process" is written you may substitute the word "thread" without changing
> the truth of the statement.
>


Ah, unless you're in shared cache mode (which PRAGMA read_uncommitted would
suggest), which I don't really know anything about...

-Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficiency of partial indexes

2018-07-29 Thread D Burgess
On the systems I use, for log files, you can't beat a text file for
integrity and speed.

I would suffer the slower queries and use something like the CSV
extension on your text log files.

On Mon, Jul 30, 2018 at 10:44 AM, Simon Slavin  wrote:
> I have a particular logging task which is time-critical, both in reading and 
> writing.  It runs on Linux, but not a fast desktop computer, more like the 
> slow kind built into your WiFi router, with limited solid state storage.  I 
> can use any version of the SQLite C API I want, and currently use a minimal 
> build which omits almost all optional parts.  Remember: on this setup 
> processing is slow, storage space limited.
>
> Log entries are written from six a minute (peak time) to one an hour (night). 
>  In some use-cases queries will be frequent (a few a day).  In others, users 
> are not interested and weeks may pass without a query.  It has the age-old 
> problem when you need random access to the data:
>
> A) Create no indexes.  Fast writing, but slow when looking things up.
> B) Create indexes.  Slower writing, but faster when looking things up.
>
> Naturally, I want it all.  I'd been toying with the idea that initial writing 
> should be to a text file, and data flushed to SQLite just before a query is 
> executed.  But a recent SQLite innovation changes things.  Instead of using a 
> text file I can use partial indexes.
>
> So I add a column to my table called "searchable".  It starts off set to 
> FALSE.  I replace my indexes with partial indexes which count only 
> "searchable" rows.  When a search is done, before actually doing the search I 
> do
>
>UPDATE MyTable SET searchable = TRUE 
>
> This, theoretically, updates the indexes.  Does anyone have experience with 
> this ?  The programming is simpler if I use this trick, since I don't have to 
> handle and flush a text file.  But I haven't used partial indexes before.  
> Any advice or suggestions before I do testing ?  Is there a better way I've 
> missed entirely ?
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Efficiency of partial indexes

2018-07-29 Thread Simon Slavin
I have a particular logging task which is time-critical, both in reading and 
writing.  It runs on Linux, but not a fast desktop computer, more like the slow 
kind built into your WiFi router, with limited solid state storage.  I can use 
any version of the SQLite C API I want, and currently use a minimal build which 
omits almost all optional parts.  Remember: on this setup processing is slow, 
storage space limited.

Log entries are written from six a minute (peak time) to one an hour (night).  
In some use-cases queries will be frequent (a few a day).  In others, users are 
not interested and weeks may pass without a query.  It has the age-old problem 
when you need random access to the data:

A) Create no indexes.  Fast writing, but slow when looking things up.
B) Create indexes.  Slower writing, but faster when looking things up.

Naturally, I want it all.  I'd been toying with the idea that initial writing 
should be to a text file, and data flushed to SQLite just before a query is 
executed.  But a recent SQLite innovation changes things.  Instead of using a 
text file I can use partial indexes.

So I add a column to my table called "searchable".  It starts off set to FALSE. 
 I replace my indexes with partial indexes which count only "searchable" rows.  
When a search is done, before actually doing the search I do

   UPDATE MyTable SET searchable = TRUE 

This, theoretically, updates the indexes.  Does anyone have experience with 
this ?  The programming is simpler if I use this trick, since I don't have to 
handle and flush a text file.  But I haven't used partial indexes before.  Any 
advice or suggestions before I do testing ?  Is there a better way I've missed 
entirely ?
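
For concreteness, the shape of what I have in mind is roughly this (a sketch 
only; the column and index names are invented and I have not tested it):

    #include <sqlite3.h>

    /* db is an already-open connection to the log database. */
    static void setup(sqlite3 *db){
      sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS MyTable("
        "  ts INTEGER NOT NULL,"
        "  message TEXT,"
        "  searchable INTEGER NOT NULL DEFAULT 0);"
        /* Partial index: only rows already marked searchable are indexed,
         * so routine log INSERTs (searchable = 0) do no index maintenance. */
        "CREATE INDEX IF NOT EXISTS MyTable_ts"
        "  ON MyTable(ts) WHERE searchable;",
        0, 0, 0);
    }

    /* Run this just before a query: pull the unindexed rows into the index. */
    static void make_searchable(sqlite3 *db){
      sqlite3_exec(db,
        "UPDATE MyTable SET searchable = 1 WHERE NOT searchable;",
        0, 0, 0);
    }

One thing I am not sure about: I assume the queries themselves will need "AND 
searchable" in their WHERE clauses before the planner will consider the partial 
indexes at all.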

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread Simon Slavin
On 29 Jul 2018, at 11:39pm, Gerlando Falauto  wrote:

> In the current use case there's a single process. The way I see it, in
> the near future it would probably increase to 3-4 processes,
> each doing 10-100 writes per second or so. Each write would be around
> 1KB-20KB (one single text field, I guess).
> I wonder if writing data in batches would be helpful though.

The critical path in SQLite write operations is your storage device.  Your 
entire database is on one drive, your CPU has one path to it, and the 
processing code in SQLite is very efficient.  Consequently most of the waiting 
time during API execution is spent waiting for hardware to say "Okay, I've 
written that."  The number of indexes can make a big difference since it 
increases the number of database pages which must be written.

So you can make a good guess whether sharding, multi-processing, or other 
parallel shenanigans will help.  Just imagine it's 10% processing and 90% 
hardware access.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread R Smith

On 2018/07/30 12:39 AM, Gerlando Falauto wrote:



>> The question that needs to be answered specifically is: How many data
>> input sources are there? as in how many Processes will attempt to write
>> to the database at the same time? Two processes can obviously NOT write
>> at the same time, so if a concurrent write happens, one process has to
>> wait a few milliseconds. This gets compounded as more and more write
>> sources are added or as write-frequency increases.
>
> When you say "a few milliseconds", how exactly do you measure that?
>
> What is the timeframe where the second process has to wait if the
> first one is doing a batch of a thousand writes?


This is very hard to say, hence my other questions in this regard. The 
time needed depends firstly on how much data needs to be inserted; then on 
how many inserts are required (if batched); then on how many Indexes you 
have on the table (each Index needs to be modified for each insert); then on 
whether the inserts contain only data or also functions that need to be 
calculated (especially UDFs); then on whether any triggers need to run upon 
Inserting; and most importantly of all, how quick is the hardware that has 
to drink all this data?  (The hardware IO is the 
typical bottleneck).


A query doing a single insert of a few bytes with no Indexes, no 
triggers, no functions will be stupendously fast, whereas any increase 
in one or more of the above will slow things down. How much exactly is 
something you need to test; any guesswork will not be useful. What I can 
say is that if you don't have any Unique Indexes or triggers, the insert 
speed /should/ not change much with size.





>> To get to the point: With the above in mind, do you still feel like you
>> need to go the sharding route? Could you perhaps quote figures such as
>> how many bytes would a typical data update be? How many updates per
>> second, from how many different Processes?  Do you have a maintenance
>> Window (as in, is there a time of day or on a weekend or such that you
>> can go a few minutes without logging so one can clean up the old logs,
>> perhaps Vacuum and re-Analyze)?
>
> In the current use case there's a single process. The way I see it, in
> the near future it would probably increase to 3-4 processes,


Why? Are you adding other logging sources? Or is this just a 
feeling that more processes would handle things better in some way?



> each doing 10-100 writes per second or so. Each write would be around
> 1KB-20KB (one single text field, I guess).
100 x 20KB inserts = 2MB Inserted x 4 processes = 8MB per second 
inserted issuing around 400 file-locks... Let's assume only 1 Index 
(apart from the rowid) and no triggers, then this should be easily 
handled, even on slower hardware. This is significant enough though that 
it should be tested before-hand with real-world operating parameters in 
place.



> I wonder if writing data in batches would be helpful though.
Most certainly. Fewer file-lock operations are better. Plan anything you 
do to collect logs together and fire them into the DB at say 1 to 5 
second intervals in single transactions.
How to decide the interval: How much data-time can you afford to lose if 
the system crashes? How fast is the data needed downstream by something 
reading the DB?
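
Something along these lines (just a sketch; the "log" table and the helper 
name are invented for the example) - call it from a timer with whatever has 
accumulated since the last flush:

    #include <sqlite3.h>

    /* Flush a batch of collected log lines in ONE transaction:
     * one lock acquisition and one commit for the whole batch,
     * instead of one per row. */
    static void flush_batch(sqlite3 *db, char **lines, int n){
      sqlite3_stmt *ins;
      sqlite3_prepare_v2(db, "INSERT INTO log(msg) VALUES(?1);", -1, &ins, 0);
      sqlite3_exec(db, "BEGIN IMMEDIATE;", 0, 0, 0);   /* take the lock once */
      for(int i = 0; i < n; i++){
        sqlite3_bind_text(ins, 1, lines[i], -1, SQLITE_TRANSIENT);
        sqlite3_step(ins);
        sqlite3_reset(ins);
      }
      sqlite3_exec(db, "COMMIT;", 0, 0, 0);            /* one commit/fsync */
      sqlite3_finalize(ins);
    }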



> But yeah, I guess sharding makes much less sense right now. :-)


Agreed.

Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread Keith Medcalf

>In the current use case there's a single process. The way I see it, in
>the near future it would probably increase to 3-4 processes,
>each doing 10-100 writes per second or so. Each write would be around
>1KB-20KB (one single text field, I guess).
>I wonder if writing data in batches would be helpful though.

Why do you think you want more than one process writing to the database?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread Gerlando Falauto
Hi Ryan,

thank you for your reply.

>I think you are perhaps missing a core idea here - the only use-case
>that requires sharding is where you have very high write-concurrency
>from multiple sources, and even then, the sharding, in order to have any
>helpful effect, needs to distinguish "write sources", not events or
>time-frames or such.

"Write sources" *could* be different in my case (not in the first
version though).

>SQLite will very happily run a 20GB (or much larger) database written to
>from many sources, and very happily delete old data from it and pump new
>data in without much in the way of "help" needed, AND then produce fast
>queries without much fanfare.

I sort of realized that in the meantime (at least for the fast queries part) ;-)

>The question that needs to be answered specifically is: How many data
>input sources are there? as in how many Processes will attempt to write
>to the database at the same time? Two processes can obviously NOT write
>at the same time, so if a concurrent write happens, one process has to
>wait a few milliseconds. This gets compounded as more and more write
>sources are added or as write-frequency increases.

When you say "a few milliseconds", how exactly do you measure that?

What is the timeframe where the second process has to wait if the
first one is doing a batch of a thousand writes?


>If a single process is writing data to a single DB from many different
>sources, there is zero reason for sharding. If many processes are
>running all with their own connection to the DB, AND they have high
>concurrency (i.e. high frequency updates from many DB connections which
>heightens the incidence of simultaneous write attempts to a single DB
>file) then it starts becoming a good idea to allocate two or more DB
>files so that we split the connections between those files, effectively
>lowering the write-collision frequency for a single file.

>Incidentally, all DBs offer some form of concurrency alleviation (load
>balancing, partitioning, etc.) which often also serves other purposes.

>To get to the point: With the above in mind, do you still feel like you
>need to go the sharding route? Could you perhaps quote figures such as
>how many bytes would a typical data update be? How many updates per
>second, from how many different Processes?  Do you have a maintenance
>Window (as in, is there a time of day or on a weekend or such that you
>can go a few minutes without logging so one can clean up the old logs,
>perhaps Vacuum and re-Analyze)?

In the current use case there's a single process. The way I see it, in
the near future it would probably increase to 3-4 processes,
each doing 10-100 writes per second or so. Each write would be around
1KB-20KB (one single text field, I guess).
I wonder if writing data in batches would be helpful though.

But yeah, I guess sharding makes much less sense right now. :-)

>This will allow much better advice, and someone on here is bound to
>already have something just like that running and will be able to
>quickly give some experience hints.

Thank you again!

Gerlando
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Possible memory leak in shell.c

2018-07-29 Thread Richard Hipp
On 7/29/18, Patricia Monteiro  wrote:
> I've been testing the latest version of SQLite with 3 static analysis tools
> (Infer, Clang Static Analyzer and Cppcheck) and they all reported a memory
> leak error in shell.c .

The shell code has now been modified so that it calls exit(1) if
realloc() ever returns NULL.  I think it was probably already doing
this, eventually, but perhaps not quickly enough for your static
analysis tools to figure that out.  So I've adjusted the code so that
it dies quickly and unambiguously now. Hopefully this will be enough
of a hint to your static analysis tools to get them to stop
complaining.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug: Problem with sqlite3_prepare_v2

2018-07-29 Thread Richard Hipp
On 7/29/18, Jürgen Palm  wrote:
> Hi,
>
> in an application I encountered a problem, where the changing of the
> structure of a table by dropping and recreating it via one connection to
> a db was not directly seen by another connection to the same db.

Detecting a schema change requires starting a read transaction.  The
sqlite3_step() interface (usually) starts a read transaction, and so
it will detect the schema change, and will cause the statement to
automatically adjust to the new schema.  But sqlite3_column_count()
does *not* start a new transaction.  The sqlite3_column_count() interface bases
its answer on whatever schema was current during the last transaction on
that connection.
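
In terms of the test program in your report, that is why "case 2" behaves
differently - roughly (sketch):

    sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
    /* sqlite3_column_count(stmt) can still report the old schema here:
     * no read transaction has been started on conn1 yet. */
    sqlite3_step(stmt);      /* starts a read transaction; the schema change
                              * is detected and the statement is re-prepared */
    sqlite3_finalize(stmt);

    /* From here on conn1 has seen the new schema, so the next prepare
     * (and its sqlite3_column_count()) reflects the recreated table. */
    sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);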


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] virtual tables, xBestIndex/xFilter question

2018-07-29 Thread David Fletcher

First, thanks in advance for any help offered.

I'm pretty comfortable using sqlite but just now starting to develop 
with virtual tables.
I'm running into troubles and I'm pretty sure it's because my mental 
model of sqlite is wimpy.


I'm trying to build a tool that interfaces to C++ objects in memory that are 
basically arranged as a tree (or graph or mesh).  For example, let's suppose I 
have a class called A and objects of this class can own any number of B 
objects.  Every A and B object has an ObjectID.  Each object also has an owner.

In C/C++ I would do this sort of thing:

    A* pA = GetObjectGivenID(12345);
    B* pB;
    BIterator iter(pA);

    while ((pB = iter.Next()) != 0) { assert(pB->pOwner == pA); }

That's simple enough.  I'm trying to achieve the same effect using virtual 
tables, each of which follows this basic pattern:

    create table A(OID integer primary key, Owner integer,
                   Attr1 integer, Attr text, ...) without rowid;

That is, the first column of every table is OID, the object ID.  For most of 
the tables, there's also a column called Owner and maybe other columns that 
act as foreign keys, too.  (I haven't actually gone to the trouble to denote 
the columns as foreign keys just yet.)

Everything works when I have just a single table with SQL like this:

    select * from A where A.OID == 12345;

The statement will walk through all of the rows.  Behind the scenes, a C++ 
iterator is doing all of the work.

I'm struggling with joins.  This statement doesn't work in my application:
    select * from A join B on A.OID == B.Owner where A.OID == 12345;

The xBestIndex function is called a couple of times if this is the first time 
the table has been defined.  (xBestIndex is called fewer times if the table has 
been seen before.  I'm guessing sqlite is caching some info.)  When the xFilter 
function is called, it's being handed the virtual table for B, not A.  Because 
there's no object ID to act as the iterator's source, the xFilter function ends 
in failure.  I'm struggling to find aConstraintUsage settings that will cause 
A, the left-hand table in my mental model, to be presented before B.

I'm setting the estimated number of rows correctly, I think, and the argvIndex 
values seem to be set correctly too.  At least sqlite isn't complaining about 
malformed expressions.  But something is missing.
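
For reference, this is the general shape I understand xBestIndex should take 
for the OID-equality case (a generic sketch for discussion, not my actual code):

    #include <sqlite3.h>

    static int xBestIndex(sqlite3_vtab *tab, sqlite3_index_info *p){
      (void)tab;
      for(int i = 0; i < p->nConstraint; i++){
        const struct sqlite3_index_constraint *c = &p->aConstraint[i];
        if( c->usable
         && c->iColumn == 0                        /* column 0 is OID      */
         && c->op == SQLITE_INDEX_CONSTRAINT_EQ ){
          p->aConstraintUsage[i].argvIndex = 1;    /* OID value -> argv[0] */
          p->aConstraintUsage[i].omit = 1;
          p->idxNum = 1;                           /* xFilter: OID lookup  */
          p->estimatedCost = 1.0;                  /* cheap direct lookup  */
          return SQLITE_OK;
        }
      }
      /* No usable OID constraint (e.g. this table is on the outer side of
       * the join): report a full scan with a much higher cost, so the
       * planner prefers an order where the constraint IS usable. */
      p->idxNum = 0;
      p->estimatedCost = 1000000.0;
      return SQLITE_OK;
    }

If that broad shape is already wrong, that in itself would be useful to know.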

I spent some time looking at other examples but most of them seem to use other 
tables within sqlite to hold the data vs. objects in memory.  I did stumble 
across https://osquery.io/ yesterday, which looks interesting and useful.  It's 
also somewhat complex and I haven't delved into its xBestIndex/xFilter 
implementations just yet.

I think I'd benefit from looking at any application that uses C/C++ objects
in memory plus iterators to traverse objects in a tree/graph/mesh/etc.  Can
anyone point out projects like this?

I think I'd also benefit from documentation that shows more of what's happening 
behind the scenes with xBestIndex/xFilter.  Can anyone point me to 
documentation that is more detailed?  I've looked at the sqlite docs, Jay 
Kreibich's book & Mike Owens's book, and the "Query Anything" documentation.

Thanks,

David


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to do autosaving (commit data to disk) within an transaction?

2018-07-29 Thread Robert M. Münch
On 29 Jul 2018, at 14:07, Simon Slavin wrote:

> On 29 Jul 2018, at 12:56pm, Robert M. Münch  
> wrote:
>
>> I want to add an autosave feature that stores some data at critical 
>> points/specific intervals. This data of course should be committed to disk, 
>> to survive any app crashes.
>
> Use a SAVEPOINT instead of COMMIT: 
> 

That won’t help. SAVEPOINTS are for undoing things. I want to write one thing 
definitely to disk to have it available in case of a crash. If the app crashes, 
all SAVEPOINTS are gone.

> Your use of an open transaction here is sort-of abuse of how SQLite works.  
> You have no real control over what is flushed to disk.

I do, that’s why I use an open transaction. In this case, everything is flushed 
into the .journal file, which is automatically rolled back if it exists on the 
next app start.

> Rather than relying on side-effects of transactions it might be better to 
> make your own data structure which reflects which tranche of data a 
> transaction is in.

Besides the brute-force approach of copying the unmodified file before opening 
and changing it (without a global open transaction then), the session extension 
seems to be the way to go: creating changesets which are removed on a user 
save. If the app crashes, the user can choose to start from the last manual 
save point or from the latest point, minimizing data loss.
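
Something along these lines is what I have in mind (an untested sketch; it 
needs a build with SQLITE_ENABLE_SESSION and SQLITE_ENABLE_PREUPDATE_HOOK, and 
the file name and global are only for illustration):

    #include <sqlite3.h>
    #include <stdio.h>

    static sqlite3_session *gSession;   /* created right after opening the file */

    static int start_recording(sqlite3 *db){
      int rc = sqlite3session_create(db, "main", &gSession);
      if( rc==SQLITE_OK )
        rc = sqlite3session_attach(gSession, NULL);   /* record all tables */
      return rc;
    }

    /* Called at the critical points: write everything recorded so far to a
     * side file that survives an app crash. Deleted again on a user save. */
    static int autosave(void){
      void *buf = 0; int n = 0;
      int rc = sqlite3session_changeset(gSession, &n, &buf);
      if( rc==SQLITE_OK ){
        FILE *f = fopen("autosave.changeset", "wb");
        if( f ){ fwrite(buf, 1, (size_t)n, f); fclose(f); }
      }
      sqlite3_free(buf);
      return rc;
    }

On a crash restart the changeset file would then be replayed with 
sqlite3changeset_apply() on top of the last manual save.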

-- 

Robert M. Münch


signature.asc
Description: OpenPGP digital signature
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread Abroży Nieprzełoży
> Ideally, I would like to have a way of "seeing" the whole dataset with a
> single query spanning all  available databases.

I think swarmvtab may be helpful.
https://www.sqlite.org/swarmvtab.html


2018-07-29 10:34 GMT+02:00, Gerlando Falauto :
> Hi,
>
> I'm totally new to sqlite and I'd like to use it for some logging
> application on an embedded
> linux-based device.  Data comes from multiple (~10), similar sources at a
> steady rate.
> The rolling data set would be in the size of 20 GB. Such an amount of
> storage would suffice to retain data from the previous week or so.
>
> Reading the documentation https://www.sqlite.org/whentouse.html somehow
> suggests the usage of sharding:
>
>> Concurrency is also improved by "database sharding": using separate
>> database files for different subdomains. For example, the server might
>> have a separate SQLite database for each user, so that the server can
>> handle hundreds or thousands of simultaneous connections, but each
>> SQLite database is only used by one connection.
>
> In my case I would be doing sharding on the data source and/or the day of
> the timestamp, so to have individual files in the size of a few hundreds MB.
> This way, deleting the oldest data would be as easy as deleting the
> corresponding file.
>
> However, I did not find any reference whatsoever on sharding being
> available _within_ sqlite.
> Ideally, I would like to have a way of "seeing" the whole dataset with a
> single query spanning all  available databases.
>
> Would that be at all feasible? I saw the "attach database" statement which
> seems closely related but whose use-case I honestly don't get.
> If not, is there any state-of-the-art adapter layer that would be
> performing (and hide) the underlying sharding? I don't really care about
> query performance (e.g. if such a global query spanning 20 different
> databases is indeed performed serially, thereby take 20 times longer), I
> just need a way of hiding this detail.
>
> I saw some reference to SPHiveDB
> https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg43575.html
> but the project looks stale (9 years since the last commit).
>
> I also looked into AtomDB but it looks overly complicated for my use-case
> (single, embedded server), plus it somehow requires the underlying sharding
> to be totally exposed.
>
> Any ideas?
> Gerlando
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to do autosaving (commit data to disk) within an transaction?

2018-07-29 Thread Simon Slavin
On 29 Jul 2018, at 12:56pm, Robert M. Münch  wrote:

> I want to add an autosave feature that stores some data at critical 
> points/specific intervals. This data of course should be committed to disk, to 
> survive any app crashes.

Use a SAVEPOINT instead of COMMIT:



Your use of an open transaction here is sort-of abuse of how SQLite works.  You 
have no real control over what is flushed to disk.  Rather than relying on 
side-effects of transactions it might be better to make your own data structure 
which reflects which tranche of data a transaction is in.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Simon Slavin
Please try moving your COLLATE clauses into the table definition.  e.g. instead 
of

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE NOCASE ASC);

Your table definition should have

 "version" integer NOT NULL COLLATE NOCASE,
 "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
 "location" integer NOT NULL COLLATE NOCASE,

and the index should be

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long time.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How to do autosaving (commit data to disk) within an transaction?

2018-07-29 Thread Robert M. Münch
When the user opens a file I start an outer-most transaction to be able to 
revert the file. So far so good.

I want to add an autosave feature that stores some data at critical 
points/specific intervals. This data of course should be committed to disk, to 
survive any app crashes.

But how can I commit stuff to disk, while an outer-most transaction is open? Is 
there something like an „UPDATE IMMEDIATELY…“ or so to bypass an open 
transaction? How would one do this?

I would like to avoid using a 2nd database-file if possible.

-- 

Robert M. Münch


signature.asc
Description: OpenPGP digital signature
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread R Smith


On 2018/07/29 10:34 AM, Gerlando Falauto wrote:

> Hi,
>
> I'm totally new to sqlite and I'd like to use it for some logging


Welcome Gerlando. :)


> application on an embedded
> linux-based device.  Data comes from multiple (~10), similar sources at a
> steady rate.
> The rolling data set would be in the size of 20 GB. Such an amount of
> storage would suffice to retain data from the previous week or so.
>
> Reading the documentation https://www.sqlite.org/whentouse.html somehow
> suggests the usage of sharding:
>
>> Concurrency is also improved by "database sharding": using separate
>> database files for different subdomains. For example, the server might
>> have a separate SQLite database for each user, so that the server can
>> handle hundreds or thousands of simultaneous connections, but each
>> SQLite database is only used by one connection.
>
> In my case I would be doing sharding on the data source and/or the day of
> the timestamp, so to have individual files in the size of a few hundreds MB.
> This way, deleting the oldest data would be as easy as deleting the
> corresponding file.


I think you are perhaps missing a core idea here - the only use-case 
that requires sharding is where you have very high write-concurrency 
from multiple sources, and even then, the sharding, in order to have any 
helpful effect, needs to distinguish "write sources", not events or 
time-frames or such.


SQLite will very happily run a 20GB (or much larger) database written to 
from many sources, and very happily delete old data from it and pump new 
data in without much in the way of "help" needed, AND then produce fast 
queries without much fanfare.


The question that needs to be answered specifically is: How many data 
input sources are there? as in how many Processes will attempt to write 
to the database at the same time? Two processes can obviously NOT write 
at the same time, so if a concurrent write happens, one process has to 
wait a few milliseconds. This gets compounded as more and more write 
sources are added or as write-frequency increases.


If a single process is writing data to a single DB from many different 
sources, there is zero reason for sharding. If many processes are 
running all with their own connection to the DB, AND they have high 
concurrency (i.e. high frequency updates from many DB connections which 
heightens the incidence of simultaneous write attempts to a single DB 
file) then it starts becoming a good idea to allocate two or more DB 
files so that we split the connections between those files, effectively 
lowering the write-collision frequency for a single file.


Incidentally, all DBs offer some form of concurrency alleviation (load 
balancing, partitioning, etc.) which often also serves other purposes.


To get to the point: With the above in mind, do you still feel like you 
need to go the sharding route? Could you perhaps quote figures such as 
how many bytes would a typical data update be? How many updates per 
second, from how many different Processes?  Do you have a maintenance 
Window (as in, is there a time of day or on a weekend or such that you 
can go a few minutes without logging so one can clean up the old logs, 
perhaps Vacuum and re-Analyze)?


This will allow much better advice, and someone on here is bound to 
already have something just like that running and will be able to 
quickly give some experience hints.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Thanks for the mail.

We ran analyze with no indexes, made no difference.

We recreated the indexes and ran analyze again. The very long commit is 
back, this time it took 14 secs :)


It does appear that the indexes have something to do with this, although 
they have not been an issue until now.


Rob

On 29 Jul 2018, at 11:45, J. King wrote:

On July 29, 2018 5:47:29 AM EDT, Rob Willett 
 wrote:

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx 
two

seconds

However the overall performance of the run is much the same as other
areas of the code are now significantly slower, whereas before they
were
quick.

Where were you going with that question?


Might ANALYZE help?
--
J. King
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Anton,

Dropped the indexes and created them without order.

We'll need to look at what your second para means. It could be a major 
and massive change.


Rob



On 29 Jul 2018, at 11:52, Djelf wrote:


Rob,

Try creating indexes without order.

Or, try to make a column with a hash of the values entering the index and
search for the value by that index. This will complicate the logic of your
program, but it will decrease the volume of the database, and possibly
significantly speed up both reading and writing.

---
Anton Azanov



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Djelf
Rob,

Try creating indexes without order.

Or, try to make a column with a hash of the values entering the index and
search for the value by that index. This will complicate the logic of your
program, but it will decrease the volume of the database, and possibly
significantly speed up both reading and writing.

---
Anton Azanov



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Possible memory leak in shell.c

2018-07-29 Thread Patricia Monteiro
I've been testing the latest version of SQLite with 3 static analysis tools
(Infer, Clang Static Analyzer and Cppcheck) and they all reported a memory
leak error in shell.c .

The Infer error report is as follows:

shell.c:585: error: MEMORY_LEAK

memory dynamically allocated by call to `realloc()` at line 585, column 15
is not reachable after line 585, column 7.

  583.   if( n+100>nLine ){

  584. nLine = nLine*2 + 100;

  585. >   zLine = realloc(zLine, nLine);

  586. if( zLine==0 ) return 0;

  587.   }

This error occurs because, if the realloc function fails, the memory
previously pointed to by zLine becomes unreachable. You need to check that
realloc() did not return NULL before assigning its result to zLine.
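
The usual pattern to avoid the leak is to keep the old pointer until realloc()
is known to have succeeded, e.g. (a sketch of the idea, not a patch against the
actual shell.c):

    if( n+100>nLine ){
      char *zNew;
      nLine = nLine*2 + 100;
      zNew = realloc(zLine, nLine);
      if( zNew==0 ){
        free(zLine);      /* the old buffer is still valid here */
        return 0;
      }
      zLine = zNew;
    }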
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Sqlite Sharding HOWTO

2018-07-29 Thread Gerlando Falauto
Hi,

I'm totally new to sqlite and I'd like to use it for some logging
application on an embedded
linux-based device.  Data comes from multiple (~10), similar sources at a
steady rate.
The rolling data set would be in the size of 20 GB. Such an amount of
storage would suffice to retain data from the previous week or so.

Reading the documentation https://www.sqlite.org/whentouse.html somehow
suggests the usage of sharding:

> Concurrency is also improved by "database sharding": using separate
> database files for different subdomains. For example, the server might
> have a separate SQLite database for each user, so that the server can
> handle hundreds or thousands of simultaneous connections, but each
> SQLite database is only used by one connection.

In my case I would be doing sharding on the data source and/or the day of
the timestamp, so to have individual files in the size of a few hundreds MB.
This way, deleting the oldest data would be as easy as deleting the
corresponding file.

However, I did not find any reference whatsoever on sharding being
available _within_ sqlite.
Ideally, I would like to have a way of "seeing" the whole dataset with a
single query spanning all  available databases.

Would that be at all feasible? I saw the "attach database" statement which
seems closely related but whose use-case I honestly don't get.
If not, is there any state-of-the-art adapter layer that would be
performing (and hide) the underlying sharding? I don't really care about
query performance (e.g. if such a global query spanning 20 different
databases is indeed performed serially, thereby take 20 times longer), I
just need a way of hiding this detail.
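
(For instance, is the intended pattern something like the following? This is
only my guess at how ATTACH would be used - db being an open handle, file and
table names invented, nothing I have actually tried:)

    /* One shard per day, all with the same "logs" table: */
    sqlite3_exec(db,
      "ATTACH DATABASE 'log_2018_07_28.db' AS d1;"
      "ATTACH DATABASE 'log_2018_07_29.db' AS d2;"
      /* a single view that spans the main db plus all attached shards */
      "CREATE TEMP VIEW all_logs AS"
      "  SELECT * FROM main.logs"
      "  UNION ALL SELECT * FROM d1.logs"
      "  UNION ALL SELECT * FROM d2.logs;",
      0, 0, 0);
    /* Queries would go against all_logs; expiring a day would be DETACH plus
     * deleting the file. I understand the default limit is 10 attached
     * databases, though apparently it can be raised at compile time. */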

I saw some reference to SPHiveDB
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg43575.html
but the project looks stale (9 years since the last commit).

I also looked into AtomDB but it looks overly complicated for my use-case
(single, embedded server), plus it somehow requires the underlying sharding
to be totally exposed.

Any ideas?
Gerlando
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread J. King
On July 29, 2018 5:47:29 AM EDT, Rob Willett  
wrote:
>John,
>
>Thanks for the prompt reply and a very good question..
>
>We've dropped the indexes and the commit is now very quick, approx two 
>seconds
>
>However the overall performance of the run is much the same as other 
>areas of the code are now significantly slower, whereas before they
>were 
>quick.
>
>Where were you going with that question?
>
Might ANALYZE help? 
-- 
J. King
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread R Smith
Also, kindly clarify one bit - I'm not sure whether you use the word 
"commit" as an easy substitute for the entire process of updating the DB 
(i.e. all SQL INSERT/UPDATE code that runs up to and including the COMMIT 
statement), or if you actually mean the "COMMIT" operation itself. The things 
that take time are the updating processes; the "COMMIT" step is typically a 
single quick file operation (depending on Journal mode of course), and if it 
is not quick, that might give a clue towards the problem.




On 2018/07/29 11:47 AM, Rob Willett wrote:



What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:


Update 1

We've copied the disruptions table to a new table, dropped the old
table, copied the new table back in and recreated all the indexes.

Exactly the same commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in
size.

Mmmm tricky (as Deepthought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. Through
some simple logic we've managed to pull out 99% of the data to reduce
the size from 51GB down to approx 600MB. This logic has been to remove
rows that are almost the same but not quite identical. As with all
things, the thinking was the difficult bit, the execution somewhat
easier.

As part of the testing we've been doing, we've now hit on an odd and
weird problem to do with the COMMIT statement. A commit of a few
hundred (circa 600-800) rows takes approx 7 seconds whereas before we
never even noticed it, though we now know it was two seconds before.
Each row is probably 1-2K of data, so its not very much at all.

Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and thats not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent
69MB/sec. This is not as fast we would like, but it's the same across
all our virtual servers.

4. We've tested the commit on our existing 60GB database and it takes
2 seconds, which is longer than we thought it would be. The server for
the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu
14.04. The server we are testing on is a 2GB/2 core test server
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't
expect it to take 3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and
the DBI module. The code for doing a commit in Perl::DBI is
 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and
in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 lines to a table that
used to be 200,000,000 rows in size. It's now 400,000 lines in size.
We are wondering if the deletion of the lines has had an impact we
didn't expect. We have vacuumed and analysed the database.

The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
"Disruption_id", "Severity", "levelOfInterest", "category",
"subCategory", "version");

We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Very, very slow commits

2018-07-29 Thread R Smith

On 2018/07/29 11:47 AM, Rob Willett wrote:

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx two 
seconds


However the overall performance of the run is much the same as other 
areas of the code are now significantly slower, whereas before they 
were quick.


Where were you going with that question?


Possibly you have too many indexes.

An Index is an expensive thing to maintain for a DB; it's only ever 
useful when the improvement gained for look-ups significantly outweighs 
the cost of updating the Indexes.


The typical process here is to remove all Indexes, then add them back 
one by one and run all queries, noting which adds benefit and which not, 
then when all are installed, remove them in the same order (i.e. if you 
added A then B then C... start removing A then B then C also) and 
measure again, you will quickly find the useless Indexes.


This is the very last step in design though, it's the kind of 
optimization everyone talks about when they warn against "premature 
optimization". A prior step would be to study the queries and see if you 
can find better Indexes, or ones that covers (i.e. is helpful with) a 
wider range of queries, etc.


Apart from all that... did you add any triggers since the big DB? Which 
thing is re-forming the previously "big" sets of data records into the 
new streamlined set? If this is a Trigger or UDF, does that not eat any 
time?



Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Bug: Problem with sqlite3_prepare_v2

2018-07-29 Thread Jürgen Palm

Hi,

in an application I encountered a problem, where the changing of the 
structure of a table by dropping and recreating it via one connection to 
a db was not directly seen by another connection to the same db. I could 
reproduce the problem with the test program below. Inserting a 
sqlite3_step (case 2) changes the output of the program, which is 
actually the problem I have.


Could you please have a look at it?

SQLite version: 3.24 amalgination
OS: Windows 10
Compiler: Visual Studio Community 2017

Output case 1:

Column count: 1 -> OK
Column count: 2 -> OK
Column count: 1 -> should be 2
Column count: 1 -> should be 2

Output case 2:

Column count: 1 -> OK
Column count: 2 -> OK
Column count: 1 -> should be 2
Column count: 2 -> OK

Test program:

int main()
{
  sqlite3* conn1;
  sqlite3* conn2;

  sqlite3_open("test.db", );

  // the next two lines are only for preparing the db. They are not 
relevant for the problem.

  sqlite3_exec(conn1, "drop table if exists test", NULL, NULL, NULL);
  sqlite3_exec(conn1, "create table test(a text)",NULL,NULL,NULL);

  const char* select = "select * from test";
  sqlite3_stmt* stmt;

  sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
  printf("Column count: %d\n", sqlite3_column_count(stmt));
  sqlite3_finalize(stmt);

  sqlite3_open("test.db", );

  sqlite3_exec(conn2, "drop table test;", NULL, NULL, NULL);
  sqlite3_exec(conn2, "create table test(a text,b text);", NULL, NULL, 
NULL);


  sqlite3_prepare_v2(conn2, select, -1, &stmt, NULL);
  printf("Column count: %d\n", sqlite3_column_count(stmt));
  sqlite3_finalize(stmt);

  // closing the second connection can also be done at the end of the 
program. The result doesn't change.

  sqlite3_close(conn2);

  sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
  printf("Column count: %d\n", sqlite3_column_count(stmt));

  /** only case 2 ***/
  sqlite3_step(stmt);
  /**/

  sqlite3_finalize(stmt);

  sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
  printf("Column count: %d\n", sqlite3_column_count(stmt));
  sqlite3_finalize(stmt);

  sqlite3_close(conn1);

  return 0;
}

Regards,
Jürgen
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx two 
seconds


However the overall performance of the run is much the same as other 
areas of the code are now significantly slower, whereas before they were 
quick.


Where were you going with that question?

Thanks

Rob

On 29 Jul 2018, at 10:33, John Found wrote:


What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:


Update 1

We've copied the disruptions table to a new table, dropped the old
table, copied the new table back in and recreated all the indexes.

Exactly the same commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in
size.

Mmmm tricky (as Deepthought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. 
Through
some simple logic we've managed to pull out 99% of the data to 
reduce
the size from 51GB down to approx 600MB. This logic has been to 
remove

rows that are almost the same but not quite identical. As with all
things, the thinking was the difficult bit, the execution somewhat
easier.

As part of the testing we've been doing, we've now hit on an odd and
weird problem to do with the COMMIT statement. A commit of a few
hundred (circa 600-800) rows takes approx 7 seconds whereas before 
we

never even noticed it, though we now know it was two seconds before.
Each row is probably 1-2K of data, so its not very much at all.

Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and thats not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent
69MB/sec. This is not as fast we would like, but it's the same 
across

all our virtual servers.

4. We've tested the commit on our existing 60GB database and it 
takes
2 seconds, which is longer than we thought it would be. The server 
for

the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu
14.04. The server we are testing on is a 2GB/2 core test server
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't
expect it to take 3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and
the DBI module. The code for doing a commit in Perl::DBI is
 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system 
and

in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 lines to a table 
that

used to be 200,000,000 rows in size. It's now 400,000 lines in size.
We are wondering if the deletion of the lines has had an impact we
didn't expect. We have vacuumed and analysed the database.

The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions 
("Disruption_id"

COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
"Disruption_id", "Severity", "levelOfInterest", "category",
"subCategory", "version");

We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing 

Re: [sqlite] Very, very slow commits

2018-07-29 Thread John Found
What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:

> Update 1
> 
> We've copied the disruptions table to a new table, dropped the old 
> table, copied the new table back in and recreated all the indexes.
> 
> Exactly the same commit performance.
> 
> We've also tracked the -shm and -wal files and they are around 5MB in 
> size.
> 
> Mmmm tricky (as Deepthought said).
> 
> Rob
> 
> On 29 Jul 2018, at 9:14, Rob Willett wrote:
> 
> > Hi,
> >
> > Background
> >
> > We've been refactoring our database to reduce the size of it. Through 
> > some simple logic we've managed to pull out 99% of the data to reduce 
> > the size from 51GB down to approx 600MB. This logic has been to remove 
> > rows that are almost the same but not quite identical. As with all 
> > things, the thinking was the difficult bit, the execution somewhat 
> > easier.
> >
> > As part of the testing we've been doing, we've now hit on an odd and 
> > weird problem to do with the COMMIT statement. A commit of a few 
> > hundred (circa 600-800) rows takes approx 7 seconds whereas before we 
> > never even noticed it, though we now know it was two seconds before. 
> > Each row is probably 1-2K of data, so its not very much at all.
> >
> > Details of what we have tried:
> >
> > 1. We've turned synchronous  on and off
> >
> > PRAGMA synchronous=ON
> >
> > and thats not made any difference.
> >
> > 2. We are using and have been using WAL mode for years.
> >
> > PRAGMA journal_mode;
> > journal_mode
> > wal
> >
> > 3. We've tested that the server copies OK, we get a consistent 
> > 69MB/sec. This is not as fast we would like, but it's the same across 
> > all our virtual servers.
> >
> > 4. We've tested the commit on our existing 60GB database and it takes 
> > 2 seconds, which is longer than we thought it would be. The server for 
> > the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 
> > 14.04. The server we are testing on is a 2GB/2 core test server 
> > running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't 
> > expect it to take 3 times longer to do a commit.
> >
> > 5. The code is identical across the servers. We are running Perl and 
> > the DBI module. The code for doing a commit in Perl::DBI is
> >  $dbh->do("COMMIT");
> >
> >   We are getting the expected performance elsewhere on the system and 
> > in the code. It's just the commit that is taking a long time.
> >
> > 6. The code we are committing is adding 600-800 lines to a table that 
> > used to be 200,000,000 rows in size. It's now 400,000 lines in size. 
> > We are wondering if the deletion of the lines has had an impact we 
> > didn't expect. We have vacuumed and analysed the database.
> >
> > The schema for the table we insert into is
> >
> > CREATE TABLE IF NOT EXISTS "Disruptions" (
> >  "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> >  "version" integer NOT NULL,
> >  "Disruption_id" INTEGER NOT NULL,
> >  "status" integer NOT NULL,
> >  "severity" integer NOT NULL,
> >  "levelOfInterest" integer NOT NULL,
> >  "category" integer NOT NULL,
> >  "subCategory" integer NOT NULL,
> >  "startTime" TEXT NOT NULL,
> >  "endTime" text NOT NULL,
> >  "location" integer NOT NULL,
> >  "corridor" integer NOT NULL,
> >  "comments" integer NOT NULL,
> >  "currentUpdate" integer NOT NULL,
> >  "remarkTime" TEXT NOT NULL,
> >  "lastModTime" TEXT NOT NULL,
> >  "CauseAreaPointX" real NOT NULL,
> >  "CauseAreaPointY" real NOT NULL,
> >  "Direction" TEXT
> > );
> > CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
> > NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE 
> > ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
> > "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
> > "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> > CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
> > COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
> > NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> > CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
> > COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
> > COLLATE NOCASE ASC);
> > CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", 
> > "Disruption_id", "Severity", "levelOfInterest", "category", 
> > "subCategory", "version");
> >
> > We have checked that this schema is consistent across the databases.
> >
> > We're about to recreate the table to see if that makes a difference.
> >
> > Any help or advice welcomed.
> >
> > Thanks
> >
> > Rob
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> 

Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Update 1

We've copied the disruptions table to a new table, dropped the old 
table, copied the new table back in and recreated all the indexes.


Exactly the same commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in 
size.


Mmmm tricky (as Deepthought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. Through 
some simple logic we've managed to pull out 99% of the data to reduce 
the size from 51GB down to approx 600MB. This logic has been to remove 
rows that are almost the same but not quite identical. As with all 
things, the thinking was the difficult bit, the execution somewhat 
easier.


As part of the testing we've been doing, we've now hit an odd problem
with the COMMIT statement. A commit of a few hundred (circa 600-800)
rows takes approx 7 seconds, whereas before we never even noticed it
(we now know it was about two seconds). Each row is probably 1-2K of
data, so it's not very much at all.


Details of what we have tried:

1. We've turned synchronous on and off

PRAGMA synchronous=ON

and that's not made any difference.

2. We are using WAL mode, and have been for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK; we get a consistent 
69MB/sec. This is not as fast as we would like, but it's the same across 
all our virtual servers.


4. We've tested the commit on our existing 60GB database and it takes 
2 seconds, which is longer than we thought it would be. The server for 
the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 
14.04. The server we are testing on is a 2GB/2 core test server 
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't 
expect it to take 3 times longer to do a commit.


5. The code is identical across the servers. We are running Perl and 
the DBI module. The code for doing a commit in Perl DBI is

 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and 
in the code. It's just the commit that is taking a long time.


6. The code we are committing adds 600-800 rows to a table that 
used to be 200,000,000 rows in size. It's now 400,000 rows in size. 
We are wondering if the deletion of those rows has had an impact we 
didn't expect. We have vacuumed and analysed the database.


The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE 
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", 
"Disruption_id", "Severity", "levelOfInterest", "category", 
"subCategory", "version");


We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Hi,

Background

We've been refactoring our database to reduce its size. Through 
some simple logic we've managed to pull out 99% of the data to reduce 
the size from 51GB down to approx 600MB. This logic has been to remove 
rows that are almost the same but not quite identical. As with all 
things, the thinking was the difficult bit, the execution somewhat 
easier.


As part of the testing we've been doing, we've now hit an odd problem with
the COMMIT statement. A commit of a few hundred (circa 600-800) rows takes
approx 7 seconds, whereas before we never even noticed it (we now know it
was about two seconds). Each row is probably 1-2K of data, so it's not
very much at all.


Details of what we have tried:

1. We've turned synchronous on and off

PRAGMA synchronous=ON

and that's not made any difference.
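
One thing worth confirming is what synchronous actually is on the writing
connection, since it falls back to the build default (usually FULL) on
every new connection. A minimal sketch, assuming a connected DBD::SQLite
handle $dbh as in point 5:

  my ($sync) = $dbh->selectrow_array("PRAGMA synchronous");
  print "synchronous = $sync\n";           # 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA
  # In WAL mode, NORMAL only fsyncs at checkpoints rather than on every
  # commit, which is usually the cheap-but-safe setting:
  $dbh->do("PRAGMA synchronous = NORMAL");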

2. We are using WAL mode, and have been for years.

PRAGMA journal_mode;
journal_mode
wal
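
In WAL mode the commit itself is normally just an append to the -wal file;
the expensive part is usually the checkpoint, which by default is run by
whichever connection's commit pushes the WAL past the auto-checkpoint
threshold. A sketch (same $dbh handle assumed) to inspect, and optionally
move, that work:

  my ($pages) = $dbh->selectrow_array("PRAGMA wal_autocheckpoint");
  print "auto-checkpoint after $pages WAL pages\n";   # default is 1000
  # Optionally take checkpointing off the commit path and run it from a
  # separate housekeeping job instead:
  # $dbh->do("PRAGMA wal_autocheckpoint = 0");
  # $dbh->do("PRAGMA wal_checkpoint(PASSIVE)");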

3. We've tested that the server copies OK; we get a consistent 69MB/sec. 
This is not as fast as we would like, but it's the same across all our 
virtual servers.


4. We've tested the commit on our existing 60GB database and it takes 2 
seconds, which is longer than we thought it would be. The server for the 
60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 14.04. The 
server we are testing on is a 2GB/2 core test server running Ubuntu 
16.04. Whilst the test server is smaller, we wouldn't expect it to take 
3 times longer to do a commit.


5. The code is identical across the servers. We are running Perl and the 
DBI module. The code for doing a commit in Perl DBI is

 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and 
in the code. It's just the commit that is taking a long time.
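
To confirm it really is the COMMIT and not the preceding INSERTs, a
minimal timing sketch around that same DBI call (Time::HiRes is a core
Perl module; the same $dbh handle is assumed):

  use Time::HiRes qw(gettimeofday tv_interval);

  my $t0 = [gettimeofday];
  $dbh->do("COMMIT");
  printf "COMMIT alone took %.3f s\n", tv_interval($t0);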


6. The code we are committing adds 600-800 rows to a table that used to 
be 200,000,000 rows in size. It's now 400,000 rows in size. We are 
wondering if the deletion of those rows has had an impact we didn't 
expect. We have vacuumed and analysed the database.
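
After deleting the best part of 200 million rows, VACUUM should have
returned the space, but it is cheap to double-check that the free list is
empty and how big the file really is now (a sketch, same $dbh assumed):

  my ($page_count) = $dbh->selectrow_array("PRAGMA page_count");
  my ($freelist)   = $dbh->selectrow_array("PRAGMA freelist_count");
  my ($page_size)  = $dbh->selectrow_array("PRAGMA page_size");
  printf "database = %.1f MB, of which %.1f MB is on the free list\n",
         $page_count * $page_size / 1_048_576,
         $freelist   * $page_size / 1_048_576;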


The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, 
"category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE 
NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id", 
"Severity", "levelOfInterest", "category", "subCategory", "version");


We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users