Re: [sqlite] Corrupted FTS5 index? disk image is malformed - Part II

2018-07-17 Thread Nick

On 2018-07-10 21:17, Dan Kennedy wrote:

On 07/11/2018 02:56 AM, Nick wrote:
Using the sqlite cli version 3.13 I have a simple schema with a virtual
FTS5 table providing full-text index searching. It is accessed by a
Python application using apsw==3.13.0.post1.


I could successfully use the full-text index functionality during manual
testing of the db at creation time (probably a year ago now); however,
recently I've been getting "Error: database disk image is malformed"
messages when running queries on the FTS5 virtual table.


In an attempt to explore further I downloaded the latest 3.24 version.
With this latest version I used the ".backup" command to create a copy
of the file in the hope of eliminating HDD errors as a culprit.


Running pragma quick_check and integrity_check on the copied db both 
return ok.
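
For reference, this was roughly the sequence used (file names here are
just placeholders):

sqlite> .open original.db
sqlite> .backup copy.db
sqlite> .open copy.db
sqlite> PRAGMA quick_check;
ok
sqlite> PRAGMA integrity_check;
ok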


The schema of the FTS5 table is:

CREATE VIRTUAL TABLE IF NOT EXISTS [i_epg] USING fts5 (
[mangled_title],
[mangled_subtitle],
[mangled_summary],
content=[t_epg],
content_rowid=[tid]
);

The table is kept up to date exclusively using triggers:

-- Triggers to keep the FTS index up to date.

CREATE TRIGGER IF NOT EXISTS i_epg_ai AFTER INSERT ON [t_epg] BEGIN
   INSERT INTO [i_epg](rowid, [mangled_title], [mangled_subtitle], 
[mangled_summary]) VALUES (new.[tid], new.[mangled_title], 
new.[mangled_subtitle], new.[mangled_summary]);

END;
CREATE TRIGGER IF NOT EXISTS i_epg_ad AFTER DELETE ON [t_epg] BEGIN
   INSERT INTO [i_epg]([i_epg], rowid, [mangled_title], 
[mangled_subtitle], [mangled_summary]) VALUES('delete', old.[tid], 
old.[mangled_title], old.[mangled_subtitle], old.[mangled_summary]);

END;
CREATE TRIGGER IF NOT EXISTS i_epg_au AFTER UPDATE ON [t_epg] BEGIN
   INSERT INTO [i_epg]([i_epg], rowid, [mangled_title], 
[mangled_subtitle], [mangled_summary]) VALUES('delete', old.[tid], 
old.[mangled_title], old.[mangled_subtitle], old.[mangled_summary]);
   INSERT INTO [i_epg](rowid, [mangled_title], [mangled_subtitle], 
[mangled_summary]) VALUES (new.[tid], new.[mangled_title], 
new.[mangled_subtitle], new.[mangled_summary]);

END;

Running SQL queries on the normal tables all works as expected. Digging
further into the FTS5 queries I noticed the following behaviour:


SELECT * FROM [i_epg] WHERE [i_epg] MATCH '{ mangled_title } : big + 
ban*';
- expect results - actually returns "Error: database disk image is 
malformed" immediately


SELECT * FROM [i_epg] WHERE [i_epg] MATCH '{ mangled_title } : big + 
ban*';

- expect no results - returns no results

SELECT [mangled_title] FROM [i_epg] WHERE [i_epg] MATCH '{ 
mangled_title } : black + adder';

- expect results - returns results not matching request
The Black Adder
The Black Adder
The Black Adder
The Black Adder
The Black Adder
Morning Show Exclusives
Deal of the Day
Four in a Bed
The Black Adder
The Black Adder
The Black Adder
The Black Adder
Denim & Co
The Shoe Stylist
Our World: Crisis in Catalonia
The Black Adder
The Black Adder
The Black Adder

I've never come across a disk image malformed error in my years of
using sqlite3, so I'm not sure where to turn next. My questions are:


1. Is this a known issue with FTS5 tables and if so is there a 
workaround?


2. It appears the FTS5 virtual table is corrupt. Is there a way to
rebuild the FTS5 index (drop the table and recreate it?) from just the
sqlite cli tool?


Try running the FTS5 integrity-check command with the 3.24.0 command
line to ensure it really is corrupt:

  https://www.sqlite.org/fts5.html#the_integrity_check_command

The index can be rebuilt using the rebuild command:

  https://www.sqlite.org/fts5.html#the_rebuild_command
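
(Both are issued as ordinary INSERTs against the FTS5 table itself -
with the schema above, something like:)

INSERT INTO [i_epg]([i_epg]) VALUES('integrity-check');
-- for an external-content table, 'rebuild' repopulates the index from t_epg
INSERT INTO [i_epg]([i_epg]) VALUES('rebuild');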

3.13.0 was about 2 years ago. There have been a couple of fixes for
fts5 corruption bugs since then. This one, for example:

  https://www.sqlite.org/src/info/9a2de4f05fabf7e7

So you may have hit a known issue. Hard to say.

Dan.



Part II

With help from Dan the FTS5 table was fixed and subsequently worked as
expected. For belt and braces, using the 3.24 sqlite cli client, I
created a new db with the PRAGMA statements below and then ran ".dump"
to copy over the records from the previous db.


PRAGMA legacy_file_format = off;
PRAGMA page_size = 4096;
PRAGMA auto_vacuum = 2;
PRAGMA foreign_keys = on;
PRAGMA journal_mode = wal;
PRAGMA application_id = 19;
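
(In outline the process looked something like this; the file names are
just placeholders:)

$ sqlite3 old.db ".dump" > dump.sql
$ sqlite3 new.db
sqlite> PRAGMA legacy_file_format = off;
sqlite> PRAGMA page_size = 4096;   -- must be set before any content is written
sqlite> PRAGMA auto_vacuum = 2;    -- likewise only effective before tables exist
sqlite> PRAGMA foreign_keys = on;  -- per-connection setting
sqlite> PRAGMA journal_mode = wal;
sqlite> PRAGMA application_id = 19;
sqlite> .read dump.sql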

Both the PRAGMA and FTS integrity checks returned ok, and manual testing
showed the new db worked as expected. At the same time I upgraded apsw
to the latest version (I saw it download the 3.24 file during compilation).


A number of days later I went back and ran the INSERT INTO
[i_epg]([i_epg]) VALUES('integrity-check') command and, disappointingly,
it returned "Error: database disk image is malformed".


However, unlike my first report above, the same FTS5 queries are all
working and returning results as expected.


I'm at a loss.

Regards
Nick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Rob Willett

Simon,

No, we knew about normalisation; the database is normalised, and that
part of the design hasn't changed in years.


The reason for the massive reduction in database size is that we worked
out how to handle repeating lines of data that change just enough that
we thought they needed to be kept. With some small tweaks we could
eliminate data we thought we needed to preserve. We had assumed that we
needed to keep everything, but by going back in and looking at the
actual data we now had, we worked out that we could store a small
amount of extra information capturing the state changes, which meant we
could dump most of the database.


It was something we hadn't appreciated a few years ago, and the
ever-increasing database meant we had to do something about the DB size.


It's taken a few months of work, mainly in a dark room with a wet towel
on our foreheads, to do the tiny changes necessary to make a big
difference.


In hindsight, the fact that SQLite is so easy and fast to use was a
slight disadvantage to us: it allowed us to be a bit quick and dirty
with designs when we should have thought through some of the issues.
However, that's what startups are like ;)


I think all the explanations to date have been helpful, and I appreciate
the time taken to answer. We're going to think a little more carefully
about how we manage our database on a VM. I'm in the process of moving
home, so the only real hardware (of any note) is sitting in storage, and
the only testing we can do is either virtualised or on Mac laptops,
neither of which will help us in this instance.


Rob

On 17 Jul 2018, at 13:16, Simon Slavin wrote:


On 17 Jul 2018, at 1:10pm, R Smith  wrote:

What kind of data did you store (maybe some examples if possible) 
that you could condense it by ~99% like that?


I think that the OP's organisation discovered the 'relational' part of 
RDBMS and implemented normalisation.


To Rob Willett: Ryan Smith's explanation is better than mine.  Please 
read his post and ignore my wooly one.  I tried three ways to get 
across the sequential-access vs. random-access point and wasn't really 
happy with anything I wrote.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Rob Willett

Dan,

Thanks for the reply.

pragma secure_delete;
1
sqlite>

Which is a bit of a surprise as we have never seen it before. We had to 
look it up using your reference.


No idea why that is set (if 1 means it is).

Should we simply ctrl-C the deletion, turn secure_delete off and then 
try again? My working assumption is that since there is a -wal file we 
are safe to do this.


Rob

On 16 Jul 2018, at 21:07, Dan Kennedy wrote:


On 07/17/2018 03:01 AM, rob.sql...@robertwillett.com wrote:

Hi,

We're doing a massive tidy on our database which is approx 50GB.

One table is approx 49GB of that 50GB, which we need to delete as we
have recorded the information in a far better format that's approx 99%
more efficient. If only we had been this clever when we started.

We've just 'dropped' the table and were assuming that dropping the
table would be quite quick. It's not. So far we've been waiting for 30
mins and nothing has come back yet. We can see that the -wal file is up
to 2.5GB. We have this terrible feeling that it'll need to get to 49GB
or so before the table gets dropped. We can just about handle that in
the current filesystem.

We're now getting nervous about dropping this table. We had assumed
that it would be a really quick and easy operation based on absolutely
no checking whatsoever. When we looked online all we could see was a
reference to a very, very old and outdated page
(https://sqlite.org/speed.html) which talks about speed and at the
bottom of that page the comments:

"SQLite is slower than the other databases when it comes to dropping
tables. This probably is because when SQLite drops a table, it has to
go through and erase the records in the database file that deal with
that table. MySQL and PostgreSQL, on the other hand, use separate files
to represent each table so they can drop a table simply by deleting a
file, which is much faster.

On the other hand, dropping tables is not a very common operation so if
SQLite takes a little longer, that is not seen as a big problem."

Is this still the case? Is it going to take a long time? If we assume
that the table is 49GB, will we need to wait until the -wal file is at
49GB? By our estimates that's approximately 10 hours away.

Any help, ideas or suggestions welcomed, but please be quick.


Is secure-delete turned on?

  https://www.sqlite.org/pragma.html#pragma_secure_delete
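
(From the shell, something like:)

PRAGMA secure_delete;        -- returns 1 if it is on
PRAGMA secure_delete = OFF;  -- turns it off for the current connection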

Dan.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Simon Slavin
On 17 Jul 2018, at 1:10pm, R Smith  wrote:

> What kind of data did you store (maybe some examples if possible) that you 
> could condense it by ~99% like that?

I think that the OP's organisation discovered the 'relational' part of RDBMS 
and implemented normalisation.

To Rob Willett: Ryan Smith's explanation is better than mine.  Please read his 
post and ignore my wooly one.  I tried three ways to get across the 
sequential-access vs. random-access point and wasn't really happy with anything 
I wrote.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread R Smith

On 2018/07/17 1:20 PM, Rob Willett wrote:


Thanks for this. You make some interesting points about cache hits and
misses. Up until now we hadn't seen a massive performance hit; whilst
some database operations didn't work as fast as we would like them to,
nothing works as fast as we would like.


This is actually a common thread seen in this industry. It's quite
normal for engineers or devs to put a system in place (of which they
typically have only one at the start) and then simply assume that
whatever speed it is going at is the correct speed, and to only bother
checking actual speed deficiencies once they hit some real time barrier
(which typically only happens much later) - and this is mostly just fine.


I propose as a test in future to make a great big DB in whatever format
you intend to use, run it on some fast hardware and then on the typical
VM, see the difference, and decide whether it's horrible or not and
needs further investigation or not.
That said, even in your current problem, you probably wouldn't have
noticed any real performance hit right up until you decided to drop
that 49GB table. Even now I'm thinking: you've been using your system
for ages, it's been working great, and one sunny afternoon you had to
wait half a day for one maintenance operation... is that really worth
changing an architecture for? Half-days are ten-a-penny.




I'm not familiar with these issues with virtualisation. The VPS we use
has OpenVZ at its core (no pun intended). We can see a little on the
internet about this (well, one article specifically about MySQL and
OpenVZ) but nothing else.


We are in the process of evaluating whether to move to a different
VPS; some use OpenVZ, some use KVM, and internally we use VMware ESXi.
Very little real DB work is done on the actual metal; it's all
virtualised.


I am not familiar with usage figures on OpenVZ, but I don't think it 
should be worse than any other virtualization. The problems described by 
others will exist for all of them, and if it is particularly worse in 
any area, Google should find a zillion results on the problem. The mere 
scantness of results tells me it probably works just fine.





Do you have any pointers to stuff we can read up on? We don't 
understand your comment "SQLite spent that whole time accessing your 
50GB database file in an apparently random order." and would like to 
try and get more information about it.


Simon (and previously Richard) simply pointed out that the way in which
an SQLite table stores information is not similar to the way a file
system stores it, and as such it may take hits (especially cache misses)
because the VM is optimized to handle file access in the way that normal
programs (the other 98% of stuff out there) read files - sequentially.
Doing random-access reads in a file happens, but it is not common, so if
you make a VM and have to choose which model to cater for, SQLite's
model never wins. It's usually not horrible either, but a 49GB SQLite
table drop will bring out the worst in every VM - that's the only point.




We have the option of moving off OpenVZ to KVM or ESXi, so if we can
understand the issue we can make a more informed choice. Whilst our
DB has dropped down to 500MB we still need to do a fair amount of
testing and checking to make sure there are no unusual edge cases (or
bugs) before we promote it to live.


Accurate testing will save you every time.
May I ask a curiosity... What kind of data did you store (maybe some
examples if possible) that you could condense it by ~99% like that? Did
you just throw away stuff? Were fields duplicated? Did you discover the
World's bestest new compression method? Did you simply elect to store
stuff that was unneeded or implied or somehow irrelevant and now simply
don't store it anymore? Do you possess Alien technology?




Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Simon Slavin
On 17 Jul 2018, at 12:20pm, Rob Willett  wrote:

> Do you have any pointers to stuff we can read up on? We don't understand your 
> comment "SQLite spent that whole time accessing your 50GB database file in an 
> apparently random order." and would like to try and get more information 
> about it.

I don't know much about specific virtualisation packages and have no 
recommendation.  I was just comparing virtual machines in general with 
operations on "actual metal" as you put it.

SQLite database files are split up into pages.  Each page contains data for 
exactly one table or index.  So even just a table with a primary index requires 
two different kinds of pages, one for the row of data, one for the 
corresponding entry in the primary key index.  And making changes to them both 
requires changes at other locations: the places where the page indexes are 
stored.
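
If you want to see this structure for yourself, a couple of read-only
pragmas from the CLI show it (output will obviously vary per database):

PRAGMA page_size;       -- bytes per page
PRAGMA page_count;      -- total number of pages in the database file
PRAGMA freelist_count;  -- free pages still present in the file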

Worse still, unless all your indexes are monotonic (with values in those 
indexes only ever increasing) anything progressing through them in order would 
not be reading through the database file in order.  This means you get 
continuous cache misses: the next piece of data you need is rarely already in 
the virtual computer's cache, because it's usually in a part of the file far 
away from the one you just read.

On a real computer, storage is only one motherboard connection and storage 
driver away from the CPU.  On a virtual computer the virtual storage is on a 
file server far away from the processing, and each time data isn't already in 
the processor's cache something has to work out where, in the racks of file 
servers, that page of file is stored, talk to that file server, and route 
traffic back and forth to the virtual processor.  So file operations are more 
slow and expensive than they are for real computers.  And an operation which 
generates cache misses for 50Gig of data ends up doing more work than it 
would for a simple "real metal" setup.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Rob Willett

Simon,

Thanks for this. You make some interesting points about cache hits and
misses. Up until now we hadn't seen a massive performance hit; whilst
some database operations didn't work as fast as we would like them to,
nothing works as fast as we would like.


I'm not familiar with these issues with virtualisation. The VPS we use
has OpenVZ at its core (no pun intended). We can see a little on the
internet about this (well, one article specifically about MySQL and
OpenVZ) but nothing else.


We are in the process of evaluating whether to move to a different VPS;
some use OpenVZ, some use KVM, and internally we use VMware ESXi. Very
little real DB work is done on the actual metal; it's all virtualised.


Do you have any pointers to stuff we can read up on? We don't understand 
your comment "SQLite spent that whole time accessing your 50GB database 
file in an apparently random order." and would like to try and get more 
information about it.


We have the option of moving off OpenVZ to KVM or ESXi, so if we can
understand the issue we can make a more informed choice. Whilst our DB
has dropped down to 500MB we still need to do a fair amount of testing
and checking to make sure there are no unusual edge cases (or bugs)
before we promote it to live.


Many thanks

Rob

On 17 Jul 2018, at 12:05, Simon Slavin wrote:

On 17 Jul 2018, at 8:37am, Rob Willett  wrote:


I suspect that part of the issue is the VPS provider we use has a 
rate limiter on IOPS which is not normally an issue for us, but that 
might have slowed it down somewhat. However I don't think that it 
would have slowed it down by hours.


Actually I think VPS had a lot to do with the time the operation took. 
 Any kind of virtual machine takes a terrible hit during the sort of 
storage access involved in dropping the table.


SQLite spent that whole time accessing your 50GB database file in an 
apparently random order.   So you had nine hours of cache misses, 
causing the virtual machine to continually write virtual pages back to 
real storage and read other pages into memory.  Virtual systems are 
optimized for cache hits, not cache misses.


I can't prove it without a lot of pointless data manipulation on your 
type of VPS, but I think you found its least optimal operation.  The 
good part is that now your database is less than 1GB long you're going 
to see a massive increase in speed since the whole database may well 
fit in the cache of your virtual machine.


Must remember in future, when people report unusually slow operations, 
to ask whether they're using a virtual machine or real hardware.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Simon Slavin
On 17 Jul 2018, at 8:37am, Rob Willett  wrote:

> I suspect that part of the issue is the VPS provider we use has a rate 
> limiter on IOPS which is not normally an issue for us, but that might have 
> slowed it down somewhat. However I don't think that it would have slowed it 
> down by hours.

Actually I think VPS had a lot to do with the time the operation took.  Any 
kind of virtual machine takes a terrible hit during the sort of storage access 
involved in dropping the table.

SQLite spent that whole time accessing your 50GB database file in an apparently 
random order.   So you had nine hours of cache misses, causing the virtual 
machine to continually write virtual pages back to real storage and read other 
pages into memory.  Virtual systems are optimized for cache hits, not cache 
misses.

I can't prove it without a lot of pointless data manipulation on your type of 
VPS, but I think you found its least optimal operation.  The good part is that 
now your database is less than 1GB long you're going to see a massive increase 
in speed since the whole database may well fit in the cache of your virtual 
machine.

Must remember in future, when people report unusually slow operations, to ask 
whether they're using a virtual machine or real hardware.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Rob Willett
The table drop finally finished around 07:00 this morning. It took 9.5
hours to drop the circa 190M rows.


I suspect that part of the issue is the VPS provider we use has a rate 
limiter on IOPS which is not normally an issue for us, but that might 
have slowed it down somewhat. However I don't think that it would have 
slowed it down by hours.


Lessons to be learnt here:

1. Design your database correctly at the start. This was our biggest
problem: we failed to understand how large a single table would grow and
didn't manage the administration correctly and in good time.
2. Think carefully about the data you actually need. We managed to throw
away 99% of our data and combine the rest, getting a 50GB database
down to 500MB.
3. Dropping a table may not be the fastest method; creating all the
other smaller tables in a separate database would have been far quicker.
There's always more than one way to solve a problem.

4. Turn off secure_delete if it's on.
5. Use the mailing group for advice. It's brilliant!
6. Assess, plan, act. That's a diving phrase but very pertinent here:
assess what the problem actually is before you plan and execute.


Now we have other checks to do to assure us the database is accurate for 
our needs.


Thanks to all the people who offered advice and help

Rob

On 17 Jul 2018, at 7:02, Rob Willett wrote:


Richard,

Thanks for the comprehensive update.

We appreciate that there are tradeoffs and that dropping a table is a
rarer operation than most others. The upside of the SQLite design is
that we treat a database as a single file which we copy around with
ease. You cannot do that with many other databases; this has a
downside, though, when you need to drop a table.


A less charitable person would have stated that it was our own fault
for designing the database and using it with approx 190M records in a
single table with little thought about how we would manage it.  We now
know this was a dumb idea; however, you live and learn.


The impact of this is simply time. We've worked out how to shrink the
database from 50GB to approx 1GB; after this we can prune the database
on a weekly basis so this never becomes an issue.


I suppose the key question is: would we give up the portability of
SQLite for a fast table drop? The answer is no.


Rob

On 16 Jul 2018, at 22:59, Richard Hipp wrote:


On 7/16/18, Rob Willett  wrote:


It does look as if one of sqlite's weaknesses is dropping very, very
large tables.



Right.  If every table were stored in a separate file, a DROP TABLE
could be translated to a relatively fast unlink().  But then a
database would be a directory full of files, rather than a single
file, which would undermine a lot of the usefulness of SQLite.
Furthermore, DROP TABLE is an uncommon operation.  We prefer to
provide a database where all content is contained in a single file and
that is optimized for SELECTs and for INSERTs and DELETEs of a subset of
the rows in the table, as that seems to be far more useful in most
cases.  There are always engineering trade-offs.  SQLite provides
single-file databases and fast queries in exchange for slower DROP
TABLEs.

A filesystem is able to implement unlink() quickly because it has the
entire partition available for laying out the locations of files.
Space can be allocated to a file in large chunks, which makes
deallocation faster.  In other words, in a filesystem, the files can
be positioned sparsely on disk, with lots of unused space in between
the various files to accommodate growth. But in SQLite, the goal is to
keep the database file size as small as possible, and to not have
unnecessary unused pages in the middle of the database.  Hence, space
for tables must be allocated in relatively small 1-page chunks, which
means that there will be a large number of chunks to deallocate when
dropping a large table.  If you were to construct a filesystem that
tried to keep all file content tightly packed at the beginning of a
partition (say, for example, to make shrinking of a partition faster)
then unlink() would necessarily be slower on that filesystem.  That
seems like a bad engineering trade-off for a filesystem, but it is
(for most applications) a good trade-off for a database such as
SQLite.

You can work around this.  If you have one or more tables that you
think might need to be DROP-ed frequently, then consider storing them
in separate database files and ATTACH-ing those separate database
files to your main database.  Then, to drop those tables, you can
DETACH them, and then unlink() the corresponding database file.  That
should go much faster.
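
(A minimal sketch of that pattern, with made-up names:)

ATTACH DATABASE 'big_stuff.db' AS aux;
CREATE TABLE aux.big_table(id INTEGER PRIMARY KEY, payload TEXT);
-- ... populate and query aux.big_table as usual ...
DETACH DATABASE aux;
-- then delete big_stuff.db at the filesystem level instead of DROP TABLE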

--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Any ideas on how long it takes to drop a large table...rather an important question now...

2018-07-17 Thread Rob Willett

Richard,

Thanks for the comprehensive update.

We appreciate that there are tradeoffs and that dropping a table is a
rarer operation than most others. The upside of the SQLite design is
that we treat a database as a single file which we copy around with
ease. You cannot do that with many other databases; this has a
downside, though, when you need to drop a table.


A less charitable person would have stated that it was our own fault for
designing the database and using it with approx 190M records in a single
table with little thought about how we would manage it.  We now know
this was a dumb idea; however, you live and learn.


The impact of this is simply time. We've worked out how to shrink the
database from 50GB to approx 1GB; after this we can prune the database
on a weekly basis so this never becomes an issue.


I suppose the key question is: would we give up the portability of
SQLite for a fast table drop? The answer is no.


Rob

On 16 Jul 2018, at 22:59, Richard Hipp wrote:


On 7/16/18, Rob Willett  wrote:


It does look as if one of sqlite's weaknesses is dropping very, very
large tables.



Right.  If every table were stored in a separate file, a DROP TABLE
could be translated to a relatively fast unlink().  But then a
database would be a directory full of files, rather than a single
file, which would undermine a lot of the usefulness of SQLite.
Furthermore, DROP TABLE is an uncommon operation.  We prefer to
provide a database where all content is contained in a single file and
that is optimized for SELECTs and for INSERTs and DELETEs of a subset of
the rows in the table, as that seems to be far more useful in most
cases.  There are always engineering trade-offs.  SQLite provides
single-file databases and fast queries in exchange for slower DROP
TABLEs.

A filesystem is able to implement unlink() quickly because it has the
entire partition available for laying out the locations of files.
Space can be allocated to a file in large chunks, which makes
deallocation faster.  In other words, in a filesystem, the files can
be positioned sparsely on disk, with lots of unused space in between
the various files to accommodate growth. But in SQLite, the goal is to
keep the database file size as small as possible, and to not have
unnecessary unused pages in the middle of the database.  Hence, space
for tables must be allocated in relatively small 1-page chunks, which
means that there will be a large number of chunks to deallocate when
dropping a large table.  If you were to construct a filesystem that
tried to keep all file content tightly packed at the beginning of a
partition (say, for example, to make shrinking of a partition faster)
then unlink() would necessarily be slower on that filesystem.  That
seems like a bad engineering trade-off for a filesystem, but it is
(for most applications) a good trade-off for a database such as
SQLite.

You can work around this.  If you have one or more tables that you
think might need to be DROP-ed frequently, then consider storing them
in separate database files and ATTACH-ing those separate database
files to your main database.  Then, to drop those tables, you can
DETACH them, and then unlink() the corresponding database file.  That
should go much faster.

--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users