The top table finally finished around 07:00 this morning. It took 9.5 hours to drop the circa 190M rows.

I suspect that part of the issue is that our VPS provider applies a rate limit on IOPS. That is not normally an issue for us, but it might have slowed things down somewhat. However, I don't think it would have slowed the drop down by hours.

Lessons to be learnt here:

1. Design your database correctly at the start. This was our biggest problem: we failed to understand how large a single table would grow, and we didn't manage the administration correctly and in good time.
2. Think carefully about the data you actually need. We managed to throw away 99% of our data and combine what remained, getting a 50GB database down to 500MB.
3. Dropping a table may not be the fastest method; creating all the other, smaller tables in a separate database would have been far quicker. There's always more than one way to solve a problem.
4. Turn off secure_delete if it's on (see the sketch after this list).
5. Use the mailing group for advice. It's brilliant!
6. Assess, plan, act. That's a diving phrase, but it's very pertinent here: assess what the problem actually is before you plan and execute.
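On point 4, a minimal sketch of checking and disabling secure_delete from the sqlite3 shell:

    -- check whether secure_delete is on (1 = freed pages are overwritten with zeros)
    PRAGMA secure_delete;
    -- turn it off so deleted pages are not zeroed, which speeds up DROP/DELETE
    PRAGMA secure_delete = OFF;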

Now we have other checks to do to assure ourselves that the database is accurate for our needs.

Thanks to all the people who offered advice and help.

Rob

On 17 Jul 2018, at 7:02, Rob Willett wrote:

Richard,

Thanks for the comprehensive update.

We appreciate that there are trade-offs and that dropping a table is a rarer operation than most others. The upside of the SQLite design is that we can treat a database as a single file which we copy around with ease. You cannot do that with many other databases; the downside, though, shows up when you need to drop a table.

A less charitable person would have stated that it was our own fault for designing the database and using it with approx 190M records in a single table, with little thought about how we would manage it. We now know this was a dumb idea; however, you live and learn.

The impact of this is simply time. We've worked out how to shrink the database from 50GB to approx 1GB; after that we can prune the database on a weekly basis so this never becomes an issue.
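For what it's worth, a minimal sketch of the sort of weekly prune we have in mind, assuming a hypothetical "traffic" table with a "created" timestamp column (illustrative names, not our real schema):

    -- delete rows older than a week (table and column names are hypothetical)
    DELETE FROM traffic WHERE created < datetime('now', '-7 days');
    -- rebuild the file so the freed pages are actually returned to the OS
    VACUUM;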

I suppose the key question is: would we give up the portability of SQLite for a fast table drop? The answer is no.

Rob

On 16 Jul 2018, at 22:59, Richard Hipp wrote:

On 7/16/18, Rob Willett <rob.sql...@robertwillett.com> wrote:

It does look as if one of sqlite's weaknesses is dropping very, very
large tables.


Right.  If every table were stored in a separate file, a DROP TABLE
could be translated to a relatively fast unlink().  But then a
database would be a directory full of files, rather than a single
file, which would undermine a lot of the usefulness of SQLite.
Furthermore, DROP TABLE is an uncommon operation.  We prefer to
provide a database where all content is contained in a single file and
that is optimized for SELECTs and for INSERTs and DELETEs of a subset of
the rows in the table, as that seems to be far more useful in most
cases.  There are always engineering trade-offs.  SQLite provides
single-file databases and fast queries in exchange for slower DROP
TABLEs.

A filesystem is able to implement unlink() quickly because it has the
entire partition available for laying out the locations of files.
Space can be allocated to a file in large chunks, which makes
deallocation faster.  In other words, in a filesystem, the files can
be positioned sparsely on disk, with lots of unused space in between
the various files to accommodate growth.  But in SQLite, the goal is to
keep the database file size as small as possible, and to not have
unnecessary unused pages in the middle of the database.  Hence, space
for tables must be allocated in relatively small 1-page chunks, which
means that there will be a large number of chunks to deallocate when
dropping a large table.  If you were to construct a filesystem that
tried to keep all file content tightly packed at the beginning of a
partition (say, for example, to make shrinking of a partition faster)
then unlink() would necessarily be slower on that filesystem.  That
seems like a bad engineering trade-off for a filesystem, but it is
(for most applications) a good trade-off for a database such as
SQLite.

You can work around this.  If you have one or more tables that you
think might need to be DROP-ed frequently, then consider storing them
in separate database files and ATTACH-ing those separate database
files to your main database.  Then, to drop those tables, you can
DETACH them, and then unlink() the corresponding database file.  That
should go much faster.
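A minimal sketch of that workaround in the sqlite3 shell (the "events" table and the "events.db" file name are hypothetical):

    -- keep the droppable table in its own database file, attached to the main one
    ATTACH DATABASE 'events.db' AS ev;
    CREATE TABLE ev.events (id INTEGER PRIMARY KEY, payload TEXT);
    -- ... use ev.events alongside the main database as usual ...
    -- to "drop" the table, detach the file and delete it at the OS level
    DETACH DATABASE ev;
    -- then: rm events.db (or the unlink() equivalent)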

--
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
