Accidentally sent my first reply direct, responding to the list. I'm
now wondering if there's a lot of binary data hidden in each row.
Trying to figure out how to determine that if length() doesn't show
anything.


Prior response:

An excellent idea, thank you :-) .  The output is included below.

I am continuing to do my own debugging in parallel. I am now leaning
towards a new hypothesis that there may be a bug handling
strange/malformed unicode that is resulting in a large amount of
garbage data being stored into the record. I'm trying to figure out
the exact conditions of copying/loading data into the table that
causes the issue to propagate or disappear.


 % ./sqlite3_analyzer ~/Downloads/200k-per-row.sqlite
/** Disk-Space Utilization Report For /Users/nadia/Downloads/200k-per-row.sqlite

Page size in bytes................................ 4096
Pages in the whole file (measured)................ 143
Pages in the whole file (calculated).............. 143
Pages that store data............................. 143        100.0%
Pages on the freelist (per header)................ 0            0.0%
Pages on the freelist (calculated)................ 0            0.0%
Pages of auto-vacuum overhead..................... 0            0.0%
Number of tables in the database.................. 2
Number of indices................................. 0
Number of defined indices......................... 0
Number of implied indices......................... 0
Size of the file in bytes......................... 585728
Bytes of user payload stored...................... 571322      97.5%

*** Page counts for all tables with their indices *****************************

COPIED............................................ 142         99.30%
SQLITE_MASTER..................................... 1            0.70%

*** Page counts for all tables and indices separately *************************

COPIED............................................ 142         99.30%
SQLITE_MASTER..................................... 1            0.70%

*** All tables ****************************************************************

Percentage of total database...................... 100.0%
Number of entries................................. 4
Bytes of storage consumed......................... 585728
Bytes of payload.................................. 571412      97.6%
Bytes of metadata................................. 1284         0.22%
Average payload per entry......................... 142853.00
Average unused bytes per entry.................... 3393.00
Average metadata per entry........................ 321.00
Average fanout.................................... 3.00
Maximum payload per entry......................... 194280
Entries that use overflow......................... 3           75.0%
Index pages used.................................. 1
Primary pages used................................ 4
Overflow pages used............................... 138
Total pages used.................................. 143
Unused bytes on index pages....................... 4070        99.37%
Unused bytes on primary pages..................... 9502        58.0%
Unused bytes on overflow pages.................... 0            0.0%
Unused bytes on all pages......................... 13572        2.3%

*** Table COPIED **************************************************************

Percentage of total database......................  99.30%
Number of entries................................. 3
Bytes of storage consumed......................... 581632
Bytes of payload.................................. 571322      98.2%
Bytes of metadata................................. 1172         0.20%
B-tree depth...................................... 2
Average payload per entry......................... 190440.67
Average unused bytes per entry.................... 3226.00
Average metadata per entry........................ 390.67
Average fanout.................................... 3.00
Non-sequential pages.............................. 0            0.0%
Maximum payload per entry......................... 194280
Entries that use overflow......................... 3          100.0%
Index pages used.................................. 1
Primary pages used................................ 3
Overflow pages used............................... 138
Total pages used.................................. 142
Unused bytes on index pages....................... 4070        99.37%
Unused bytes on primary pages..................... 5608        45.6%
Unused bytes on overflow pages.................... 0            0.0%
Unused bytes on all pages......................... 9678         1.7%

*** Table SQLITE_MASTER *******************************************************

Percentage of total database......................   0.70%
Number of entries................................. 1
Bytes of storage consumed......................... 4096
Bytes of payload.................................. 90           2.2%
Bytes of metadata................................. 112          2.7%
B-tree depth...................................... 1
Average payload per entry......................... 90.00
Average unused bytes per entry.................... 3894.00
Average metadata per entry........................ 112.00
Maximum payload per entry......................... 90
Entries that use overflow......................... 0            0.0%
Primary pages used................................ 1
Overflow pages used............................... 0
Total pages used.................................. 1
Unused bytes on primary pages..................... 3894        95.1%
Unused bytes on overflow pages.................... 0
Unused bytes on all pages......................... 3894        95.1%

On Fri, Jan 10, 2020 at 9:45 AM Richard Hipp <d...@sqlite.org> wrote:
>
> On 1/10/20, Ryan Mack <ryan.m...@gmail.com> wrote:
> >
> > I'm trying to understand unexplained table bloat
>
> The sqlite3_analyzer command-line utility program (available  in the
> "Precompiled binaries" bundles on the https://sqlite.org/download.html
> page) is designed to help understand these kinds of problems.  Please
> run that utility on the database and perhaps post the output here.
> --
> D. Richard Hipp
> d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to