On Tue, Jun 12, 2018 at 8:03 AM Clemens Ladisch <[email protected]> wrote:

> Dominique Devienne wrote:
> > On Mon, Jun 11, 2018 at 4:27 PM Clemens Ladisch <[email protected]>
> wrote:
> >> It does write to the same pages, but those pages must be copied to the
> >> rollback journal so that they can be restored if the transaction is
> >> rolled back.  (Or are the two passes inside the same transaction?)
> >
> > They are part of the same transactions. So the no-rewrite pages should
> > still apply, no?
>
> Indeed.  I'd guess that just writing all those zero-filled pages is too
> much overhead.
>
> And the incremental blob I/O function are intended to access partial
> blobs; when you write the entire value at once, just use SQL.
>
> >> Don't insert the zero blobs in the first pass; this still rewrites all
> >> rows, but there is less data that might need to be rolled back.
> >>
> >> Consider moving the blobs to a separate table.
> >
> >  Is that really/still relevant, considering the above?
>
> Well, there's only one way to find out ...  ;-)
>

OK! Output below should be self-explanatory.

Blobs are still saved in a 2nd pass, still in the same transaction as 1st
pass inserting the rows,
but now the blobs are inserted directly (sqlite3_bind_blob) but in a
separate (tab, col, row, val) "blob table",
as suggested. Still in JOURNAL mode.

Bottom line is that not calling zeroblob(N) helps, but the commit time (in
Epilog) is still around 10s :(

I guess my mental model of JOURNAL mode and heavy BLOB insert is not
correct...

In WAL mode, new data goes to WAL file, and then is checkpointed later to
the DB file.
In JOURNAL mode, new data goes to DB file directly, and modified pages go
to the JOURNAL file.
And since here this is INSERT-only, from empty tables, I assumed pages
copied to the JOURNAL
file should be minimal.

So what's going on during the commit to surpass writing all row *AND* blob
data combined???

That commit time represent 30% of my ETL, and that 1GB project export is a
"small" one...

FWIW, here's a rough "histogram" of blob sizes and max size (1/2 MB) for
this small project:
Read 20771 LOBs; min-max: 3-508032 bytes; 1333 <1KB; 3625 <8KB; 10599
<64KB; 5214 <512KB;

D:\>... --export-sqlite GL-blobs-inline.db ... --blobs-as-hdf5=false
--blobs-inline=true
...
Total        =  37.586s
...
Write ROWs   =   7.068s (18.8%)
Write BLOBs  =   3.016s ( 8.0%)
Epilog       =  10.637s (28.3%)

D:\>... --export-sqlite GL-blobs-outofline.db ... --blobs-as-hdf5=false
--blobs-inline=false
...
Total        =  34.609s
...
Write ROWs   =   4.358s (12.6%)
Write BLOBs  =   3.130s ( 9.0%)
Epilog       =  10.221s (29.5%)

D:>... --export-sqlite GL-blobs-ashdf5.db ... --blobs-as-hdf5=true
--blobs-inline=false
...
Total        =  25.646s
...
Write ROWs   =   4.368s (17.0%)
Write BLOBs  =   4.179s (16.3%)
Epilog       =   0.461s ( 1.8%)

D:\> dir GL*
...
06/12/2018  10:51 AM        64,475,136 GL-blobs-ashdf5.db
06/12/2018  10:51 AM     1,039,652,416 GL-blobs-ashdf5.h5
06/12/2018  10:53 AM     1,110,253,568 GL-blobs-inline.db
06/12/2018  10:50 AM     1,111,736,320 GL-blobs-outofline.db
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to