I'm surprised about the commit time of SQLite, when writing blobs is involved. Can anybody shed light on this subject? Below's a description of what I do, with the results.
I've exporting data into SQLite, spread in several tables. I process only about 240,000 rows, and write around 1GB in 20,000 blobs. I process the data in two passes: 1) write the rows, record rowid when a row has a blob, zeroblob(N) during insert, to leave room for Incremental Blob I/O later in 2nd pass. 2) process blobs only Then I compare this with writing the blobs in HDF5 instead. Instead of writing the blob itself in-row, I just insert its size in SQLite. (HDF5 is a "filesystem in a file", non-transactional, with "files" being N-dimensional typed arrays, thus a 1D array of bytes approximates a "file") Results writing both rows and blobs in SQLite: Write ROWs = 6.968s (18.7%) (includes the zeroblob(N)) Write BLOBs = 2.764s ( 7.4%) Commit TX = 10.566s (28.4%) Results is a single 1,110,253,568 bytes SQLite DB file. Now writing only rows to SQLite and blobs to HDF5: Write ROWs = 4.265s (16.6%) Write BLOBs = 4.010s (15.6%) Commit TX = 0.195s ( 0.8%) (includes flushing the HDF5 file too) Results in two files, 64,475,136 bytes SQLite DB, and 1,039,652,416 bytes HDF5 file. SQLite (version 3.19.3) is used in journal mode, and DB written "from scratch", so there's not much to put in the journal, AFAIK. My assumption was that after the zeroblob(N), there was enough room in the main DBs pages, such that the subsequent blob open+write+close did not need to generate any "page churn" (i.e. journal activity) and could write directly to the pages created on initial insert. But what surprised me was the commit time jumping to 10s, i.e. 50x more that when not writing the blobs. After the initial zeroblob(N) + sqlite3_last_insert_rowid(), part of "Write ROWs", the code writing the blobs is below (simplified, error handling removed). sqlite3_blob* p_blob = nullptr; sqlite3_blob_open(db, "main", tab, col, rowid, /*read-write*/1, &p_blob); assert(sqlite3_blob_bytes(p_blob) == length, ;); sqlite3_blob_write(p_blob, buffer, static_cast<int>(length), 0); sqlite3_blob_close(p_blob); So if the code above "the write way" to write blobs? I don't yet have the buffers to write the blobs directly at-row-insert-time, for reasons beyond this particular problem. Please assume this is by-design. I can understand that 64MB I/O vs 1GB I/O takes more time, and that's reflected in the 4.2s writing just the rows, versus the 7s + 2.7s = 9.7s when writing both rows and blobs, but why the commit time jumps from 0.2s to 10.5s ??? Thanks for any insights in this. Thanks, --DD _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users