On Sunday, 28 October, 2018 17:48, Gerlando Falauto 
<gerlando.fala...@gmail.com> wrote:
>On Sun, Oct 28, 2018 at 6:18 PM Simon Slavin <slav...@bigfraud.org> wrote:
>> On 28 Oct 2018, at 2:06pm, Gerlando Falauto <gerlando.fala...@gmail.com> 
>> wrote:

>>> - the SSD's wear due to continuous writes should be reduced to a minimum

>> I assume your App generates items for the log one by one.

>Yes, items are indeed generated one by one. However, since they're
>coming from several (about a dozen) more-or-less-synchronized sources, they
>would generally come in bursts.

>> There's not much you can do to reduce wear if you want each one
>> logged as soon as possible.

>I guess I'm looking for the best trade-off, but I don't really know
>how things really work under the hood so I'd like to avoid making totally
>wrong assumptions or stupid mistakes.

>So for instance, on a bare NOR flash, I believe I might assume I can
>safely write at any granularity (even a single bit) as long as I change 1's
>to 0's.

>On an SSD, I have no clue what's going on -- will writing a 2 MB
>block one page at a time be much worse than writing an entire block at once?

Perhaps, but that depends on the particular SSD memory controller; some are 
much better than others.  The "crappy" ones will re-write the 2MB block over 
and over again each time you update some data within the block (so, for example, 
4096 sequential 512-byte writes will cause the underlying 2 MB storage block to 
be erased and overwritten (or rather re-allocated) 4096 times).  Better 
controllers will cache the updates in simpler storage and only push the block 
(2MB) down into flash when it is full (or the cache space is full).  
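
To put rough numbers on that worst case, here is a back-of-the-envelope sketch 
(in Python) of the write amplification involved; the 2 MB erase block and 
512-byte write size are just the example figures from this thread, not 
measurements from any particular device:

    # Rough write-amplification estimate for a naive controller that erases and
    # rewrites the whole 2 MB erase block for every small host write.
    # All figures are illustrative, not measured from any real device.

    ERASE_BLOCK = 2 * 1024 * 1024   # 2 MB erase block
    WRITE_SIZE = 512                # each host write is 512 bytes

    writes_per_block = ERASE_BLOCK // WRITE_SIZE    # 4096 host writes fill one block
    host_bytes = writes_per_block * WRITE_SIZE      # 2 MB of useful data

    # Worst case: the whole 2 MB block is rewritten once per host write.
    flash_bytes_naive = writes_per_block * ERASE_BLOCK
    # Better case: a caching controller pushes the block down once, when it is full.
    flash_bytes_cached = ERASE_BLOCK

    print("write amplification, naive controller  :", flash_bytes_naive // host_bytes)   # 4096
    print("write amplification, caching controller:", flash_bytes_cached // host_bytes)  # 1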

Generally speaking, what happens is directly related to price -- you get what 
you pay for.  If the device costs $25 it is likely operating in the former mode 
and will die quickly.  If the device costs $2500 then it is likely operating in 
the latter mode (and probably with even more optimizations to when it actually 
pushes down to the flash blocks) and will last much longer.  If well designed, 
you will probably find that the $2500 device lasts 100 times longer than the 
$25 device, and probably much more than 100 times longer.  So it depends on 
whether you want to replace the $25 device every day or the $2500 device every 
quarter.  

You still end up paying out the same for the physical devices.  It depends 
entirely on what your "time" and "effort" are worth for each replacement and the 
relative risk (and consequence) of that activity.  

Personally I would buy a $25,000 device (assuming that the lifetime 
enhancement is linear) and leave the replacement as an exercise for whoever 
replaced me after I retired :)  Then again, I worked in a field where "if it 
ain't broke don't fix it" and "let's pay $X for something that will work 
flawlessly for Y years without having to be touched, and then we will spend $X 
again to replace the whole kit and kaboodle with something else" are entirely 
valid and preferred risk and consequence management strategies.  In other 
words, the cost associated with doing things in this manner is trivial in 
comparison to the cost of failure -- the cost unit of failures being on the 
scale of "Yearly Salaries per Hour", a cost which might even justify buying the 
$250,000 version of the storage device ...

>> In terms of deleting rows, if you want the least wear on your SSD,
>> do the deletes seldomly.  Definitely no more than once a day.

> Hmmm, I don't understand, could you please elaborate on that?
> I first tried this kind of approach (DELETE once a day) for
> simplicity, but deleting big amounts of rows at once seemed to 
> potentially take a relatively long time and generate a very big 
> WAL file. 
> So my uneducated guess was then that I'd be better served by making
> room (DELETE) just before adding new data (INSERT) so to reuse just-
> deleted pages.

I don't think this follows (that deleting less frequently reduces "wear" on the 
SSD).  If you are overwriting X blocks it really makes no difference whether 
you overwrite those X blocks at an even rate spread over 24 hours or "all at 
once"; you are still modifying (erasing) the same number of blocks.  However, I 
would generally prefer the "check and delete" every time before inserting 
because it has better performance (as you said) and it allows the storage 
controller of the SSD device to more efficiently coalesce the updates and 
manage the underlying storage.
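
As a rough illustration of that "check and delete just before insert" pattern 
(purely a sketch -- the table name, columns, and retention limit below are made 
up for the example, not taken from the original posts):

    import sqlite3

    # Hypothetical schema and retention policy, purely for illustration.
    MAX_ROWS = 1_000_000   # keep at most this many log rows

    conn = sqlite3.connect("log.db")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS log (
                        id INTEGER PRIMARY KEY,
                        ts REAL NOT NULL,
                        source TEXT NOT NULL,
                        message TEXT NOT NULL)""")

    def insert_batch(rows):
        """Delete just enough of the oldest rows to stay under MAX_ROWS, then
        insert the new burst -- all in one transaction, so the freed pages can
        be reused immediately by the incoming data."""
        with conn:   # single transaction: BEGIN ... COMMIT
            conn.execute(
                "DELETE FROM log WHERE id IN ("
                "  SELECT id FROM log ORDER BY id"
                "  LIMIT max(0, (SELECT count(*) FROM log) + ? - ?))",
                (len(rows), MAX_ROWS))
            conn.executemany(
                "INSERT INTO log(ts, source, message) VALUES (?, ?, ?)", rows)

    # Example: one burst of entries from several sources.
    insert_batch([(1540745000.0, "source-01", "message one"),
                  (1540745000.1, "source-02", "message two")])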

>>> I found the "Total_Erase_Count" reported by smartmontools to be an
>>> interesting end-to-end metric in the long run

>> If you care about that level of detail, using SQLite is the wrong
>> approach. 

>> You should be logging by appending each entry to a text file.  Or
>> maybe to a new textfile every day/week/month.  Then, to prune the 
>> log just delete one old textfile each time you start a new one.

>I was afraid I would get that kind of answer at some point. It's not
>that I do (or don't) care about that level of detail.
>It's just that I'd like to get a rough idea of how much worse it is
>to use SQLite compared to this approach, in terms of SSD wearout.

>If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
>factor, then I have a problem.

Again, this mostly depends on the implementation of the storage controller 
of the flash/SSD storage device.  The "factor" is determined by how much you 
pay for the hardware and how the hardware controller is implemented.  Simply 
appending a "row" to a text file still requires overwriting the entire block 
containing the changed data, so it does not much matter where you "insert" that 
new row: there is not likely to be much difference in the access pattern of the 
underlying storage blocks in either case, only in whether or not the storage 
controller on the device can handle that kind of updating well.

For example, the controllers on so-called Thumb drives are optimized for 
sequentially writing a big hunk-o-data in a single go -- something like copying 
a movie to a flash drive so you can copy it to another device -- and are rather 
dismal at random access.  They may survive many years of copying Petabytes of 
data in this manner (copying entire big files).  However, if used for "random 
access" (which includes appending small amounts of data to existing blocks), you 
might find that they fail in a few days with only a few Gigabytes of data written.
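
For reference, the plain rotating-text-file scheme Simon describes above could 
be as simple as the following sketch (the directory, naming scheme, and number 
of files to keep are assumptions made for illustration):

    import os
    import time
    from glob import glob

    # Minimal sketch of "append to a text file, start a new file each day,
    # delete the oldest".  Directory, naming and KEEP count are assumptions.
    LOG_DIR = "/var/log/mylogger"
    KEEP = 10   # number of daily files to retain

    def append_entry(line):
        """Append one log entry to today's file, creating it if needed."""
        path = os.path.join(LOG_DIR, time.strftime("%Y-%m-%d") + ".log")
        with open(path, "a") as f:
            f.write(line.rstrip("\n") + "\n")

    def prune_old_files():
        """Keep only the newest KEEP daily files; the names sort chronologically."""
        for old in sorted(glob(os.path.join(LOG_DIR, "*.log")))[:-KEEP]:
            os.remove(old)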

>If there was a simple way to split the database into say 10 different
>files (which could be queried against as a single database though), and
>just rotate among those, I'd be very happy about it. Is there? Using
>ATTACHed databases and carefully-crafted UNION statements perhaps?

This is doable and there is even an extension to do that.  However, it is not 
likely to provide any advantage at the storage device level, and it suffers from 
the inherent unreliability introduced by adding complexity.  It might make a $2 
flash drive last a month rather than a day, but it is unlikely to have 
any effect on "designed for purpose" storage devices -- and the more you pay 
for that storage device, the less of an effect it is likely to have.
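
For what it is worth, the hand-rolled version of that (without any extension) 
might look roughly like the sketch below; the segment file names, table name, 
and columns are invented for the example:

    import sqlite3

    # Hypothetical rotated database files, oldest first; the schema is invented.
    SEGMENTS = ["log_0.db", "log_1.db", "log_2.db"]

    conn = sqlite3.connect(SEGMENTS[-1])            # write only to the newest segment
    for i, path in enumerate(SEGMENTS[:-1]):
        conn.execute("ATTACH DATABASE ? AS seg%d" % i, (path,))

    # A temporary view UNIONs the segments so queries see one logical table.
    selects = ["SELECT ts, source, message FROM seg%d.log" % i
               for i in range(len(SEGMENTS) - 1)]
    selects.append("SELECT ts, source, message FROM main.log")
    conn.execute("CREATE TEMP VIEW all_log AS " + " UNION ALL ".join(selects))

    # Queries run against the combined view:
    for row in conn.execute(
            "SELECT * FROM all_log WHERE source = ? ORDER BY ts", ("source-01",)):
        print(row)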

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


