On Sunday, 28 October, 2018 17:48, Gerlando Falauto <gerlando.fala...@gmail.com> wrote:

>On Sun, Oct 28, 2018 at 6:18 PM Simon Slavin <slav...@bigfraud.org> wrote:
>> On 28 Oct 2018, at 2:06pm, Gerlando Falauto <gerlando.fala...@gmail.com> wrote:
>>> - the SSD's wear due to continuous writes should be reduced to a minimum
>>
>> I assume your App generates items for the log one by one.
>
> Yes, items are indeed generated one by one. However, since they're
> coming from several (about a dozen) more-or-less-synchronized sources,
> they would generally come in bursts.
>
>> There's not much you can do to reduce wear if you want each one
>> logged as soon as possible.
>
> I guess I'm looking for the best trade-off, but I don't really know
> how things really work under the hood, so I'd like to avoid making
> totally wrong assumptions or stupid mistakes.
> So for instance, on a bare NOR flash, I believe I might assume I can
> safely write at any granularity (even a single bit) as long as I change
> 1's to 0's.
> On an SSD, I have no clue what's going on -- will writing a 2 MB
> block one page at a time be much worse than writing an entire block at
> once?

Perhaps, but that depends on the particular SSD memory controller; some are much better than others. The "crappy" ones will re-write the 2 MB block over and over again each time you update some data within it, so, for example, 4096 sequential 512-byte writes will cause the underlying 2 MB storage block to be erased and rewritten (or rather re-allocated) 4096 times. Better controllers will cache the updates in simpler storage and only push the 2 MB block down into flash when it is full (or the cache space is full).

Generally speaking, what happens is directly related to price -- you get what you pay for. If the device costs $25 it is likely operating in the former mode and will die quickly. If the device costs $2500 it is likely operating in the latter mode (probably with even more optimizations around when it actually pushes down to the flash blocks) and will last much longer. If well designed, the $2500 device will probably last 100 times longer than the $25 device, and possibly much more than 100 times longer.
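To put rough numbers on the two controller behaviours described above, here is a back-of-the-envelope sketch. The 2 MB erase block and 512-byte write size come from the example in the text; the "naive" and "caching" models are simplifications, not a description of any specific device.

```python
# Write-amplification arithmetic for the two controller models above.
# Assumes a 2 MB erase block and 512-byte logical writes (as in the
# example); real controllers sit somewhere between these extremes.

ERASE_BLOCK = 2 * 1024 * 1024          # 2 MB erase block
WRITE_SIZE = 512                       # one logical write from the host
writes_per_block = ERASE_BLOCK // WRITE_SIZE   # 4096 writes fill one block

# Naive controller: every 512-byte write erases and rewrites the full block.
physical_naive = writes_per_block * ERASE_BLOCK    # bytes physically written
logical = writes_per_block * WRITE_SIZE            # bytes the host asked for
amplification_naive = physical_naive // logical

# Caching controller: updates buffered, block pushed to flash once when full.
amplification_cached = ERASE_BLOCK // ERASE_BLOCK

print(writes_per_block, amplification_naive, amplification_cached)
# 4096 sequential 512-byte writes -> 4096x amplification on the naive
# controller, ~1x on the caching one.
```

That 4096x-vs-1x gap is the mechanism behind the "you get what you pay for" point: the price difference largely buys the cache and the firmware that manages it.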
So it depends whether you want to replace the $25 device every day or the $2500 device every quarter. You still end up paying the same for the physical devices; it depends entirely on what your "time" and "effort" are worth for each replacement and the relative risk (and consequence) of that activity. Personally I would buy a $25,000 device (assuming the lifetime enhancement is linear) and leave the replacement as an exercise for whomever replaced me after I retired :)

Then again, I worked in a field where "if it ain't broke, don't fix it" and "let's pay $X for something that will work flawlessly for Y years without having to be touched, and then spend $X again to replace the whole kit and kaboodle with something else" are entirely valid and preferred risk- and consequence-management strategies. In other words, the cost of doing things in this manner is trivial in comparison to the cost of failure -- the cost unit of failures being on the scale of "yearly salaries per hour"; a cost which might even justify buying the $250,000 version of the storage device ...

>> In terms of deleting rows, if you want the least wear on your SSD,
>> do the deletes seldomly. Definitely no more than once a day.
>
> Hmmm, I don't understand, could you please elaborate on that?
> I first tried this kind of approach (DELETE once a day) for
> simplicity, but deleting big amounts of rows at once seemed to
> potentially take a relatively long time and generate a very big
> WAL file.
> So my uneducated guess was then that I'd be better served by making
> room (DELETE) just before adding new data (INSERT) so to reuse just-
> deleted pages.

I don't think it follows that deleting less frequently reduces "wear" on the SSD. If you are overwriting X blocks, it makes no difference whether you overwrite those X blocks at an even rate spread over 24 hours or all at once; you are still modifying (erasing) the same number of blocks.
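The "make room just before inserting" idea can be sketched in a few lines of Python against an in-memory SQLite database. The `log` table schema and the `MAX_ROWS` retention limit are invented for illustration; the point is only that the DELETE and INSERT share one transaction, so freed pages are immediately available for reuse.

```python
# Sketch of the "check and delete before inserting" pattern.
# Schema and retention count are hypothetical, for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")   # a real log would use a file
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, src TEXT, msg TEXT)")

MAX_ROWS = 5  # retention limit (made up)

def log_insert(conn, src, msg):
    # One transaction: prune just enough old rows, then insert, so the
    # just-freed pages can be reused for the new row right away.
    with conn:
        n = conn.execute("SELECT count(*) FROM log").fetchone()[0]
        excess = n - (MAX_ROWS - 1)
        if excess > 0:
            conn.execute(
                "DELETE FROM log WHERE id IN "
                "(SELECT id FROM log ORDER BY id LIMIT ?)",
                (excess,),
            )
        conn.execute("INSERT INTO log (src, msg) VALUES (?, ?)", (src, msg))

for i in range(20):
    log_insert(conn, "sensor1", f"event {i}")

print(conn.execute("SELECT count(*) FROM log").fetchone()[0])  # 5
```

Because only a handful of rows are deleted per insert, the WAL file stays small, in contrast to the large once-a-day DELETE the original poster complained about.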
However, I would generally prefer the "check and delete" before every insert, because it performs better (as you said) and because the storage controller of the SSD may be able to coalesce the updates more efficiently and manage the underlying storage.

>>> I found the "Total_Erase_Count" reported by smartmontools to be an
>>> interesting end-to-end metric in the long run
>>
>> If you care about that level of detail, using SQLite is the wrong
>> approach.
>> You should be logging by appending each entry to a text file. Or
>> maybe to a new textfile every day/week/month. Then, to prune the
>> log just delete one old textfile each time you start a new one.
>
> I was afraid I would get that kind of answer at some point. It's not
> that I do (or don't) care about that level of detail.
> It's just that I'd like to get a rough idea of how much worse it is
> to use SQLite compared to this approach, in terms of SSD wearout.
> If it's a 2x factor, I'm probably quite OK with that. If it's a 100x
> factor, then I have a problem.

Again, this mostly depends on the implementation of the storage controller of the flash/SSD device. The "factor" is determined by how much you pay for the hardware and by the controller implementation. Simply appending a "row" to a text file still requires overwriting the entire block containing the changed data, so where you "insert" that new row makes little difference; the access pattern at the underlying storage blocks is much the same in either case. What matters is whether the storage controller on the device can handle that kind of updating. For example, the controllers on so-called thumb drives are optimized for sequentially writing a big hunk-o-data in a single go -- something like copying a movie to a flash drive so you can copy it to another device -- and are rather dismal at random access.
They may last many years of copying Petabytes of data in this manner (copying entire big files). However, if used for "random access" (which includes appending small amounts of data to existing blocks) you might find that they fail within a few days, with only a few Gigabytes of data written.

> If there was a simple way to split the database into say 10 different
> files (which could be queried against as a single database though), and
> just rotate among those, I'd be very happy about it. Is there? Using
> ATTACHed databases and carefully-crafted UNION statements perhaps?

This is doable, and there is even an extension to do it. However, it is not likely to provide any advantage at the storage-device level, and it suffers from the inherent unreliability introduced by adding complexity. It might make a $2 flash drive last a month rather than a day, but it is unlikely to have any effect on "designed for purpose" storage devices -- and the more you paid for that storage device, the less of an effect it is likely to have.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
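P.S. For completeness, here is a rough sketch of the ATTACH-and-UNION rotation Gerlando asked about. File names, the `log` table schema, and the three-way split are all invented for illustration; note that the default ATTACH limit in SQLite is 10 databases, which happens to match his "10 different files" idea.

```python
# Sketch: rotate a log across several database files, then query them
# as one via ATTACH + a UNION ALL view. Names/schema are hypothetical.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

# Create three "rotated" log databases, each with the same schema.
paths = []
for i in range(3):
    p = os.path.join(tmp, f"log{i}.db")
    db = sqlite3.connect(p)
    db.execute("CREATE TABLE log (ts INTEGER, msg TEXT)")
    db.executemany(
        "INSERT INTO log VALUES (?, ?)",
        [(i * 10 + j, f"entry {i}-{j}") for j in range(4)],
    )
    db.commit()
    db.close()
    paths.append(p)

# ATTACH each file to one connection and expose them as a single view.
conn = sqlite3.connect(":memory:")
for i, p in enumerate(paths):
    conn.execute(f"ATTACH DATABASE ? AS part{i}", (p,))
union = " UNION ALL ".join(
    f"SELECT ts, msg FROM part{i}.log" for i in range(3)
)
conn.execute(f"CREATE TEMP VIEW all_logs AS {union}")

rows = conn.execute("SELECT count(*) FROM all_logs").fetchone()[0]
print(rows)  # 12 rows visible across the three files
```

Pruning then becomes "drop the oldest file and re-create it empty", which is close to the text-file rotation Simon suggested. The extension alluded to above is possibly SQLite's unionvtab, which does this joining without hand-written UNION statements.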