Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On 08/19/2010 01:27 PM, Taras Glek wrote: Hi, I really appreciate that sqlite got this feature to reduce fragmentation, but why not expose this as a pragma? In many cases it is not feasible to pass the chunk size via a C API. For example with a pragma I could do fragmentation testing via an sqlite shell, now this option is out without modifying the shell source. I presume that a similar problem exists for users of scripting languages until every single scripting host provides support for this. Perhaps this should even persist similar to how page_size persists across connections. Taras To the patronizing "i don't believe this is a problem" and "even if it is, Linux is superior to windows" concerns: Attached my earlier email to the dev list showing that fragmented files can be 3x slower to access than non-fragmented ones. Trivia: XFS on Linux gets a prize for being the most fragmentation-prone filesystem in my survey. It is roughly 10-30x worse than everybody else. Ext4 is ok, not great. Windows has a builtin autodefrag that on some occasions ensures that our Firefox databases are not fragmented(ie it beats the crap out of Linux contrary to the "classic" wisdom). Even Apple's horrific filesystem occasionally results in less fragmented files due to autodefrag. For more details see http://blog.mozilla.com/tglek/2010/07/22/file-fragmentation/ I already have a patch for Firefox that makes use of the new API, we don't require a new pragma. It just seems like an important feature for performance-tuning sqlite that isn't easy to get at for many users. Taras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On Thu, Aug 19, 2010 at 06:46:07PM -0400, Jim Wilcoxson scratched on the wall: > On 8/19/10, Simon Slavinwrote: > > > > On 19 Aug 2010, at 9:27pm, Taras Glek wrote: > > > >> I really appreciate that sqlite got this feature to reduce > >> fragmentation, but why not expose this as a pragma? > > > > Do you have figures which suggest that reducing fragmentation leads to any > > improvement in performance ? > > Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion. > I myself would love to see features exposed via pragmas whenever > possible, for the simple reason that I don't use the C API and can't > make use of the features otherwise. I would assume that since the > SQLite developers added the feature to the C API, there must be a use > for it or they wouldn't have bothered. From discussions in the past, I've gotten the feeling that the SQLite team considers the C API and the SQL interfaces (including PRAGMA commands) to have different security requirements. This is why you need to make an API call to enable extension loading-- because it is viewed as something "dangerous" you can do from the SQL level. Similarly, this is why there is no SQL PRAGMA interface to such functions as sqlite3_limit(). Personally, I've found very few situations where a user might have access to an SQL prompt when they would not have access to database file (e.g. they can run sqlite3 or their own program and do whatever they want), but I suppose this may be true in some web environments or some scripting environments. Given how easy it is to build SQL wrappers in an extension (or built-in code), I don't see it as a major issue. One of the "how to write a custom SQL function" examples in "Using SQLite" builds a wrapper around sqlite3_limit(). I know people that are not using the C interface are faced with additional challenges when it comes to loading extensions or modifying the core SQLite library, but if it is that important, it can usually be done. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On Thu, Aug 19, 2010 at 10:56:58PM +0100, Simon Slavin scratched on the wall: > It might be worth noting that fragmentation is normally seen as an issue > only under Windows which is very sensitive to it however. Most systems are sensitive to fragmentation. It just happens that many other system address the problem with more advanced filesystems and/or filesystem drivers. HFS+, for example (the primary filesystem of Darwin and Mac OS X) will automatically defrag smaller files (< 20MB). I'm told HFS+ will also tend to migrate frequently used files to the middle of the disk platters (smaller average seek time), but I haven't found definitive documentation on this. A number of the more advanced filesystems commonly found in the UNIX world have similar features. I know Vista and Win7 introduced some auto-defrag features, but I don't have any significant personal experience with those systems. > Also, many installations of SQLite are on solid state devices where, > of course, fragmentation has no effect at all. Umm... no. SSDs have very different access characteristics compared to spinning platters, but they most definitely do not allow uniform access times to the whole address space. "Seek time" isn't exactly the right word, but they have read delays based off the last block accessed, and a whole slew of other factors, including how the individual storage chips are "stacked", how the chips are cut up into banks, how the controller is designed, and on and on. Also, because of the way the SSD devices work at the chip level, they tend to "stream" data, rather than just access it, so there are some designs that are even more sensitive to fragmentation. In many ways they're much more challenging to write drivers for, as each device has different characteristics that cannot be accounted for by just looking at the device geometry. This is all made even more interesting by the fact that many SSDs utilize file systems that intentionally fragment and move blocks around with every write (including writes to existing space in existing files) to spread out the write cycles. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
> If you make PRAGMA freelist_count writable as you suggest, I would > expect the database to reserve space once, and not use a larger pice of > the disk every time ist has to be expanded in the future. > > Martin To throw in my $0.02, I would suggest a *different* name for the pragma, something more like PRAGMA reserve_space(N) where N is in pages or bytes or whatever. That could call whatever functions are needed to force the database to grow such that the minimum number of free pages/bytes in the database is N. -- -- Stevie-O Real programmers use COPY CON PROGRAM.EXE ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
Am 20.08.2010 13:38, schrieb Max Vlasov: >> In my case (which is certainly not typical), a (several GB) large >> database is built up in several batches, one table at a time, while in >> parallel many intermediate files on the disk are created. This resulted >> in a very fragmented database file. After that, also several times, the >> data is selected in a way that uses 80-90% of the data in the database, >> using joins of all tables and sorting. >> >> ... >> >> With the new feature available, i can remove my own workaround, which >> does not work so well annyway. Many thanks to the developers. >> >> > Martin, you gave a good example of the case when this really helps. Although > I suppose you still need some tweaking. As Dan Kennedy wrote you have to set > the the parameter and "From that point on, connection "db" extends and > truncates the db file in 1MB chunks". So for example if you just created a > db and maybe did minor changes to the db and have plans to extend it to > larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with > sqlite3_file_control and also write something new and not only write but be > sure it's not going to be written to a previously disposed page. > > As long as cases like yours is real and can be used in real life, maybe a > change to existing freelist_count pragma is possible? If it is writable > (PRAGMA freelist_count=1024;), sqlite compares the value supplied with the > current count and if it is bigger allocates necessary space. It seems this > syntax will be straightforward and self-explaing. What you think? > > Max Hello Max, Personally, I use the C Api and do not need any other interface to the functionality. If you make PRAGMA freelist_count writable as you suggest, I would expect the database to reserve space once, and not use a larger pice of the disk every time ist has to be expanded in the future. Martin > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On 20 Aug 2010, at 7:58am, Martin Engelschalk wrote: > Under Windows, the insert speed did not change measurably, but the speed > of the later selects increased by about 15-20%. Also, my customer was > happy. Okay, this is real-world data. In that case there may be some point to defragmentation. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
> In my case (which is certainly not typical), a (several GB) large > database is built up in several batches, one table at a time, while in > parallel many intermediate files on the disk are created. This resulted > in a very fragmented database file. After that, also several times, the > data is selected in a way that uses 80-90% of the data in the database, > using joins of all tables and sorting. > > ... > > With the new feature available, i can remove my own workaround, which > does not work so well annyway. Many thanks to the developers. > > Martin, you gave a good example of the case when this really helps. Although I suppose you still need some tweaking. As Dan Kennedy wrote you have to set the the parameter and "From that point on, connection "db" extends and truncates the db file in 1MB chunks". So for example if you just created a db and maybe did minor changes to the db and have plans to extend it to larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with sqlite3_file_control and also write something new and not only write but be sure it's not going to be written to a previously disposed page. As long as cases like yours is real and can be used in real life, maybe a change to existing freelist_count pragma is possible? If it is writable (PRAGMA freelist_count=1024;), sqlite compares the value supplied with the current count and if it is bigger allocates necessary space. It seems this syntax will be straightforward and self-explaing. What you think? Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On Thu, Aug 19, 2010 at 2:56 PM, Simon Slavinwrote: > It might be worth noting that fragmentation is normally seen as an issue only > under Windows which is very sensitive to it however. Other operating systems > use different ways of handling disk access, however, real figures from > real-world examples may disprove this classic view. Also, many installations > of SQLite are on solid state devices where, of course, fragmentation has no > effect at all. Really? I can think of at least 350 million installations of SQLite that very likely aren't on an SSD (hint: it's a web browser). Cheers, Shawn ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
Am 19.08.2010 23:56, schrieb Simon Slavin: > On 19 Aug 2010, at 9:27pm, Taras Glek wrote: > >> I really appreciate that sqlite got this feature to reduce >> fragmentation, but why not expose this as a pragma? > Do you have figures which suggest that reducing fragmentation leads to any > improvement in performance ? Yes, see below. > It might be worth noting that fragmentation is normally seen as an issue only > under Windows which is very sensitive to it however. Other operating systems > use different ways of handling disk access, however, real figures from > real-world examples may disprove this classic view. Also, many installations > of SQLite are on solid state devices where, of course, fragmentation has no > effect at all. > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users Hello Simon, I agree completely. In my case (which is certainly not typical), a (several GB) large database is built up in several batches, one table at a time, while in parallel many intermediate files on the disk are created. This resulted in a very fragmented database file. After that, also several times, the data is selected in a way that uses 80-90% of the data in the database, using joins of all tables and sorting. The fragmentation was not a problem for me, but one of my customers did not like it. As far as i understood, some automatic tool monitored disk fragmentation and generated alarms; also, a backup tool slowed down. So, while inserting, at strategic places, I created a dummy table with a blob field and filled it with a very large empty blob. Then I dropped the table. In this way i simulated the new feature. Under Windows, the insert speed did not change measurably, but the speed of the later selects increased by about 15-20%. Also, my customer was happy. With the new feature available, i can remove my own workaround, which does not work so well annyway. Many thanks to the developers. Martin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion. > I myself would love to see features exposed via pragmas whenever > possible, for the simple reason that I don't use the C API and can't > make use of the features otherwise. I would assume that since the > SQLite developers added the feature to the C API, there must be a use > for it or they wouldn't have bothered. > > You have your reason for wanting pragma more, but for a general developer using Pragma instead of api calls is worse in the long run since Pragma generates no errors and the docs says there's no guarantee the syntax will not change and the results will be the same (The recent WAL-related changes to journal_mode is a good example). In case of api calls, we at least have compiler's errors and warnings. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On 8/19/10, Simon Slavinwrote: > > On 19 Aug 2010, at 9:27pm, Taras Glek wrote: > >> I really appreciate that sqlite got this feature to reduce >> fragmentation, but why not expose this as a pragma? > > Do you have figures which suggest that reducing fragmentation leads to any > improvement in performance ? Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion. I myself would love to see features exposed via pragmas whenever possible, for the simple reason that I don't use the C API and can't make use of the features otherwise. I would assume that since the SQLite developers added the feature to the C API, there must be a use for it or they wouldn't have bothered. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On 19 Aug 2010, at 9:27pm, Taras Glek wrote: > I really appreciate that sqlite got this feature to reduce > fragmentation, but why not expose this as a pragma? Do you have figures which suggest that reducing fragmentation leads to any improvement in performance ? It might be worth noting that fragmentation is normally seen as an issue only under Windows which is very sensitive to it however. Other operating systems use different ways of handling disk access, however, real figures from real-world examples may disprove this classic view. Also, many installations of SQLite are on solid state devices where, of course, fragmentation has no effect at all. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
On Fri, Aug 20, 2010 at 12:27 AM, Taras Glekwrote: > Hi, > I really appreciate that sqlite got this feature to reduce > fragmentation, but why not expose this as a pragma? > Taras, I think that you're overestimating the feature. On the OS level it won't matter how far the file pointer will go, a larger block still can be fragmented if the OS don't find a continuous block. On the sqlite pager level it's the same, for identical appends your data will probably get the same pages in the file regardless of the chunk size. I suppose it has some benefits in some specific cases and if it gets its own pragma it will just add confusion. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?
Hi, I really appreciate that sqlite got this feature to reduce fragmentation, but why not expose this as a pragma? In many cases it is not feasible to pass the chunk size via a C API. For example with a pragma I could do fragmentation testing via an sqlite shell, now this option is out without modifying the shell source. I presume that a similar problem exists for users of scripting languages until every single scripting host provides support for this. Perhaps this should even persist similar to how page_size persists across connections. Taras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users