Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Taras Glek

 On 08/19/2010 01:27 PM, Taras Glek wrote:

 Hi,
I really appreciate that sqlite got this feature to reduce 
fragmentation, but why not expose this as a pragma? In many cases it 
is not feasible to pass the chunk size via a C API. For example with a 
pragma I could do fragmentation testing via an sqlite shell, now this 
option is out without modifying the shell source. I presume that a 
similar problem exists for users of scripting languages until every 
single scripting host provides support for this.


Perhaps this should even persist similar to how page_size persists 
across connections.


Taras
To the patronizing "i don't believe this is a problem" and "even if it 
is, Linux is superior to windows" concerns:
Attached my earlier email to the dev list showing that fragmented files 
can be 3x slower to access than non-fragmented ones.


Trivia:
XFS on Linux gets a prize for being the most fragmentation-prone 
filesystem in my survey. It is roughly 10-30x worse than everybody else. 
Ext4 is ok, not great. Windows has a builtin autodefrag that on some 
occasions ensures that our Firefox databases are not fragmented(ie it 
beats the crap out of Linux contrary to the "classic" wisdom). Even 
Apple's horrific filesystem occasionally results in less fragmented 
files due to autodefrag.


For more details see

http://blog.mozilla.com/tglek/2010/07/22/file-fragmentation/

I already have a patch for Firefox that makes use of the new API, we 
don't require a new pragma. It just seems like an important feature for 
performance-tuning sqlite that isn't easy to get at for many users.


Taras
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Jay A. Kreibich
On Thu, Aug 19, 2010 at 06:46:07PM -0400, Jim Wilcoxson scratched on the wall:
> On 8/19/10, Simon Slavin  wrote:
> >
> > On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
> >
> >> I really appreciate that sqlite got this feature to reduce
> >> fragmentation, but why not expose this as a pragma?
> >
> > Do you have figures which suggest that reducing fragmentation leads to any
> > improvement in performance ?
> 
> Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion.
> I myself would love to see features exposed via pragmas whenever
> possible, for the simple reason that I don't use the C API and can't
> make use of the features otherwise.  I would assume that since the
> SQLite developers added the feature to the C API, there must be a use
> for it or they wouldn't have bothered.

  From discussions in the past, I've gotten the feeling that the SQLite
  team considers the C API and the SQL interfaces (including PRAGMA
  commands) to have different security requirements.  This is why you
  need to make an API call to enable extension loading-- because it is
  viewed as something "dangerous" you can do from the SQL level. 
  Similarly, this is why there is no SQL PRAGMA interface to such
  functions as sqlite3_limit().

  Personally, I've found very few situations where a user might have
  access to an SQL prompt when they would not have access to database
  file (e.g. they can run sqlite3 or their own program and do whatever
  they want), but I suppose this may be true in some web environments
  or some scripting environments.

  Given how easy it is to build SQL wrappers in an extension (or
  built-in code), I don't see it as a major issue.  One of the "how
  to write a custom SQL function" examples in "Using SQLite" builds a
  wrapper around sqlite3_limit().  I know people that are not using the
  C interface are faced with additional challenges when it comes to
  loading extensions or modifying the core SQLite library, but if it is
  that important, it can usually be done.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Jay A. Kreibich
On Thu, Aug 19, 2010 at 10:56:58PM +0100, Simon Slavin scratched on the wall:

> It might be worth noting that fragmentation is normally seen as an issue
> only under Windows which is very sensitive to it however.

  Most systems are sensitive to fragmentation.  It just happens that
  many other system address the problem with more advanced filesystems
  and/or filesystem drivers.  HFS+, for example (the primary filesystem
  of Darwin and Mac OS X) will automatically defrag smaller files
  (< 20MB).  I'm told HFS+ will also tend to migrate frequently used
  files to the middle of the disk platters (smaller average seek time),
  but I haven't found definitive documentation on this. 
  
  A number of the more advanced filesystems commonly found in the UNIX
  world have similar features.

  I know Vista and Win7 introduced some auto-defrag features, but I
  don't have any significant personal experience with those systems.

> Also, many installations of SQLite are on solid state devices where,
> of course, fragmentation has no effect at all.

  Umm... no.  SSDs have very different access characteristics compared to
  spinning platters, but they most definitely do not allow uniform access
  times to the whole address space.  "Seek time" isn't exactly the right
  word, but they have read delays based off the last block accessed,
  and a whole slew of other factors, including how the individual
  storage chips are "stacked", how the chips are cut up into banks, how
  the controller is designed, and on and on.  Also, because of the way
  the SSD devices work at the chip level, they tend to "stream" data,
  rather than just access it, so there are some designs that are even
  more sensitive to fragmentation.

  In many ways they're much more challenging to write drivers for, as
  each device has different characteristics that cannot be accounted
  for by just looking at the device geometry.

  This is all made even more interesting by the fact that many SSDs
  utilize file systems that intentionally fragment and move blocks
  around with every write (including writes to existing space in
  existing files) to spread out the write cycles.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Stephen Oberholtzer
> If you make PRAGMA freelist_count writable as you suggest, I would
> expect the database to reserve space once, and not use a larger pice of
> the disk every time ist has to be expanded in the future.
>
> Martin


To throw in my $0.02, I would suggest a *different* name for the
pragma, something more like

PRAGMA reserve_space(N) where N is in pages or bytes or whatever.

That could call whatever functions are needed to force the database to
grow such that the minimum number of free pages/bytes in the database
is N.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Martin Engelschalk


Am 20.08.2010 13:38, schrieb Max Vlasov:
>> In my case (which is certainly not typical), a (several GB) large
>> database is built up in several batches, one table at a time, while in
>> parallel many intermediate files on the disk are created. This resulted
>> in a very fragmented database file. After that, also several times, the
>> data is selected in a way that uses 80-90% of the data in the database,
>> using joins of all tables and sorting.
>>
>> ...
>>
>> With the new feature available, i can remove my own workaround, which
>> does not work so well annyway. Many thanks to the developers.
>>
>>
> Martin, you gave a good example of the case when this really helps. Although
> I suppose you still need some tweaking. As Dan Kennedy wrote you have to set
> the the parameter and "From that point on, connection "db" extends and
> truncates  the db file in 1MB chunks". So for example if you just created a
> db and maybe did minor changes to the db and have plans to extend it to
> larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with
> sqlite3_file_control and also write something new and not only write but be
> sure it's not going to be written to a previously disposed page.
>
> As long as cases like yours is real and can be used in real life, maybe a
> change to existing freelist_count pragma is possible? If it is writable
> (PRAGMA freelist_count=1024;), sqlite compares the value supplied with the
> current count and if it is bigger allocates necessary space. It seems this
> syntax will be straightforward and self-explaing. What you think?
>
> Max

Hello Max,

Personally, I use the C Api and do not need any other interface to the 
functionality.

If you make PRAGMA freelist_count writable as you suggest, I would 
expect the database to reserve space once, and not use a larger pice of 
the disk every time ist has to be expanded in the future.

Martin

> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Simon Slavin

On 20 Aug 2010, at 7:58am, Martin Engelschalk wrote:

> Under Windows, the insert speed did not change measurably, but the speed 
> of the later selects increased by about 15-20%. Also, my customer was 
> happy.

Okay, this is real-world data.  In that case there may be some point to 
defragmentation.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Max Vlasov
> In my case (which is certainly not typical), a (several GB) large
> database is built up in several batches, one table at a time, while in
> parallel many intermediate files on the disk are created. This resulted
> in a very fragmented database file. After that, also several times, the
> data is selected in a way that uses 80-90% of the data in the database,
> using joins of all tables and sorting.
>
> ...
>
> With the new feature available, i can remove my own workaround, which
> does not work so well annyway. Many thanks to the developers.
>
>
Martin, you gave a good example of the case when this really helps. Although
I suppose you still need some tweaking. As Dan Kennedy wrote you have to set
the the parameter and "From that point on, connection "db" extends and
truncates  the db file in 1MB chunks". So for example if you just created a
db and maybe did minor changes to the db and have plans to extend it to
larger size, you have to set SQLITE_FCNTL_CHUNK_SIZE with
sqlite3_file_control and also write something new and not only write but be
sure it's not going to be written to a previously disposed page.

As long as cases like yours is real and can be used in real life, maybe a
change to existing freelist_count pragma is possible? If it is writable
(PRAGMA freelist_count=1024;), sqlite compares the value supplied with the
current count and if it is bigger allocates necessary space. It seems this
syntax will be straightforward and self-explaing. What you think?

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Shawn Wilsher
On Thu, Aug 19, 2010 at 2:56 PM, Simon Slavin  wrote:
> It might be worth noting that fragmentation is normally seen as an issue only 
> under Windows which is very sensitive to it however.  Other operating systems 
> use different ways of handling disk access, however, real figures from 
> real-world examples may disprove this classic view.  Also, many installations 
> of SQLite are on solid state devices where, of course, fragmentation has no 
> effect at all.
Really?  I can think of at least 350 million installations of SQLite
that very likely aren't on an SSD (hint: it's a web browser).

Cheers,

Shawn
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-20 Thread Martin Engelschalk


Am 19.08.2010 23:56, schrieb Simon Slavin:
> On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
>
>> I really appreciate that sqlite got this feature to reduce
>> fragmentation, but why not expose this as a pragma?
> Do you have figures which suggest that reducing fragmentation leads to any 
> improvement in performance ?
Yes, see below.
> It might be worth noting that fragmentation is normally seen as an issue only 
> under Windows which is very sensitive to it however.  Other operating systems 
> use different ways of handling disk access, however, real figures from 
> real-world examples may disprove this classic view.  Also, many installations 
> of SQLite are on solid state devices where, of course, fragmentation has no 
> effect at all.
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Hello Simon,

I agree completely.

In my case (which is certainly not typical), a (several GB) large 
database is built up in several batches, one table at a time, while in 
parallel many intermediate files on the disk are created. This resulted 
in a very fragmented database file. After that, also several times, the 
data is selected in a way that uses 80-90% of the data in the database, 
using joins of all tables and sorting.

The fragmentation was not a problem for me, but one of my customers did 
not like it. As far as i understood, some automatic tool monitored disk 
fragmentation and generated alarms; also, a backup tool slowed down.
So, while inserting, at strategic places, I created a dummy table with a 
blob field and filled it with a very large empty blob. Then I dropped 
the table. In this way i simulated the new feature.

Under Windows, the insert speed did not change measurably, but the speed 
of the later selects increased by about 15-20%. Also, my customer was 
happy.

With the new feature available, i can remove my own workaround, which 
does not work so well annyway. Many thanks to the developers.

Martin


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Max Vlasov
Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion.
> I myself would love to see features exposed via pragmas whenever
> possible, for the simple reason that I don't use the C API and can't
> make use of the features otherwise.  I would assume that since the
> SQLite developers added the feature to the C API, there must be a use
> for it or they wouldn't have bothered.
>
>
You have your reason for wanting pragma more, but for a general developer
using Pragma instead of api calls is worse in the long run since Pragma
generates no errors and the docs says there's no guarantee the syntax will
not change and the results will be the same  (The recent WAL-related changes
to journal_mode is a good example). In case of api calls, we at least have
compiler's errors and warnings.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Jim Wilcoxson
On 8/19/10, Simon Slavin  wrote:
>
> On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
>
>> I really appreciate that sqlite got this feature to reduce
>> fragmentation, but why not expose this as a pragma?
>
> Do you have figures which suggest that reducing fragmentation leads to any
> improvement in performance ?

Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion.
I myself would love to see features exposed via pragmas whenever
possible, for the simple reason that I don't use the C API and can't
make use of the features otherwise.  I would assume that since the
SQLite developers added the feature to the C API, there must be a use
for it or they wouldn't have bothered.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Simon Slavin

On 19 Aug 2010, at 9:27pm, Taras Glek wrote:

> I really appreciate that sqlite got this feature to reduce 
> fragmentation, but why not expose this as a pragma?

Do you have figures which suggest that reducing fragmentation leads to any 
improvement in performance ?

It might be worth noting that fragmentation is normally seen as an issue only 
under Windows which is very sensitive to it however.  Other operating systems 
use different ways of handling disk access, however, real figures from 
real-world examples may disprove this classic view.  Also, many installations 
of SQLite are on solid state devices where, of course, fragmentation has no 
effect at all.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Max Vlasov
On Fri, Aug 20, 2010 at 12:27 AM, Taras Glek  wrote:

>  Hi,
> I really appreciate that sqlite got this feature to reduce
> fragmentation, but why not expose this as a pragma?
>

Taras, I think that you're overestimating the feature. On the OS level it
won't matter how far the file pointer will go, a larger block still can be
fragmented if the OS don't find a continuous block. On the sqlite pager
level it's the same, for identical appends your data will probably get the
same pages in the file regardless of the chunk size. I suppose it has some
benefits in some specific cases and if it gets its own pragma it will just
add confusion.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Taras Glek
  Hi,
I really appreciate that sqlite got this feature to reduce 
fragmentation, but why not expose this as a pragma? In many cases it is 
not feasible to pass the chunk size via a C API. For example with a 
pragma I could do fragmentation testing via an sqlite shell, now this 
option is out without modifying the shell source. I presume that a 
similar problem exists for users of scripting languages until every 
single scripting host provides support for this.

Perhaps this should even persist similar to how page_size persists 
across connections.

Taras
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users