Re: [sqlite] Read only scaling optimization
Pavel, Thank you for this information. It is exactly what I was looking for. That said if you are absolutely sure that no other process will write to the database while you are reading you can create your own VFS and make xAccess and xCheckReservedLock methods a no-op. You can read more about VFS starting from here http://www.sqlite.org/c3ref/vfs_find.html (follow the links to read all related info). Pavel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
> Just by having 2 programs access 2 > different files on the same disk give that much of a performance/scaling > gain causes me to believe that there exists a vital area of code that is > mutexed that is the bottleneck. I hope you are talking about your OS kernel, because SQLite won't ever care whether 2 different files reside on a same hdd or on different ones. In both cases it executes the same code. > So I'll ask the question. Is there any compile time or pragma's engineered > towards read only performance as locking a table is no longer needed. To > save mutexing in cache we would also be ok if each thread would have it's > cache so it doesn't have to obtain a lock going into it. With default settings each connection has its own cache, they have very little interference with each other. And even if you have some custom settings different files have different caches anyway. Regarding optimization of read-only access: the problem is not that you only reading a file and don't need locks, the problem is to prevent any other process that could write into the same file from doing that while your reading transaction is active. So no, SQLite doesn't have any settings or pragmas to optimize such access. That said if you are absolutely sure that no other process will write to the database while you are reading you can create your own VFS and make xAccess and xCheckReservedLock methods a no-op. You can read more about VFS starting from here http://www.sqlite.org/c3ref/vfs_find.html (follow the links to read all related info). Pavel On Mon, Aug 15, 2011 at 9:00 AM, Drew Kozicki wrote: > The bottleneck appears to be mutex's locking access to the file and cache > inside the same program. I'm guessing that the reason why 2 programs on 2 > files on 2 hdd's did not perform as well as the one too me suggests that > both hdd's do not have the same performance and that it is not access to the > physical disk that is the problem. Just by having 2 programs access 2 > different files on the same disk give that much of a performance/scaling > gain causes me to believe that there exists a vital area of code that is > mutexed that is the bottleneck. > > The source code, unfortunately, I can't post as it is the property of the > company I work for. > > So I'll ask the question. Is there any compile time or pragma's engineered > towards read only performance as locking a table is no longer needed. To > save mutexing in cache we would also be ok if each thread would have it's > cache so it doesn't have to obtain a lock going into it. > > Thank you all once again for your help, > > Drew > > Message: 17 > Date: Fri, 12 Aug 2011 19:28:33 +0100 > From: Simon Slavin > Subject: Re: [sqlite] Read only scaling optimization > To: General Discussion of SQLite Database > Message-ID: > Content-Type: text/plain; charset=us-ascii > > > On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote: > >> I have a Driver doing this pulling in 32 queries aimed at randomness and >> different tables, much like that would be experienced in typical usage. > Best >> performance comes from having 2 separate programs running on 2 separate >> files. > > I'm no expert, but that suggests to me that your bottleneck is access to the > physical file on disk. So your greatest speed increases will come not from > more threads but from a very fast hard disk drive, lots of hard drive > caching, etc.. > > That's a great set of benchmarks, by the way. > > Simon. > > -- > > Message: 18 > Date: Fri, 12 Aug 2011 14:35:29 -0400 > From: Pavel Ivanov > Subject: Re: [sqlite] Read only scaling optimization > To: General Discussion of SQLite Database > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > >>> I have a Driver doing this pulling in 32 queries aimed at randomness and >>> different tables, much like that would be experienced in typical usage. > Best >>> performance comes from having 2 separate programs running on 2 separate >>> files. >> >> I'm no expert, but that suggests to me that your bottleneck is access to > the physical file on disk. ?So your greatest speed increases will come not > from more threads but from a very fast hard disk drive, lots of hard drive > caching, etc.. > > It's a little surprising to me that with all the same conditions 2 > files residing on the same drive have better performance than the same > files residing on different drives. Theoretically that shouldn't > happen. > > > Pavel > > > Message: 2 > Date: Sun, 14 Aug 2011 17:53:26 +0400 > From: Alexey Pechnikov > Subject
Re: [sqlite] Read only scaling optimization
The bottleneck appears to be mutex's locking access to the file and cache inside the same program. I'm guessing that the reason why 2 programs on 2 files on 2 hdd's did not perform as well as the one too me suggests that both hdd's do not have the same performance and that it is not access to the physical disk that is the problem. Just by having 2 programs access 2 different files on the same disk give that much of a performance/scaling gain causes me to believe that there exists a vital area of code that is mutexed that is the bottleneck. The source code, unfortunately, I can't post as it is the property of the company I work for. So I'll ask the question. Is there any compile time or pragma's engineered towards read only performance as locking a table is no longer needed. To save mutexing in cache we would also be ok if each thread would have it's cache so it doesn't have to obtain a lock going into it. Thank you all once again for your help, Drew Message: 17 Date: Fri, 12 Aug 2011 19:28:33 +0100 From: Simon Slavin Subject: Re: [sqlite] Read only scaling optimization To: General Discussion of SQLite Database Message-ID: Content-Type: text/plain; charset=us-ascii On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote: > I have a Driver doing this pulling in 32 queries aimed at randomness and > different tables, much like that would be experienced in typical usage. Best > performance comes from having 2 separate programs running on 2 separate > files. I'm no expert, but that suggests to me that your bottleneck is access to the physical file on disk. So your greatest speed increases will come not from more threads but from a very fast hard disk drive, lots of hard drive caching, etc.. That's a great set of benchmarks, by the way. Simon. -- Message: 18 Date: Fri, 12 Aug 2011 14:35:29 -0400 From: Pavel Ivanov Subject: Re: [sqlite] Read only scaling optimization To: General Discussion of SQLite Database Message-ID: Content-Type: text/plain; charset=ISO-8859-1 >> I have a Driver doing this pulling in 32 queries aimed at randomness and >> different tables, much like that would be experienced in typical usage. Best >> performance comes from having 2 separate programs running on 2 separate >> files. > > I'm no expert, but that suggests to me that your bottleneck is access to the physical file on disk. ?So your greatest speed increases will come not from more threads but from a very fast hard disk drive, lots of hard drive caching, etc.. It's a little surprising to me that with all the same conditions 2 files residing on the same drive have better performance than the same files residing on different drives. Theoretically that shouldn't happen. Pavel Message: 2 Date: Sun, 14 Aug 2011 17:53:26 +0400 From: Alexey Pechnikov Subject: Re: [sqlite] Read only scaling optimization To: General Discussion of SQLite Database Message-ID: Content-Type: text/plain; charset=ISO-8859-1 2011/8/12 Pavel Ivanov : > It's a little surprising to me that with all the same conditions 2 > files residing on the same drive have better performance than the same > files residing on different drives. Theoretically that shouldn't > happen. Yes, it's not right behaviour. Is needed the sources of the test programm. -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
2011/8/12 Pavel Ivanov : > It's a little surprising to me that with all the same conditions 2 > files residing on the same drive have better performance than the same > files residing on different drives. Theoretically that shouldn't > happen. Yes, it's not right behaviour. Is needed the sources of the test programm. -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
>> I have a Driver doing this pulling in 32 queries aimed at randomness and >> different tables, much like that would be experienced in typical usage. Best >> performance comes from having 2 separate programs running on 2 separate >> files. > > I'm no expert, but that suggests to me that your bottleneck is access to the > physical file on disk. So your greatest speed increases will come not from > more threads but from a very fast hard disk drive, lots of hard drive > caching, etc.. It's a little surprising to me that with all the same conditions 2 files residing on the same drive have better performance than the same files residing on different drives. Theoretically that shouldn't happen. Pavel On Fri, Aug 12, 2011 at 2:28 PM, Simon Slavin wrote: > > On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote: > >> I have a Driver doing this pulling in 32 queries aimed at randomness and >> different tables, much like that would be experienced in typical usage. Best >> performance comes from having 2 separate programs running on 2 separate >> files. > > I'm no expert, but that suggests to me that your bottleneck is access to the > physical file on disk. So your greatest speed increases will come not from > more threads but from a very fast hard disk drive, lots of hard drive > caching, etc.. > > That's a great set of benchmarks, by the way. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote: > I have a Driver doing this pulling in 32 queries aimed at randomness and > different tables, much like that would be experienced in typical usage. Best > performance comes from having 2 separate programs running on 2 separate > files. I'm no expert, but that suggests to me that your bottleneck is access to the physical file on disk. So your greatest speed increases will come not from more threads but from a very fast hard disk drive, lots of hard drive caching, etc.. That's a great set of benchmarks, by the way. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
Alright so to sum up all the testing here it goes. I have a Driver doing this pulling in 32 queries aimed at randomness and different tables, much like that would be experienced in typical usage. Best performance comes from having 2 separate programs running on 2 separate files. Single file different connection each thread: 1 thread: 64,100,000 Queries/HR 2 threds: 98,800,000 Queries/HR 4 threads: 115,600,000 Queries/Hr 8 threads: 119,200,000 Queries/HR 16 Threads: 129,600,000 Queries /HR Different file different hdd 2 programs 2 total threads : 102,300,000 Queries/HR 4 total threads: 156,320,000 Queries/ HR 8 total threads 155,440,000 Quereis/HR 2 Programs same file 2 thread total: 107,000,000 Quereis/HR 4 thread total: 117,800,000 Quereis/HR 8 thread total: 133,400,000 Queries/HR 16 thread total:133,840,000 Queries/HR 2 programs different file same hdd 2 thread total: 127,400,000 Queries/Hr 4 Threads total: 169,400,00 Queries/Hr 8 threads total: 180,400,000 Queries/HR Different file different hdd same program 50% of threads per file 2 threads: 87,000,000 4 threads: 117,200,000 8 Threads: 129,600,000 Different file same hdd same program 50% of threads per file 2 Threads: 110,400,000 4 Threads: 130,000,000 8 Threads: 136,000,000 On 11 Aug 2011, at 9:51pm, Drew Kozicki wrote: > Unfortunately reading from disk appears not to be the problem. Even at 32 > threads the IO appears to be very minimal. Our inability to scale appears to > be caused by a mutex in the caching. My CPU usage is at 30% and my disk is > near silent watching the performance monitor. This is on a 10 GB database > executing 32 different queries against 7 different tables. Purely as a method of research, having done some time trials and measured the results, as you obviously have done, a) Make a second copy of the data on the same disk, have half look at one copy, half look at the other copy, and see what that does to the times. b) If you have another shared physical volume available, do it again but this time put the second copy on the other hard disk. One tells you about contention for one file. The other tells you about contention for access to a single hard disk. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
On 11 Aug 2011, at 9:51pm, Drew Kozicki wrote: > Unfortunately reading from disk appears not to be the problem. Even at 32 > threads the IO appears to be very minimal. Our inability to scale appears to > be caused by a mutex in the caching. My CPU usage is at 30% and my disk is > near silent watching the performance monitor. This is on a 10 GB database > executing 32 different queries against 7 different tables. Purely as a method of research, having done some time trials and measured the results, as you obviously have done, a) Make a second copy of the data on the same disk, have half look at one copy, half look at the other copy, and see what that does to the times. b) If you have another shared physical volume available, do it again but this time put the second copy on the other hard disk. One tells you about contention for one file. The other tells you about contention for access to a single hard disk. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
I'll try this. Unfortunately reading from disk appears not to be the problem. Even at 32 threads the IO appears to be very minimal. Our inability to scale appears to be caused by a mutex in the caching. My CPU usage is at 30% and my disk is near silent watching the performance monitor. This is on a 10 GB database executing 32 different queries against 7 different tables. > You > may try increasing page size - bigger block means less near-random > reads from the disc. It's good way. With page size 8k instead of default 1k selects performance may increasing ~3x. Note: PostgreSQL use 8k disk pages. Thank you all for your responses. On Wed, Aug 10, 2011 at 9:23 AM, Drew Kozicki wrote: > To Answer several questions at once. > > Simon, > Just checking: by 'queries' you mean 'SELECT', right ? You're not making > changes, just searching > > Yes to optimize we average about 5-6 indexes per table. > > D. Richard Hipp, > Open a separate database connection for each thread. Don't try to use the > same database connection on all threads because access to the database > connection is serialized. > > I'll look into this. Thank you > > Teg, > Why multiple threads? What kind of performance do you get if you only > use a single thread? > > Is it one thread per database perhaps? > > This program is ran on massive servers and the people that use it are > talking of running 100's of millions of records through and we're trying to > let them scale so that they can benefit from the new servers. We seem to > have peeked out single thread performance at aprox. 2-10 million > records/hour. > > Thank you once again in advance, > Drew > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
To Answer several questions at once. Simon, Just checking: by 'queries' you mean 'SELECT', right ? You're not making changes, just searching Yes to optimize we average about 5-6 indexes per table. D. Richard Hipp, Open a separate database connection for each thread. Don't try to use the same database connection on all threads because access to the database connection is serialized. I'll look into this. Thank you Teg, Why multiple threads? What kind of performance do you get if you only use a single thread? Is it one thread per database perhaps? This program is ran on massive servers and the people that use it are talking of running 100's of millions of records through and we're trying to let them scale so that they can benefit from the new servers. We seem to have peeked out single thread performance at aprox. 2-10 million records/hour. Thank you once again in advance, Drew ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
2011/8/10 Wiktor Adamski : > You > may try increasing page size - bigger block means less near-random > reads from the disc. It's good way. With page size 8k instead of default 1k selects performance may increasing ~3x. Note: PostgreSQL use 8k disk pages. -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
Look at disc transfer. With 16 queries in one thread on single disc, disc may be accesed linearly (depeding on query). With 16 threads accesing disc at the same time linear disc access is impossible (however os may do some prefetching) and queries will by slower. You may try increasing page size - bigger block means less near-random reads from the disc. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
Hello Drew, Why multiple threads? What kind of performance do you get if you only use a single thread? Is it one thread per database perhaps? C Tuesday, August 9, 2011, 4:46:13 PM, you wrote: DK> We have an application that does not scale well especially 8 and 16 DK> threads we are getting about 10% at best to -50% at worst (2 and 4 are okay DK> aprox. 40-50%). DK> We have narrowed the problem down to the SQLite code. DK> Looking for best practices from anyone that overcame scaling issues. DK> First we generate 35 different SQLite databases, this is done in a separate DK> program and performance is not an issue. DK> The application is a C++ app. that accesses all the databases in a read DK> only style. Size of database range from 1MB to 10GB. DK> Each thread does the following in a loop DK> 1. Grab a record from an external system (not SQLite) DK> 2. Runs several SQLite queries to process the record DK> We are currently using 3.6.14 (has this been improved in new versions?) DK> Is there any compile time options or pragma's that are specifically DK> engineered at optimizing for read only and/or scaling? DK> Thank you in advance, DK> Drew DK> ___ DK> sqlite-users mailing list DK> sqlite-users@sqlite.org DK> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Best regards, Tegmailto:t...@djii.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
On Tue, Aug 9, 2011 at 4:46 PM, Drew Kozicki wrote: > We have an application that does not scale well – especially 8 and 16 > threads we are getting about 10% at best to -50% at worst (2 and 4 are okay > aprox. 40-50%). > Open a separate database connection for each thread. Don't try to use the same database connection on all threads because access to the database connection is serialized. > > > > We have narrowed the problem down to the SQLite code. > > > > Looking for best practices from anyone that overcame scaling issues. > > > First we generate 35 different SQLite databases, this is done in a separate > program and performance is not an issue. > > > > The application is a C++ app. that accesses all the databases in a read > only style. Size of database range from 1MB to 10GB. > > > Each thread does the following in a loop > > 1. Grab a record from an external system (not SQLite) > > 2. Runs several SQLite queries to process the record > > > > > > We are currently using 3.6.14 (has this been improved in new versions?) > > > Is there any compile time options or pragma's that are specifically > engineered at optimizing for read only and/or scaling? > > > Thank you in advance, > > Drew > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Read only scaling optimization
On 9 Aug 2011, at 9:46pm, Drew Kozicki wrote: > The application is a C++ app. that accesses all the databases in a read > only style. Size of database range from 1MB to 10GB. > > > Each thread does the following in a loop > > 1. Grab a record from an external system (not SQLite) > > 2. Runs several SQLite queries to process the record Just checking: by 'queries' you mean 'SELECT', right ? You're not making changes, just searching. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Read only scaling optimization
We have an application that does not scale well – especially 8 and 16 threads we are getting about 10% at best to -50% at worst (2 and 4 are okay aprox. 40-50%). We have narrowed the problem down to the SQLite code. Looking for best practices from anyone that overcame scaling issues. First we generate 35 different SQLite databases, this is done in a separate program and performance is not an issue. The application is a C++ app. that accesses all the databases in a read only style. Size of database range from 1MB to 10GB. Each thread does the following in a loop 1. Grab a record from an external system (not SQLite) 2. Runs several SQLite queries to process the record We are currently using 3.6.14 (has this been improved in new versions?) Is there any compile time options or pragma's that are specifically engineered at optimizing for read only and/or scaling? Thank you in advance, Drew ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users