Re: [sqlite] Read only scaling optimization

2011-08-17 Thread Drew Kozicki
Pavel,
Thank you for this information. It is exactly what I was looking for.


That said if you are absolutely sure that no other process will write
to the database while you are reading you can create your own VFS and
make xAccess and xCheckReservedLock methods a no-op. You can read more
about VFS starting from here http://www.sqlite.org/c3ref/vfs_find.html
(follow the links to read all related info).


Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-15 Thread Pavel Ivanov
> Just by having 2 programs access 2
> different files on the same disk give that much of a performance/scaling
> gain causes me to believe that there exists a vital area of code that is
> mutexed that is the bottleneck.

I hope you are talking about your OS kernel, because SQLite won't ever
care whether 2 different files reside on a same hdd or on different
ones. In both cases it executes the same code.

> So I'll ask the question. Is there any compile time or pragma's engineered
> towards read only performance as locking a table is no longer needed. To
> save mutexing in cache we would also be ok if each thread would have it's
> cache so it doesn't have to obtain a lock going into it.

With default settings each connection has its own cache, they have
very little interference with each other. And even if you have some
custom settings different files have different caches anyway.

Regarding optimization of read-only access: the problem is not that
you only reading a file and don't need locks, the problem is to
prevent any other process that could write into the same file from
doing that while your reading transaction is active. So no, SQLite
doesn't have any settings or pragmas to optimize such access.

That said if you are absolutely sure that no other process will write
to the database while you are reading you can create your own VFS and
make xAccess and xCheckReservedLock methods a no-op. You can read more
about VFS starting from here http://www.sqlite.org/c3ref/vfs_find.html
(follow the links to read all related info).


Pavel


On Mon, Aug 15, 2011 at 9:00 AM, Drew Kozicki  wrote:
> The bottleneck appears to be mutex's locking access to the file and cache
> inside the same program. I'm guessing that the reason why 2 programs on 2
> files on 2 hdd's did not perform as well as the one too me suggests that
> both hdd's do not have the same performance and that it is not access to the
> physical disk that is the problem. Just by having 2 programs access 2
> different files on the same disk give that much of a performance/scaling
> gain causes me to believe that there exists a vital area of code that is
> mutexed that is the bottleneck.
>
> The source code, unfortunately, I can't post as it is the property of the
> company I work for.
>
> So I'll ask the question. Is there any compile time or pragma's engineered
> towards read only performance as locking a table is no longer needed. To
> save mutexing in cache we would also be ok if each thread would have it's
> cache so it doesn't have to obtain a lock going into it.
>
> Thank you all once again for your help,
>
> Drew
>
> Message: 17
> Date: Fri, 12 Aug 2011 19:28:33 +0100
> From: Simon Slavin 
> Subject: Re: [sqlite] Read only scaling optimization
> To: General Discussion of SQLite Database 
> Message-ID: 
> Content-Type: text/plain; charset=us-ascii
>
>
> On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote:
>
>> I have a Driver doing this pulling in 32 queries aimed at randomness and
>> different tables, much like that would be experienced in typical usage.
> Best
>> performance comes from having 2 separate programs running on 2 separate
>> files.
>
> I'm no expert, but that suggests to me that your bottleneck is access to the
> physical file on disk.  So your greatest speed increases will come not from
> more threads but from a very fast hard disk drive, lots of hard drive
> caching, etc..
>
> That's a great set of benchmarks, by the way.
>
> Simon.
>
> --
>
> Message: 18
> Date: Fri, 12 Aug 2011 14:35:29 -0400
> From: Pavel Ivanov 
> Subject: Re: [sqlite] Read only scaling optimization
> To: General Discussion of SQLite Database 
> Message-ID:
>       
> Content-Type: text/plain; charset=ISO-8859-1
>
>>> I have a Driver doing this pulling in 32 queries aimed at randomness and
>>> different tables, much like that would be experienced in typical usage.
> Best
>>> performance comes from having 2 separate programs running on 2 separate
>>> files.
>>
>> I'm no expert, but that suggests to me that your bottleneck is access to
> the physical file on disk. ?So your greatest speed increases will come not
> from more threads but from a very fast hard disk drive, lots of hard drive
> caching, etc..
>
> It's a little surprising to me that with all the same conditions 2
> files residing on the same drive have better performance than the same
> files residing on different drives. Theoretically that shouldn't
> happen.
>
>
> Pavel
>
>
> Message: 2
> Date: Sun, 14 Aug 2011 17:53:26 +0400
> From: Alexey Pechnikov 
> Subject

Re: [sqlite] Read only scaling optimization

2011-08-15 Thread Drew Kozicki
The bottleneck appears to be mutex's locking access to the file and cache
inside the same program. I'm guessing that the reason why 2 programs on 2
files on 2 hdd's did not perform as well as the one too me suggests that
both hdd's do not have the same performance and that it is not access to the
physical disk that is the problem. Just by having 2 programs access 2
different files on the same disk give that much of a performance/scaling
gain causes me to believe that there exists a vital area of code that is
mutexed that is the bottleneck.

The source code, unfortunately, I can't post as it is the property of the
company I work for.

So I'll ask the question. Is there any compile time or pragma's engineered
towards read only performance as locking a table is no longer needed. To
save mutexing in cache we would also be ok if each thread would have it's
cache so it doesn't have to obtain a lock going into it.

Thank you all once again for your help,

Drew

Message: 17
Date: Fri, 12 Aug 2011 19:28:33 +0100
From: Simon Slavin 
Subject: Re: [sqlite] Read only scaling optimization
To: General Discussion of SQLite Database 
Message-ID: 
Content-Type: text/plain; charset=us-ascii


On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote:

> I have a Driver doing this pulling in 32 queries aimed at randomness and
> different tables, much like that would be experienced in typical usage.
Best
> performance comes from having 2 separate programs running on 2 separate
> files.

I'm no expert, but that suggests to me that your bottleneck is access to the
physical file on disk.  So your greatest speed increases will come not from
more threads but from a very fast hard disk drive, lots of hard drive
caching, etc..

That's a great set of benchmarks, by the way.

Simon.

--

Message: 18
Date: Fri, 12 Aug 2011 14:35:29 -0400
From: Pavel Ivanov 
Subject: Re: [sqlite] Read only scaling optimization
To: General Discussion of SQLite Database 
Message-ID:
   
Content-Type: text/plain; charset=ISO-8859-1

>> I have a Driver doing this pulling in 32 queries aimed at randomness and
>> different tables, much like that would be experienced in typical usage.
Best
>> performance comes from having 2 separate programs running on 2 separate
>> files.
>
> I'm no expert, but that suggests to me that your bottleneck is access to
the physical file on disk. ?So your greatest speed increases will come not
from more threads but from a very fast hard disk drive, lots of hard drive
caching, etc..

It's a little surprising to me that with all the same conditions 2
files residing on the same drive have better performance than the same
files residing on different drives. Theoretically that shouldn't
happen.


Pavel


Message: 2
Date: Sun, 14 Aug 2011 17:53:26 +0400
From: Alexey Pechnikov 
Subject: Re: [sqlite] Read only scaling optimization
To: General Discussion of SQLite Database 
Message-ID:
   
Content-Type: text/plain; charset=ISO-8859-1

2011/8/12 Pavel Ivanov :
> It's a little surprising to me that with all the same conditions 2
> files residing on the same drive have better performance than the same
> files residing on different drives. Theoretically that shouldn't
> happen.

Yes, it's not right behaviour. Is needed the sources of the test programm.

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-14 Thread Alexey Pechnikov
2011/8/12 Pavel Ivanov :
> It's a little surprising to me that with all the same conditions 2
> files residing on the same drive have better performance than the same
> files residing on different drives. Theoretically that shouldn't
> happen.

Yes, it's not right behaviour. Is needed the sources of the test programm.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-12 Thread Pavel Ivanov
>> I have a Driver doing this pulling in 32 queries aimed at randomness and
>> different tables, much like that would be experienced in typical usage. Best
>> performance comes from having 2 separate programs running on 2 separate
>> files.
>
> I'm no expert, but that suggests to me that your bottleneck is access to the 
> physical file on disk.  So your greatest speed increases will come not from 
> more threads but from a very fast hard disk drive, lots of hard drive 
> caching, etc..

It's a little surprising to me that with all the same conditions 2
files residing on the same drive have better performance than the same
files residing on different drives. Theoretically that shouldn't
happen.


Pavel


On Fri, Aug 12, 2011 at 2:28 PM, Simon Slavin  wrote:
>
> On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote:
>
>> I have a Driver doing this pulling in 32 queries aimed at randomness and
>> different tables, much like that would be experienced in typical usage. Best
>> performance comes from having 2 separate programs running on 2 separate
>> files.
>
> I'm no expert, but that suggests to me that your bottleneck is access to the 
> physical file on disk.  So your greatest speed increases will come not from 
> more threads but from a very fast hard disk drive, lots of hard drive 
> caching, etc..
>
> That's a great set of benchmarks, by the way.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-12 Thread Simon Slavin

On 12 Aug 2011, at 7:01pm, Drew Kozicki wrote:

> I have a Driver doing this pulling in 32 queries aimed at randomness and
> different tables, much like that would be experienced in typical usage. Best
> performance comes from having 2 separate programs running on 2 separate
> files.

I'm no expert, but that suggests to me that your bottleneck is access to the 
physical file on disk.  So your greatest speed increases will come not from 
more threads but from a very fast hard disk drive, lots of hard drive caching, 
etc..

That's a great set of benchmarks, by the way.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-12 Thread Drew Kozicki
Alright so to sum up all the testing here it goes.
I have a Driver doing this pulling in 32 queries aimed at randomness and
different tables, much like that would be experienced in typical usage. Best
performance comes from having 2 separate programs running on 2 separate
files.

Single file different connection each thread:
1 thread: 64,100,000 Queries/HR
2 threds: 98,800,000  Queries/HR
4 threads: 115,600,000 Queries/Hr
8 threads: 119,200,000 Queries/HR
16 Threads: 129,600,000 Queries /HR

Different file different hdd 2 programs
2 total threads : 102,300,000 Queries/HR
4 total threads: 156,320,000 Queries/ HR
8 total threads 155,440,000 Quereis/HR

2 Programs  same file
2 thread total: 107,000,000 Quereis/HR
4 thread total: 117,800,000 Quereis/HR
8 thread total: 133,400,000 Queries/HR
16 thread total:133,840,000 Queries/HR

2 programs different file same hdd
2 thread total: 127,400,000 Queries/Hr
4 Threads total: 169,400,00 Queries/Hr
8 threads total: 180,400,000 Queries/HR

Different file different hdd same program 50% of threads per file
2 threads: 87,000,000
4 threads: 117,200,000
8 Threads: 129,600,000

Different file same hdd same program 50% of threads per file
2 Threads: 110,400,000
4 Threads: 130,000,000
8 Threads: 136,000,000



On 11 Aug 2011, at 9:51pm, Drew Kozicki wrote:

> Unfortunately reading from disk appears not to be the problem. Even at 32
> threads the IO appears to be very minimal. Our inability to scale appears
to
> be caused by a mutex in the caching. My CPU usage is at 30% and my disk is
> near silent watching the performance monitor. This is on a 10 GB database
> executing 32 different queries against 7 different tables.

Purely as a method of research, having done some time trials and measured
the results, as you obviously have done,

a) Make a second copy of the data on the same disk, have half look at one
copy, half look at the other copy, and see what that does to the times.

b) If you have another shared physical volume available, do it again but
this time put the second copy on the other hard disk.

One tells you about contention for one file.  The other tells you about
contention for access to a single hard disk.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-11 Thread Simon Slavin

On 11 Aug 2011, at 9:51pm, Drew Kozicki wrote:

> Unfortunately reading from disk appears not to be the problem. Even at 32
> threads the IO appears to be very minimal. Our inability to scale appears to
> be caused by a mutex in the caching. My CPU usage is at 30% and my disk is
> near silent watching the performance monitor. This is on a 10 GB database
> executing 32 different queries against 7 different tables.

Purely as a method of research, having done some time trials and measured the 
results, as you obviously have done,

a) Make a second copy of the data on the same disk, have half look at one copy, 
half look at the other copy, and see what that does to the times.

b) If you have another shared physical volume available, do it again but this 
time put the second copy on the other hard disk.

One tells you about contention for one file.  The other tells you about 
contention for access to a single hard disk.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-11 Thread Drew Kozicki
I'll try this.
Unfortunately reading from disk appears not to be the problem. Even at 32
threads the IO appears to be very minimal. Our inability to scale appears to
be caused by a mutex in the caching. My CPU usage is at 30% and my disk is
near silent watching the performance monitor. This is on a 10 GB database
executing 32 different queries against 7 different tables.


> You
> may try increasing page size - bigger block means less near-random
> reads from the disc.

It's good way. With page size 8k instead of default 1k selects performance
may increasing ~3x. Note: PostgreSQL use 8k disk pages.

Thank you all for your responses.

On Wed, Aug 10, 2011 at 9:23 AM, Drew Kozicki  wrote:

> To Answer several questions at once.
>
> Simon,
> Just checking: by 'queries' you mean 'SELECT', right ?  You're not making
> changes, just searching
>
> Yes to optimize we average about 5-6 indexes per table.
>
> D. Richard Hipp,
> Open a separate database connection for each thread.  Don't try to use the
> same database connection on all threads because access to the database
> connection is serialized.
>
> I'll look into this. Thank you
>
> Teg,
> Why multiple threads? What kind of performance do you get if you only
> use a single thread?
>
> Is it one thread per database perhaps?
>
> This program is ran on massive servers and the people that use it are
> talking of running 100's of millions of records through and we're trying to
> let them scale so that they can benefit from the new servers. We seem to
> have peeked out single thread performance at aprox. 2-10 million
> records/hour.
>
> Thank you once again in advance,
> Drew
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-10 Thread Drew Kozicki
To Answer several questions at once.

Simon,
Just checking: by 'queries' you mean 'SELECT', right ?  You're not making
changes, just searching

Yes to optimize we average about 5-6 indexes per table.

D. Richard Hipp,
Open a separate database connection for each thread.  Don't try to use the
same database connection on all threads because access to the database
connection is serialized.

I'll look into this. Thank you

Teg,
Why multiple threads? What kind of performance do you get if you only
use a single thread?

Is it one thread per database perhaps?

This program is ran on massive servers and the people that use it are
talking of running 100's of millions of records through and we're trying to
let them scale so that they can benefit from the new servers. We seem to
have peeked out single thread performance at aprox. 2-10 million
records/hour.

Thank you once again in advance,
Drew
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-10 Thread Alexey Pechnikov
2011/8/10 Wiktor Adamski :
> You
> may try increasing page size - bigger block means less near-random
> reads from the disc.

It's good way. With page size 8k instead of default 1k selects performance
may increasing ~3x. Note: PostgreSQL use 8k disk pages.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-10 Thread Wiktor Adamski
Look at disc transfer. With 16 queries in one thread on single disc,
disc may be accesed linearly (depeding on query). With 16 threads
accesing disc at the same time linear disc access is impossible
(however os may do some prefetching) and queries will by slower. You
may try increasing page size - bigger block means less near-random
reads from the disc.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-09 Thread Teg
Hello Drew,

Why multiple threads? What kind of performance do you get if you only
use a single thread?

Is it one thread per database perhaps?

C

Tuesday, August 9, 2011, 4:46:13 PM, you wrote:

DK> We have an application that does not scale well – especially 8 and 16
DK> threads we are getting about 10% at best to -50% at worst (2 and 4 are okay
DK> aprox. 40-50%).



DK> We have narrowed the problem down to the SQLite code.



DK> Looking for best practices from anyone that overcame scaling issues.


DK> First we generate 35 different SQLite databases, this is done in a separate
DK> program and performance is not an issue.



DK> The application is a  C++ app. that accesses all the databases in a read
DK> only style.  Size of database range from 1MB to 10GB.


DK> Each thread does the following in a loop

DK> 1. Grab a record from an external system (not SQLite)

DK> 2. Runs several SQLite queries to process the record





DK> We are currently using 3.6.14 (has this been improved in new versions?)


DK> Is there any compile time options or pragma's that are specifically
DK> engineered at optimizing for read only and/or scaling?


DK> Thank you in advance,

DK> Drew
DK> ___
DK> sqlite-users mailing list
DK> sqlite-users@sqlite.org
DK> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Best regards,
 Tegmailto:t...@djii.com

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-09 Thread Richard Hipp
On Tue, Aug 9, 2011 at 4:46 PM, Drew Kozicki  wrote:

> We have an application that does not scale well – especially 8 and 16
> threads we are getting about 10% at best to -50% at worst (2 and 4 are okay
> aprox. 40-50%).
>

Open a separate database connection for each thread.  Don't try to use the
same database connection on all threads because access to the database
connection is serialized.


>
>
>
> We have narrowed the problem down to the SQLite code.
>
>
>
> Looking for best practices from anyone that overcame scaling issues.
>
>
> First we generate 35 different SQLite databases, this is done in a separate
> program and performance is not an issue.
>
>
>
> The application is a  C++ app. that accesses all the databases in a read
> only style.  Size of database range from 1MB to 10GB.
>
>
> Each thread does the following in a loop
>
> 1. Grab a record from an external system (not SQLite)
>
> 2. Runs several SQLite queries to process the record
>
>
>
>
>
> We are currently using 3.6.14 (has this been improved in new versions?)
>
>
> Is there any compile time options or pragma's that are specifically
> engineered at optimizing for read only and/or scaling?
>
>
> Thank you in advance,
>
> Drew
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read only scaling optimization

2011-08-09 Thread Simon Slavin

On 9 Aug 2011, at 9:46pm, Drew Kozicki wrote:

> The application is a  C++ app. that accesses all the databases in a read
> only style.  Size of database range from 1MB to 10GB.
> 
> 
> Each thread does the following in a loop
> 
> 1. Grab a record from an external system (not SQLite)
> 
> 2. Runs several SQLite queries to process the record

Just checking: by 'queries' you mean 'SELECT', right ?  You're not making 
changes, just searching.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Read only scaling optimization

2011-08-09 Thread Drew Kozicki
We have an application that does not scale well – especially 8 and 16
threads we are getting about 10% at best to -50% at worst (2 and 4 are okay
aprox. 40-50%).



We have narrowed the problem down to the SQLite code.



Looking for best practices from anyone that overcame scaling issues.


First we generate 35 different SQLite databases, this is done in a separate
program and performance is not an issue.



The application is a  C++ app. that accesses all the databases in a read
only style.  Size of database range from 1MB to 10GB.


Each thread does the following in a loop

1. Grab a record from an external system (not SQLite)

2. Runs several SQLite queries to process the record





We are currently using 3.6.14 (has this been improved in new versions?)


Is there any compile time options or pragma's that are specifically
engineered at optimizing for read only and/or scaling?


Thank you in advance,

Drew
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users