Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-06 Thread Nelson, Erik - 2
Keith Medcalf Sent Saturday, January 04, 2020 3:29 PM

>So I conclude that SERIALIZED/MULTITHREAD makes very little difference
>and that MEMSTATUS ON/OFF makes a huge difference.  Since the

That's a potentially very useful observation!  Is any of this available to JDBC 
users?

Erik

--
This message, and any attachments, is for the intended recipient(s) only, may 
contain information that is privileged, confidential and/or proprietary and 
subject to important terms and conditions available at 
http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended 
recipient, please delete this message.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread Keith Medcalf

Well, actually, no.  

It was a single process that spins up 64 threads each of which accesses its own 
per-thread in-memory database using an 
in-that-thread-only-in-thread-database-connection-in-that-thread.

Making some simple modifications (changing the number of threads to 6 and the 
insertions/thread to 1,000,000 -- so as to consume no more resources than are 
available (this is on a 4-core Xeon (2 SMT threads per core) 32 GB RAM Win10 
1909) yields interesting results (CPU usage was about 90% for all of them, 
though of course the linear runs consumed only 1 thread on one core so about 
12.5% CPU):

SERIALIZED   MEMSTATUS ON   Elapsed Time :  00:00:19.944  MPR=0.75
SERIALIZED   MEMSTATUS OFF  Elapsed Time :  00:00:04.408  MPR=3.97
MULTITHREAD  MEMSTATUS ON   Elapsed Time :  00:00:19.815  MPR=0.72
MULTITHREAD  MEMSTATUS OFF  Elapsed Time :  00:00:04.456  MPR=2.88

*MPR is the Multiprogramming Ratio:  (elapsed time running the workload 
linearly)/(elapsed time running the workload in parallel)

In this case the "linear" time is obtained by running the same test with the 
pthread_join immediately following the pthread_create so that only one thread 
of the workload runs at a time, one after each.

So I conclude that SERIALIZED/MULTITHREAD makes very little difference and that 
MEMSTATUS ON/OFF makes a huge difference.  Since the difference between 
MUTITHREAD and SERIALIZED is a thread mutex on the connection, I can conclude 
that with zero contention for that mutex, whether it is being used or not makes 
little difference and that the fact that it is, when used, a double-indexed 
indirect call through a global data area, leads me to believe that the benefit 
derived from replacing that double-indexed indirect call with an immediate call 
cannot possibly have any significant effect if the elimination of that call 
entirely has negligible effect.

From the MPR _for_this_workload_on_this_particular_CPU_ I can conclude that if 
one wishes to have MEMSTATUS enabled, then using multiple threads is 
detrimental to performance and that linear processing is significantly more 
efficient.  However, when MEMSTATUS is turned off, then serialized mode leads 
to significant increased multiprogramming benefit.

GCC 8.1.0 (MinGW-x64)
gcc -m64 -mwin32 -mconsole -mthreads -O3 -s -pipe 
-Wl,-Bstatic,--nxcompat,--dynamicbase test.c -ID:\Source\bld 
-ID:\Source\bld\tsrc -Ld:\source\bld\gcc\64 -lsqlite3.dll -o test.exe 
-static-libgcc -lpthread

sqlite3.dll compiled using:
gcc -s -O3 -pipe -D_HAVE_SQLITE_CONFIG_H -DSQLITE_EXTRA_INIT=core_init 
-DSQLITE_HAVE_ZLIB -Itsrc -march=native -mtune=native -m64 -mdll -mthreads 
-Wl,-Bstatic,--nxcompat,--dynamicbase,--high-entropy-va,--image-base,0x18000,--out-implib,gcc/64/libsqlite3.dll.a,--output-def,gcc/64/sqlite3.def
 sqlite3.c sqlite3.def -ladvapi32 -lrpcrt4 -lwinmm -lz -static-libgcc -o 
gcc/64/sqlite3.dll

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Doug
>Sent: Saturday, 4 January, 2020 10:42
>To: 'SQLite mailing list' 
>Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
>re SQLite
>
>Thanks, Jens. I got it. The benchmark sounds like it isn't a real
>benchmark, but a made-up scenario to exercise the Coz code. I've let go
>now.
>Doug
>
>> -Original Message-
>> From: sqlite-users 
>> On Behalf Of Jens Alfke
>> Sent: Friday, January 03, 2020 10:58 PM
>> To: SQLite mailing list 
>> Cc: em...@cs.umass.edu; curtsin...@grinnell.edu
>> Subject: Re: [sqlite] FW: Questions about your "Performance
>> Matters" talk re SQLite
>>
>>
>> > On Jan 2, 2020, at 11:54 AM, Doug  wrote:
>> >
>> > I know there has been a lot of talk about what can and cannot be
>> done with the C calling interface because of compatibility issues
>> and the myriad set of wrappers on various forms. I’m having a hard
>> time letting go of a possible 25% performance improvement.
>>
>> This was a heavily multithreaded benchmark (64 threads accessing
>> the same connection) on a very hefty server-class CPU. From Dr
>> Hipp’s results, it sounds like the speed up may be only in similar
>> situations, not to more normal SQLite usage.
>>
>> —Jens
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>> users
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread Doug
> -Original Message-
> From: sqlite-users 
> On Behalf Of J Decker
> Sent: Saturday, January 04, 2020 4:11 AM
> 
> Could wish there was a way to
> pause execution without giving up execution context...

What about?
for (i=1000; i--; i>0);

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread Doug
Thanks, Jens. I got it. The benchmark sounds like it isn't a real benchmark, 
but a made-up scenario to exercise the Coz code. I've let go now.
Doug

> -Original Message-
> From: sqlite-users 
> On Behalf Of Jens Alfke
> Sent: Friday, January 03, 2020 10:58 PM
> To: SQLite mailing list 
> Cc: em...@cs.umass.edu; curtsin...@grinnell.edu
> Subject: Re: [sqlite] FW: Questions about your "Performance
> Matters" talk re SQLite
> 
> 
> > On Jan 2, 2020, at 11:54 AM, Doug  wrote:
> >
> > I know there has been a lot of talk about what can and cannot be
> done with the C calling interface because of compatibility issues
> and the myriad set of wrappers on various forms. I’m having a hard
> time letting go of a possible 25% performance improvement.
> 
> This was a heavily multithreaded benchmark (64 threads accessing
> the same connection) on a very hefty server-class CPU. From Dr
> Hipp’s results, it sounds like the speed up may be only in similar
> situations, not to more normal SQLite usage.
> 
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread J Decker
On Sat, Jan 4, 2020 at 2:59 AM Howard Chu  wrote:

> Keith Medcalf wrote:
> >
> > Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
> performance increase.
> > Changing the threading mode or the indirection level of the mutexes
> calls seems to have no significant effect.
> >
> Goes to show - publishing benchmark results without investigating why they
> are what
> they are is mostly pointless. When you suspect mutex contention is a
> significant
> factor, you should use something like mutrace to confirm your suspicion
> first.
>
> Fundamentally there ought to be no performance difference between running
> a 64-threaded
> server on a 64-threaded CPU vs 64 single-threaded processes. In practice,
> the single
> process with 64 threads ought to be slightly faster, due to less context
> switch overhead
> between threads, but if nothing else in the system is contending for CPU
> then context
> switching shouldn't even be an issue.
>

but, if the mutex lock always does a 'sched_yield' when the locking fails,
then a failure to lock isn't just a stutter in the thread, but can result
in at least 2 more context switches every time.  Probably something simpler
like a simple InterlockedExchange() call in a loop could maintain locking,
and affect performance much less; since it sounds like the actual lock
duration is really only a few clocks anyway.  Could wish there was a way to
pause execution without giving up execution context...


>
> --
>   -- Howard Chu
>   CTO, Symas Corp.   http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread Howard Chu
Keith Medcalf wrote:
> 
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s) performance 
> increase.  
> Changing the threading mode or the indirection level of the mutexes calls 
> seems to have no significant effect.
> 
Goes to show - publishing benchmark results without investigating why they are 
what
they are is mostly pointless. When you suspect mutex contention is a significant
factor, you should use something like mutrace to confirm your suspicion first.

Fundamentally there ought to be no performance difference between running a 
64-threaded
server on a 64-threaded CPU vs 64 single-threaded processes. In practice, the 
single
process with 64 threads ought to be slightly faster, due to less context switch 
overhead
between threads, but if nothing else in the system is contending for CPU then 
context
switching shouldn't even be an issue.

-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Jens Alfke

> On Jan 2, 2020, at 11:54 AM, Doug  wrote:
> 
> I know there has been a lot of talk about what can and cannot be done with 
> the C calling interface because of compatibility issues and the myriad set of 
> wrappers on various forms. I’m having a hard time letting go of a possible 
> 25% performance improvement.

This was a heavily multithreaded benchmark (64 threads accessing the same 
connection) on a very hefty server-class CPU. From Dr Hipp’s results, it sounds 
like the speed up may be only in similar situations, not to more normal SQLite 
usage.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread sky5walk
Well, I told you I'm getting SQLITE_MISUSE so that kinda answers your side
question?
I am only interested in this topic for the performance gain so quoted.
I need to create a personal test case(my use model) to verify these
statements.
Querying the config state is helpful for a dll wrapped database, but I can
adapt without the '_get()' function.

On Fri, Jan 3, 2020 at 2:02 PM Keith Medcalf  wrote:

>
> On Friday, 3 January, 2020 11:32, sky5w...@gmail.com wrote:
>
> > Is there a query function for these and other config settings?
> > I see no sqlite3_config_get() in sqlite3.h.
>
> No.  There are config options to get specific config data where that might
> be useful.
> Otherwise, you simply set the configuration to match your requirements.
>
> That is, if you need to have configuration X, Y, and Z then just set it
> the way you want it.  Either the configuration is set that way or you get
> an error return (you do check return codes right, that is what they are
> for).  There is not really much point in doing "if memstatus is on turn it
> off" when you can just turn it off.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Richard Hipp
On 1/3/20, sky5w...@gmail.com  wrote:
> Is there a query function for these and other config settings?
> I see no sqlite3_config_get() in sqlite3.h.

There is no query function for the SQLITE_CONFIG_MEMSTATUS setting.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Keith Medcalf

On Friday, 3 January, 2020 11:32, sky5w...@gmail.com wrote:

> Is there a query function for these and other config settings?
> I see no sqlite3_config_get() in sqlite3.h.

No.  There are config options to get specific config data where that might be 
useful.  
Otherwise, you simply set the configuration to match your requirements.

That is, if you need to have configuration X, Y, and Z then just set it the way 
you want it.  Either the configuration is set that way or you get an error 
return (you do check return codes right, that is what they are for).  There is 
not really much point in doing "if memstatus is on turn it off" when you can 
just turn it off.
 
-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread sky5walk
Is there a query function for these and other config settings?
I see no sqlite3_config_get() in sqlite3.h.

On Fri, Jan 3, 2020 at 11:36 AM Keith Medcalf  wrote:

> On Friday, 3 January, 2020 09:30, sky5w...@gmail.com wrote:
>
> >I get SQLITE_MISUSE when attempting
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >immediately after opening a db connection?
> >My connection has THREADSAFE = 1.
>
> That is correct.  You must configure the library before it is initialized,
> not after.
>
> https://sqlite.org/c3ref/config.html
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Eric Grange
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
performance increase.
> Changing the threading mode or the indirection level of the mutexes calls
seems to have no significant effect.

That is quite significant.
Looking at the code, it seems the mutex requirement is mostly for the soft
heap limit, maybe there could be a way to use atomic instruction to
maintain the stats ?
The highwater stat could be handled by a CAS (loop), and this should make
the memstatus data cheaper overall when soft heap limit is not required.

(my use case involved a hundreds of independent SQLite databases from a
single process, I have not benchmarked yet, but I am probably hitting that
mutex hard as well)

Eric




Le ven. 3 janv. 2020 à 17:36, Keith Medcalf  a écrit :

> On Friday, 3 January, 2020 09:30, sky5w...@gmail.com wrote:
>
> >I get SQLITE_MISUSE when attempting
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >immediately after opening a db connection?
> >My connection has THREADSAFE = 1.
>
> That is correct.  You must configure the library before it is initialized,
> not after.
>
> https://sqlite.org/c3ref/config.html
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread Keith Medcalf
On Friday, 3 January, 2020 09:30, sky5w...@gmail.com wrote:

>I get SQLITE_MISUSE when attempting
>sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
>immediately after opening a db connection?
>My connection has THREADSAFE = 1.

That is correct.  You must configure the library before it is initialized, not 
after.

https://sqlite.org/c3ref/config.html

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume. 



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-03 Thread sky5walk
I get SQLITE_MISUSE when attempting
sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
immediately after opening a db connection?
My connection has THREADSAFE = 1.

On Thu, Jan 2, 2020 at 7:32 PM Keith Medcalf  wrote:

>
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
> performance increase.
> Changing the threading mode or the indirection level of the mutexes calls
> seems to have no significant effect.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-Original Message-
> >From: sqlite-users  On
> >Behalf Of Richard Hipp
> >Sent: Thursday, 2 January, 2020 16:00
> >To: SQLite mailing list 
> >Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
> >re SQLite
> >
> >On 1/2/20, Barry Smith  wrote:
> >> One thing that really stands is “creates 64 threads that operate on
> >> independent tables in the sqlite database, performing operations that
> >should
> >> be almost entirely independent.”
> >>
> >
> >Looking at the main.c file
> >(https://github.com/plasma-
> >umass/coz/blob/master/benchmarks/sqlite/main.c)
> >it appears that the test creates 64 separate database connections,
> >each with a separate in-memory database.
> >
> >There are two sources of contention here:
> >
> >(1) SQLite keeps track of the total amount of memory it is using on
> >all threads.  So for each malloc() and free() it has to take a mutex
> >to increase or decrease the counters.  This is probably the primary
> >source of contention.  It can be disabled by running:
> >
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >
> >early in main(), before any other SQLite interface calls.  Make that
> >one change and I suspect that most of the thread contention will go
> >away.
> >
> >(2) SQLite has a single PRNG used by all threads.  And so there is a
> >mutex that has to be taken whenever a new random number is generated.
> >But the workload does not appear to be using any random numbers, so I
> >doubt that this is an actual problem in this case.
> >
> >> I’d encourage you *not* to use cpu cycles as a proxy for runtime.
> >Dynamic frequency
> >> scaling can mess up these measurements, especially if the clock
> >frequency is dropped
> >> in response to the program’s behavior.
> >
> >The task requires X number of CPU cycles *regardless* of the clock
> >frequency.  If the clock slows down, then it takes more elapse time to
> >run those X cycles, but it does not increase or decrease the number of
> >cycles required.  So in that sense, counting the number of CPU cycles
> >is an excellent measure of effort required to complete the
> >computation.
> >
> >Furthermore, the idea that thread contention will cause the CPU clock
> >to slow down seems silly.  Technically, I suppose such a think might
> >actually happen - IF you do all of your work as multiple threads
> >within the same process and they all blocked on the same resource.
> >The point is, you shouldn't do that.  Instead of one process with 64
> >threads, how about 64 processes with one thread each.  Since they are
> >all doing different things (serving independent HTTP requests, for
> >example) they might as well each have their own address space.
> >Keeping each job in a separate process provides isolation for added
> >security.  And it completely eliminates the need for mutexes and the
> >accompanying thread contention.
> >
> >If SQLite runs faster for you when you make direct calls to
> >pthread_mutex_lock() rather than indirect calls, how much faster would
> >it run if you completely eliminated all calls to pthread_mutex_lock()
> >by putting each task in a separate process?
> >
> >
> >--
> >D. Richard Hipp
> >d...@sqlite.org
> >___
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-02 Thread Keith Medcalf

Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s) performance 
increase.  
Changing the threading mode or the indirection level of the mutexes calls seems 
to have no significant effect.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Richard Hipp
>Sent: Thursday, 2 January, 2020 16:00
>To: SQLite mailing list 
>Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
>re SQLite
>
>On 1/2/20, Barry Smith  wrote:
>> One thing that really stands is “creates 64 threads that operate on
>> independent tables in the sqlite database, performing operations that
>should
>> be almost entirely independent.”
>>
>
>Looking at the main.c file
>(https://github.com/plasma-
>umass/coz/blob/master/benchmarks/sqlite/main.c)
>it appears that the test creates 64 separate database connections,
>each with a separate in-memory database.
>
>There are two sources of contention here:
>
>(1) SQLite keeps track of the total amount of memory it is using on
>all threads.  So for each malloc() and free() it has to take a mutex
>to increase or decrease the counters.  This is probably the primary
>source of contention.  It can be disabled by running:
>
>sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
>
>early in main(), before any other SQLite interface calls.  Make that
>one change and I suspect that most of the thread contention will go
>away.
>
>(2) SQLite has a single PRNG used by all threads.  And so there is a
>mutex that has to be taken whenever a new random number is generated.
>But the workload does not appear to be using any random numbers, so I
>doubt that this is an actual problem in this case.
>
>> I’d encourage you *not* to use cpu cycles as a proxy for runtime.
>Dynamic frequency
>> scaling can mess up these measurements, especially if the clock
>frequency is dropped
>> in response to the program’s behavior.
>
>The task requires X number of CPU cycles *regardless* of the clock
>frequency.  If the clock slows down, then it takes more elapse time to
>run those X cycles, but it does not increase or decrease the number of
>cycles required.  So in that sense, counting the number of CPU cycles
>is an excellent measure of effort required to complete the
>computation.
>
>Furthermore, the idea that thread contention will cause the CPU clock
>to slow down seems silly.  Technically, I suppose such a think might
>actually happen - IF you do all of your work as multiple threads
>within the same process and they all blocked on the same resource.
>The point is, you shouldn't do that.  Instead of one process with 64
>threads, how about 64 processes with one thread each.  Since they are
>all doing different things (serving independent HTTP requests, for
>example) they might as well each have their own address space.
>Keeping each job in a separate process provides isolation for added
>security.  And it completely eliminates the need for mutexes and the
>accompanying thread contention.
>
>If SQLite runs faster for you when you make direct calls to
>pthread_mutex_lock() rather than indirect calls, how much faster would
>it run if you completely eliminated all calls to pthread_mutex_lock()
>by putting each task in a separate process?
>
>
>--
>D. Richard Hipp
>d...@sqlite.org
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-02 Thread Richard Hipp
On 1/2/20, Barry Smith  wrote:
> One thing that really stands is “creates 64 threads that operate on
> independent tables in the sqlite database, performing operations that should
> be almost entirely independent.”
>

Looking at the main.c file
(https://github.com/plasma-umass/coz/blob/master/benchmarks/sqlite/main.c)
it appears that the test creates 64 separate database connections,
each with a separate in-memory database.

There are two sources of contention here:

(1) SQLite keeps track of the total amount of memory it is using on
all threads.  So for each malloc() and free() it has to take a mutex
to increase or decrease the counters.  This is probably the primary
source of contention.  It can be disabled by running:

sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);

early in main(), before any other SQLite interface calls.  Make that
one change and I suspect that most of the thread contention will go
away.

(2) SQLite has a single PRNG used by all threads.  And so there is a
mutex that has to be taken whenever a new random number is generated.
But the workload does not appear to be using any random numbers, so I
doubt that this is an actual problem in this case.

> I’d encourage you *not* to use cpu cycles as a proxy for runtime. Dynamic 
> frequency
> scaling can mess up these measurements, especially if the clock frequency is 
> dropped
> in response to the program’s behavior.

The task requires X number of CPU cycles *regardless* of the clock
frequency.  If the clock slows down, then it takes more elapse time to
run those X cycles, but it does not increase or decrease the number of
cycles required.  So in that sense, counting the number of CPU cycles
is an excellent measure of effort required to complete the
computation.

Furthermore, the idea that thread contention will cause the CPU clock
to slow down seems silly.  Technically, I suppose such a think might
actually happen - IF you do all of your work as multiple threads
within the same process and they all blocked on the same resource.
The point is, you shouldn't do that.  Instead of one process with 64
threads, how about 64 processes with one thread each.  Since they are
all doing different things (serving independent HTTP requests, for
example) they might as well each have their own address space.
Keeping each job in a separate process provides isolation for added
security.  And it completely eliminates the need for mutexes and the
accompanying thread contention.

If SQLite runs faster for you when you make direct calls to
pthread_mutex_lock() rather than indirect calls, how much faster would
it run if you completely eliminated all calls to pthread_mutex_lock()
by putting each task in a separate process?


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-02 Thread Barry Smith
One thing that really stands is “creates 64 threads that operate on independent 
tables in the sqlite database, performing operations that should be almost 
entirely independent.”

But that’s not how SQLite works - at least not when writing data. SQLite takes 
a lock on the entire database, there is no fine granularity locking that allows 
you to perform simultaneous writes to different tables.

It seems attempting to do this - use 64 threads to write to a database - is a 
highly inefficient use of the library.

> On 2 Jan 2020, at 1:54 pm, Doug  wrote:
> 
> I asked for some information from Emery Berger about his video talk on 
> performance where he said they got a 25% improvement in SQLite performance. 
> Here is the reply I got back.
> 
> 
> 
> I know there has been a lot of talk about what can and cannot be done with 
> the C calling interface because of compatibility issues and the myriad set of 
> wrappers on various forms. I’m having a hard time letting go of a possible 
> 25% performance improvement.
> 
> 
> 
> I don’t have the slightest idea on how to run a benchmark (but I could 
> learn). I wonder if the current set of benchmarks used by SQLite developers 
> actually measure throughput using wall-clock numbers. It might be a good idea 
> to put a wrapper around all the benchmarks to capture how long they took to 
> run (wall-clock), and include things like number and type of cpu cores, 
> average cpu busy time, and other relevant numbers. If the benchmarks are run 
> on lots of different machines (all over the world?), it would provide an 
> excellent view of what changes in SQLite made a difference in performance.
> 
> 
> 
> Doug
> 
> 
> 
> From: Curtsinger, Charlie  
> Sent: Thursday, January 02, 2020 10:55 AM
> To: dougf@comcast.net
> Cc: Emery D Berger 
> Subject: Re: Questions about your "Performance Matters" talk re SQLite
> 
> 
> 
> Hello Doug,
> 
> 
> 
> I was able to track down the sqlite benchmark I ran for the paper, and I’ve 
> checked it into the github repository at 
> https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This 
> benchmark creates 64 threads that operate on independent tables in the sqlite 
> database, performing operations that should be almost entirely independent. 
> This benchmark exposes contention inside of sqlite, since running it with a 
> larger number of hardware threads will hurt performance. I see a performance 
> improvement of nearly 5x when I run this on a two-core linux VM versus a 
> 64-thread Xeon machine, since there are fewer opportunities for the threads 
> to interfere with each other.
> 
> 
> 
> You can also find the modified version of sqlite with the same benchmark at 
> https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. 
> There are just a few changes from indirect to direct calls in the sqlite3.c 
> file.
> 
> 
> 
> I reran the experiment on the same machine we used for the original Coz 
> paper, and saw a performance improvement of around 20% with the modified 
> version of sqlite. That’s slightly less than what we originally found, but I 
> didn’t do many runs (just five) and there’s quite a bit of variability. The 
> compiler has been upgraded on this machine as well, so there could be some 
> effect there as well. On a much-newer 64-thread Xeon machine I see a 
> difference of just 5%, still in favor of the modified version of sqlite. 
> That’s not terribly surprising, since Intel has baked a lot of extra 
> pointer-chasing and branch prediction smarts into processors in the years 
> since we set up the 64-core AMD machine we originally used for the Coz 
> benchmarks.
> 
> 
> 
> As far as measuring performance, I’d encourage you *not* to use cpu cycles as 
> a proxy for runtime. Dynamic frequency scaling can mess up these 
> measurements, especially if the clock frequency is dropped in response to the 
> program’s behavior. Putting many threads to sleep might allow the OS to drop 
> the CPU frequency, thereby reducing the number of CPU cycles. That doesn’t 
> mean the program will actually run in a shorter wall clock time. Some CPUs 
> have a hardware event that counts “clock cycles” at a constant rate even with 
> frequency scaling, but these are really just high-precision timers and would 
> be perfectly fine for measuring runtime. I’m thinking of the “ref-cycles” 
> event from perf here.
> 
> 
> 
> Hope this helps,
> 
> - Charlie
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-02 Thread Doug
I asked for some information from Emery Berger about his video talk on 
performance where he said they got a 25% improvement in SQLite performance. 
Here is the reply I got back.

 

I know there has been a lot of talk about what can and cannot be done with the 
C calling interface because of compatibility issues and the myriad set of 
wrappers on various forms. I’m having a hard time letting go of a possible 25% 
performance improvement.

 

I don’t have the slightest idea on how to run a benchmark (but I could learn). 
I wonder if the current set of benchmarks used by SQLite developers actually 
measure throughput using wall-clock numbers. It might be a good idea to put a 
wrapper around all the benchmarks to capture how long they took to run 
(wall-clock), and include things like number and type of cpu cores, average cpu 
busy time, and other relevant numbers. If the benchmarks are run on lots of 
different machines (all over the world?), it would provide an excellent view of 
what changes in SQLite made a difference in performance.

 

Doug

 

From: Curtsinger, Charlie  
Sent: Thursday, January 02, 2020 10:55 AM
To: dougf@comcast.net
Cc: Emery D Berger 
Subject: Re: Questions about your "Performance Matters" talk re SQLite

 

Hello Doug,

 

I was able to track down the sqlite benchmark I ran for the paper, and I’ve 
checked it into the github repository at 
https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This 
benchmark creates 64 threads that operate on independent tables in the sqlite 
database, performing operations that should be almost entirely independent. 
This benchmark exposes contention inside of sqlite, since running it with a 
larger number of hardware threads will hurt performance. I see a performance 
improvement of nearly 5x when I run this on a two-core linux VM versus a 
64-thread Xeon machine, since there are fewer opportunities for the threads to 
interfere with each other.

 

You can also find the modified version of sqlite with the same benchmark at 
https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. 
There are just a few changes from indirect to direct calls in the sqlite3.c 
file.

 

I reran the experiment on the same machine we used for the original Coz paper, 
and saw a performance improvement of around 20% with the modified version of 
sqlite. That’s slightly less than what we originally found, but I didn’t do 
many runs (just five) and there’s quite a bit of variability. The compiler has 
been upgraded on this machine as well, so there could be some effect there as 
well. On a much-newer 64-thread Xeon machine I see a difference of just 5%, 
still in favor of the modified version of sqlite. That’s not terribly 
surprising, since Intel has baked a lot of extra pointer-chasing and branch 
prediction smarts into processors in the years since we set up the 64-core AMD 
machine we originally used for the Coz benchmarks.

 

As far as measuring performance, I’d encourage you *not* to use cpu cycles as a 
proxy for runtime. Dynamic frequency scaling can mess up these measurements, 
especially if the clock frequency is dropped in response to the program’s 
behavior. Putting many threads to sleep might allow the OS to drop the CPU 
frequency, thereby reducing the number of CPU cycles. That doesn’t mean the 
program will actually run in a shorter wall clock time. Some CPUs have a 
hardware event that counts “clock cycles” at a constant rate even with 
frequency scaling, but these are really just high-precision timers and would be 
perfectly fine for measuring runtime. I’m thinking of the “ref-cycles” event 
from perf here.

 

Hope this helps,

- Charlie

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users