I ran a series of experiments to understand the strangely high write
transaction performance once and for all. I learned a few things and may
have come across a problem in SAPDB when running on Windows with ATA drives.


I used the following setup:

Hardware:

HD1: IBM DORS-32160 SCSI, an old 2GB drive at 5,400 RPM
HD2: IBM DTLA-307030 ATA-100, a newer 30GB drive at 7,200 RPM

in an Intel PIII-800 with 0.5GB RAM, a Promise UDMA100 and an Adaptec 1542.


Software:

1) Stock 7.4 binary release for Windows (Kernel 7.4.3 Build 010-120-035-462)
2) Oracle 9iR2 Personal Ed. for Windows

both on Windows 2000SP2.


Task:

I INSERTed 10,000 records into a table mytab (f1 integer, f2 varchar(100))
from a notebook connected via 100MBit, running two different breeds of
clients (to factor out client-specific effects):

Client A: C++/OTL based client with streamsize == 1, autocommit == OFF,
prepared SQL and COMMIT after every INSERT.

Client B: Java/JDBC based client with autocommit == OFF, prepared SQL and
COMMIT after every INSERT.


Results:

Client A -> SAPDB on HD2 w/ write cache OFF      600 INSERTs/sec
Client A -> SAPDB on HD1 w/ write cache OFF      89 INSERTs/sec
Client A -> Oracle on HD2 w/ write cache OFF     115 INSERTs/sec
Client A -> Oracle on HD1 w/ write cache OFF     82 INSERTs/sec

Client B -> SAPDB on HD2 w/ write cache OFF      620 INSERTs/sec
Client B -> SAPDB on HD1 w/ write cache OFF      89 INSERTs/sec
Client B -> Oracle on HD2 w/ write cache OFF     112 INSERTs/sec
Client B -> Oracle on HD1 w/ write cache OFF     80 INSERTs/sec


For HD1, which runs at 5400 RPM, I expected an upper limit of
5400/60 = 90 synchronous writes per second. Both Oracle and SAP DB
behave as expected.

For HD2, which runs at 7200 RPM, I expected an upper limit of
7200/60 = 120 synchronous writes per second. Oracle behaves as expected,
whereas SAP DB does much more: over 600 writes/s. That can't be right;
something is going wrong here. And don't tell me I should be happy about
the extra performance ;) I'd like my transactions to be "durable" too.

The sources for the test programs are here:
http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/sapdb/cppclients/DbClient1/
http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/oracle/cppclients/DbClient1/
http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/sapdb/javaclients/SimpleDml/
http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/oracle/javaclients/SimpleDml/


I did more tests with a "synthetic log writer". I found that under
Cygwin/Windows, opening a file with O_SYNC does NOT produce the same
results as fsync()ing explicitly. Under Linux, the two variants do
produce the same results. On the same hard disk (the ATA one)!
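For reference, the two variants look like this. This is a minimal sketch
with function names of my own invention, not code from the actual test
program; on a correctly behaving platform both should be equally durable,
and therefore roughly equally slow:

```c
/* Two ways to make log writes durable:
 * (a) open the log with O_SYNC, so every write() is supposed to reach
 *     stable storage before it returns, or
 * (b) open it normally and call fsync() after each write. */
#include <fcntl.h>
#include <unistd.h>

/* variant (a): every write() on this fd is synchronous */
int open_log_osync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
}

/* variant (b): plain write, then force it to disk explicitly;
 * returns 0 on success, -1 on error */
int write_log_fsync(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    return fsync(fd);
}
```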

Under Linux (Debian woody, stock 2.4 kernel), I ran a little test program
to _simulate_ what a database does when writing the transaction log. You
may find the code here:

http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/transactionperf/

These tests are for estimating the maximum write transaction performance
a database system may achieve on given hardware.

See the discussion:
http://www.sleepycat.com/docs/ref/transapp/throughput.html

"If you are bottlenecked on logging, the following test will help you
 confirm that the number of transactions per second that your application
 does is reasonable for the hardware on which you're running. Your test
 program should repeatedly perform the following operations:

    * Seek to the beginning of a file
    * Write to the file
    * Flush the file write to disk

 The number of times that you can perform these three operations per
 second is a rough measure of the minimum number of transactions per
 second of which the hardware is capable. This test simulates the
 operations applied to the log file. (As a simplifying assumption in this
 experiment, we assume that the database files are either on a separate
 disk; or that they fit, with some few exceptions, into the database
 cache.) We do not have to directly simulate updating the log file
 directory information because it will normally be updated and flushed to
 disk as a result of flushing the log file write to disk."

Linux results:

For HD1, when either fsync()ing or opening the "transaction log" with
O_SYNC, the write performance was almost exactly the predicted 90 ops/s.

For HD2, the write performance was initially over 2,600 ops/s, until I
switched off the drive's write cache using an IBM-supplied tool. After
that, performance dropped to the expected approx. 115 ops/s.

The manual that comes with the tool makes it quite clear. From the
"IBM Feature Tool Users Guide"
(http://www.hgst.com/hdd/support/ftool.pdf):

"Write cache allows the drive to write data out to the disk media some time
 after reporting to the system that the write operation had been completed.
 This data is protected provided power isn't removed from the drive."

"Write cache is a performance enhancement whereby the device reports
 completion of the write command (Write Sectors, Write DMA and Write
 Multiple) to the host as soon as the device has received all of the data
 into its buffer. The device assumes responsibility to write the data
 subsequently onto the disk. While writing data after completing the
 acknowledgement of a write command, neither soft nor hard resets will
 affect its operation. But power-off terminates the writing operation
 immediately and any unwritten data will be lost."


Under Windows 2000 I compiled and ran the test proggy using Cygwin. The
results were:

HD1, write cache ON (the Windows device manager has a checkbox under the
drive's details to enable/disable write caching. At least it says so.)

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 43.837000 seconds
  1000 ops:   22.81 ops per second

Believable, but bad performance. Note that the "-e" option means the test
program will explicitly fsync() after every write. Note also that the
write cache apparently had no influence this time.

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.000000 seconds
  1000 ops:     Inf ops per second

Bogus. Note that in this case the file was opened with the O_SYNC flag,
meaning "auto-sync", instead of explicitly fsync()ing.


HD1, write cache OFF

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 44.355000 seconds
  1000 ops:   22.55 ops per second

The same bad performance as before.

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.010000 seconds
  1000 ops: 100000.00 ops per second

Bogus as before, despite unchecking the "write cache enabled" checkbox in
the Windows device manager.


HD2, write cache OFF

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/c/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 9.034000 seconds
  1000 ops:  110.69 ops per second

This is the expected, good performance, using explicit fsync().

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/c/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.022000 seconds
  1000 ops: 45454.55 ops per second

This is bogus. Unlike on Linux, under Cygwin it makes a difference whether
you open with O_SYNC or fsync() explicitly. I suppose there is a problem
with Cygwin's O_SYNC implementation, or Win32 doesn't map onto it cleanly.
And there seems to be a problem with Cygwin's performance on SCSI drives,
too. But that's another case ;)

Greets,
Tobias

_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general
