I ran a number of experiments to understand the strangely high write transaction performance once and for all. I learned a few things and may have come across a problem with SAP DB when running on Windows with ATA drives.
I used the following setup:

Hardware:
  HD1: IBM DORS-32160 SCSI, an old 2 GB drive at 5,400 RPM
  HD2: IBM DTLA-307030 ATA-100, a newer 30 GB drive at 7,200 RPM
  in an Intel PIII-800, 0.5 GB RAM, Promise UDMA100, Adaptec 1542.

Software:
  1) Stock SAP DB 7.4 binary release for Windows (Kernel 7.4.3 Build 010-120-035-462)
  2) Oracle 9iR2 Personal Edition for Windows
  both on Windows 2000 SP2.

Task: I INSERTed 10,000 records into a table mytab (f1 integer, f2 varchar(100)) from a notebook connected via 100 MBit, using two different breeds of clients (to factor out client specifics):

  Client A: C++/OTL-based client with streamsize == 1, autocommit == OFF, prepared SQL and a COMMIT after every INSERT.
  Client B: Java/JDBC-based client with autocommit == OFF, prepared SQL and a COMMIT after every INSERT.

Results (write cache OFF in all cases):

  Client A -> SAPDB  on HD2:  600 INSERTs/sec
  Client A -> SAPDB  on HD1:   89 INSERTs/sec
  Client A -> Oracle on HD2:  115 INSERTs/sec
  Client A -> Oracle on HD1:   82 INSERTs/sec
  Client B -> SAPDB  on HD2:  620 INSERTs/sec
  Client B -> SAPDB  on HD1:   89 INSERTs/sec
  Client B -> Oracle on HD2:  112 INSERTs/sec
  Client B -> Oracle on HD1:   80 INSERTs/sec

For HD1, which runs at 5,400 RPM, I expected an upper limit of 5400/60 = 90 synchronous writes per second. Both Oracle and SAP DB behave as expected. For HD2, which runs at 7,200 RPM, I expected an upper limit of 7200/60 = 120 synchronous writes per second. Oracle behaves as expected, whereas SAP DB does much more: over 600 writes/s. That can't be right - something is going wrong here. Don't say I should be glad to have that much performance ;) I'd like to have it "durable" too.
The sources to the test proggies are here:

  http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/sapdb/cppclients/DbClient1/
  http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/oracle/cppclients/DbClient1/
  http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/sapdb/javaclients/SimpleDml/
  http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/oracle/javaclients/SimpleDml/

I did more tests with a "synthetic log writer". I found that under Cygwin/Windows, opening a file with O_SYNC does NOT produce the same results as fsync()ing explicitly. Under Linux, the two variants do produce the same results - on the same hard disk (the ATA one)!

Under Linux (Debian woody, stock 2.4 kernel), I ran a little test program to _simulate_ what a database does when writing the transaction log. You may find the code here:

  http://www.brainthat.com/cvspublic/cvsweb/hacking/code/tests/transactionperf/

These tests estimate the maximum write transaction performance a database system may achieve on given hardware. See the discussion at http://www.sleepycat.com/docs/ref/transapp/throughput.html:

"If you are bottlenecked on logging, the following test will help you confirm that the number of transactions per second that your application does is reasonable for the hardware on which you're running. Your test program should repeatedly perform the following operations:

  * Seek to the beginning of a file
  * Write to the file
  * Flush the file write to disk

The number of times that you can perform these three operations per second is a rough measure of the minimum number of transactions per second of which the hardware is capable. This test simulates the operations applied to the log file. (As a simplifying assumption in this experiment, we assume that the database files are either on a separate disk; or that they fit, with some few exceptions, into the database cache.)
We do not have to directly simulate updating the log file directory information because it will normally be updated and flushed to disk as a result of flushing the log file write to disk."

Linux results:

For HD1, when either fsync()ing or opening the "transaction log" with O_SYNC, the write performance was almost exactly 90 ops/s, as predicted.

For HD2, the write performance was initially over 2,600 ops/s - until I switched off the drive's write cache using an IBM-supplied tool. After that, performance dropped to the expected approx. 115 ops/s. The manual that comes with the tool makes it quite clear. From the "IBM Feature Tool Users Guide" (http://www.hgst.com/hdd/support/ftool.pdf):

"Write cache allows the drive to write data out to the disk media some time after reporting to the system that the write operation had been completed. This data is protected provided power isn't removed from the drive."

"Write cache is a performance enhancement whereby the device reports completion of the write command (Write Sectors, Write DMA and Write Multiple) to the host as soon as the device has received all of the data into its buffer. The device assumes responsibility to write the data subsequently onto the disk. While writing data after completing the acknowledgement of a write command, neither soft nor hard resets will affect its operation. But power-off terminates the writing operation immediately and any unwritten data will be lost."

Under Windows 2000 I compiled and ran the test proggy using Cygwin. The results were:

HD1, write cache ON (the Windows device manager allows enabling/disabling write caching via a checkbox under the drive's details. At least it says so.)

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 43.837000 seconds
  1000 ops: 22.81 ops per second

Believable, but bad performance.
Note, the "-e" option means that the test proggy will explicitly fsync() after every write. Note also that, apparently, the write cache had no influence this time.

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.000000 seconds
  1000 ops: Inf ops per second

Bogus. Note, in this case the file was opened with the flag O_SYNC, meaning "auto-sync", instead of explicitly fsync()ing.

HD1, write cache OFF

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 44.355000 seconds
  1000 ops: 22.55 ops per second

Same bad performance as before.

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/g/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.010000 seconds
  1000 ops: 100000.00 ops per second

Bogus as before, despite unchecking the "write cache enabled" checkbox in the Windows device manager.

HD2, write cache OFF

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -e -b 256 -f /cygdrive/c/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 9.034000 seconds
  1000 ops: 110.69 ops per second

This is expected and good performance, using explicit fsync().

  [EMAIL PROTECTED] ~/sandbox/brainthat/hacking/code/tests/transactionperf
  $ ./writetest.exe -b 256 -f /cygdrive/c/data.dat
  running: 1000 ops with data size 256 bytes
  Elapsed time: 0.022000 seconds
  1000 ops: 45454.55 ops per second

This is bogus. Unlike on Linux, under Cygwin it makes a difference whether you open with O_SYNC or fsync() explicitly. I suspect there is a problem with Cygwin's O_SYNC implementation, or that Win32 doesn't map to it properly. Plus a problem with Cygwin's performance on SCSI drives.
But that's another case ;)

Greets,
Tobias

_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general
