Re: [sqlite] Difference in performance between 32 and 64 bit versions of SQLite?

2008-09-01 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

[EMAIL PROTECTED] wrote:
> Background: Will 64 bit versions of SQLite buy us any significant
> difference in performance?

You need to have some benchmark code first, ideally in the form of a
representative script to feed to the sqlite shell.  That way we can run
the code for you and give feedback.

In the specific case of SQLite the biggest determinant of performance
are the hard drives and the speed they spin at.  For transactions, data
has to be on the oxide and confirmed as such by the drive.  For 7200rpm
drives, each spot is under the heads 120 times per second.  Each
transaction requires two syncs so the maximum possible transaction rate
is 60 per second.  This will apply even if the data is spread across
raid stripes since every drive in the array still has to confirm the
data is committed. For random access the latency is also affected by the
drive speed, limiting you to a maximum of 120 random accesses per
second.  Sequential access will obviously be significantly faster but
again the maximum rate will be a factor of packing density and rpm.  You
can increase the amount of data that can be sent to/read from the drives
by using a RAID array, or by using drives that support NCQ.  For example
my bog standard Seagate 750GB drive will accept 100MB/s for writes and
does 70MB/s for mostly sequential reads.  A two drive array would
roughly double that.

> I may have a chance to get our department 64 bit AMD workstations

It is a very good idea to get 64 bit machines and all current
workstation processors are 64 bit anyway.  64 bit Windows and Linux will
also run and compile 32 bit binaries so you can test compatibility.

> with 8G, 

It is pointless having more than about 3G with 32 bit operating systems
as they need the last 1G of physical address space for peripherals,
video memory etc.

> These workstations will
> be processing very large text based log files (> 16G each).

Generally using 64 bit operating systems makes life considerably easier
for developers.  As an example you can just memory map the files and
then treat them as a char*.  You can assemble multiple files in memory,
do sorting etc without having to worry about hitting limits.

Note that SQLite itself has limits which are 32 bit based even in a 64
bit process.  For example the largest a single record could be is 2GB
(signed 32 bit maxint).

In terms of general 32 bit vs 64 bit performance, the AMD64 instruction
set doubles the number of registers from 8 to 16.  But pointers take up
twice the space (8 rather than 4 bytes).  So basically the code can have
more balls in the air, but requires more memory accessed to operate.  If
you have access to 64 bit Linux you can compile 64 and 32 bit versions
of the SQLite shell for easy comparison.

  $ ./configure --disable-shared CC="gcc -m32" && make && mv sqlite3 \
   sqlite3-32

Repeat replacing 32 with 64.  The resulting binary is 371kb (32 bit) and
428kb (64 bit).

> We will be
> using Python 2.52 as our SQLite scripting tool.

Good tool choice.  That is far more productive for developers than C
development :-)  You may also want to try the APSW bindings as well as
pysqlite.  APSW is reported to be faster by others doing benchmarking.
(Disclaimer: I am the author of APSW).

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIvK5pmOOfHg372QQRAs0zAJ9IIxOEFE+p9eLXNlyPOOoQFUu5pwCgoJiJ
bgnWRsZ58n3bGkgeGFwuHS8=
=QEH7
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Difference in performance between 32 and 64 bit versions of SQLite?

2008-09-01 Thread Jay A. Kreibich
On Mon, Sep 01, 2008 at 03:10:05PM -0400, [EMAIL PROTECTED] scratched on the 
wall:
> Does anyone have any benchmarks to share that compare common SQLite
> operations running under a 32 and 64 bit versions of SQLite? This
> question is OS neutral so please feel to share your experience with 32
> and 64 bit versions of Windows or Linux.
> 
> Background: Will 64 bit versions of SQLite buy us any significant
> difference in performance? I may have a chance to get our department 64
> bit AMD workstations with 8G, but I need to justify the extra cost
> against a reasonable guess at what the performance improvement may be
> against an equivalent 32 bit AMD 4G workstation. These workstations will
> be processing very large text based log files (> 16G each). We will be
> using Python 2.52 as our SQLite scripting tool.


  If money is limited, the first thing I'd be looking at is I/O
  systems and fast storage.  It doesn't matter how fast your CPU and
  application are if they're sitting around waiting for the drives to
  process data.  I'd be looking at the fastest drives you can find or,
  if funds are sufficient, a RAID systems with lots of drives (and lots
  of heads) to get as much parallel I/O as you can.  This is a case
  when eight smaller drives is better than four larger ones, assuming
  all else is the same.



  If you're processing text based log files, I assume you're just
  scanning those top to bottom.  As a result, cache size is fairly 
  unimportant, as you're going to read everything once and then dump it.
  Caches systems only provide an advantage if you come back to the data
  later.  As the file size grows, that all comes down to raw-I/O speeds.
  The memory situation is less demanding as memory-mapped I/O,
  double-/triple-buffered asyc I/O calls, or one of the other mass-I/O
  techniques is very likely going to serve quite well with only 256MB or
  less, even for the faster RAID systems.

  Where memory and cache size does become very important is the
  database size.  If you process a 20GB log file into a 1GB database,
  the extra address space afforded by the 64 bit system isn't going to
  buy you anything, since you can comfortably cache the whole database
  within the 32 bit address space.  On the other hand, if you're talking
  about databases in the 3 to 6GB range and, to a lesser degree,
  beyond, the 64 bit system (and the extra RAM) may serve you quite well.

  BUT... (and there is always a "but") don't forget that writing the
  database doesn't get a massive boost from caching anyways, unless
  you play things fast and loose and turn off most of SQLite's safety
  nets.  Assuming you're running in the normal mode where disk writes
  are sync'ed as much as possible, I/O speeds still play a very strong
  part of your total processing time any time INSERTs or UPDATEs are
  involved.

  If the processing just INSERTs records to the database, you get back
  into a situation much like reading the log files... pages tend to go
  one way (out to disk) and caching is less important.  As long as you
  have enough cache to hold your indexes, you're fairly good to go.

  On the other hand, if you do stuff like record event counts,
  incrementing fields with each log file record that is processed, the
  total database cache size becomes a lot more important.  That gets
  back to the question of database size, and if you really need the 64
  bit system or not, and how much it really matters once you get much
  beyond ~6GB.

  There is also the question of what you do with these database files.
  If you do a lot of processing once the database files are formed,
  then total memory becomes important, as we can assume you're pulling
  a lot of pages in and out and doing all kinds of interesting things.
  On the other hand, if all you do is scan the data and generate a
  report or two and that's it, you're back to the issue of limited
  cache use and I/O becoming king.



  The "best" decision really depends on what you're trying to do with
  all this data.  The two important points I'd make, however, are that
  cache doesn't matter unless you re-use the data, and that disk I/O
  tends to be thousands (if not millions) of times slower than CPU
  actions-- and that SQLite is extremely disk I/O intensive.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Difference in performance between 32 and 64 bit versions of SQLite?

2008-09-01 Thread python
Does anyone have any benchmarks to share that compare common SQLite
operations running under a 32 and 64 bit versions of SQLite? This
question is OS neutral so please feel to share your experience with 32
and 64 bit versions of Windows or Linux.

Background: Will 64 bit versions of SQLite buy us any significant
difference in performance? I may have a chance to get our department 64
bit AMD workstations with 8G, but I need to justify the extra cost
against a reasonable guess at what the performance improvement may be
against an equivalent 32 bit AMD 4G workstation. These workstations will
be processing very large text based log files (> 16G each). We will be
using Python 2.52 as our SQLite scripting tool.

Thank you,
Malcolm
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users