On Mon, Sep 01, 2008 at 03:10:05PM -0400, [EMAIL PROTECTED] scratched on the 
wall:
> Does anyone have any benchmarks to share that compare common SQLite
> operations running under a 32 and 64 bit versions of SQLite? This
> question is OS neutral so please feel to share your experience with 32
> and 64 bit versions of Windows or Linux.
> 
> Background: Will 64 bit versions of SQLite buy us any significant
> difference in performance? I may have a chance to get our department 64
> bit AMD workstations with 8G, but I need to justify the extra cost
> against a reasonable guess at what the performance improvement may be
> against an equivalent 32 bit AMD 4G workstation. These workstations will
> be processing very large text based log files (> 16G each). We will be
> using Python 2.52 as our SQLite scripting tool.


  If money is limited, the first thing I'd be looking at is I/O
  systems and fast storage.  It doesn't matter how fast your CPU and
  application are if they're sitting around waiting for the drives to
  process data.  I'd be looking at the fastest drives you can find or,
  if funds are sufficient, a RAID systems with lots of drives (and lots
  of heads) to get as much parallel I/O as you can.  This is a case
  when eight smaller drives is better than four larger ones, assuming
  all else is the same.



  If you're processing text based log files, I assume you're just
  scanning those top to bottom.  As a result, cache size is fairly 
  unimportant, as you're going to read everything once and then dump it.
  Caches systems only provide an advantage if you come back to the data
  later.  As the file size grows, that all comes down to raw-I/O speeds.
  The memory situation is less demanding as memory-mapped I/O,
  double-/triple-buffered asyc I/O calls, or one of the other mass-I/O
  techniques is very likely going to serve quite well with only 256MB or
  less, even for the faster RAID systems.

  Where memory and cache size does become very important is the
  database size.  If you process a 20GB log file into a 1GB database,
  the extra address space afforded by the 64 bit system isn't going to
  buy you anything, since you can comfortably cache the whole database
  within the 32 bit address space.  On the other hand, if you're talking
  about databases in the 3 to 6GB range and, to a lesser degree,
  beyond, the 64 bit system (and the extra RAM) may serve you quite well.

  BUT... (and there is always a "but") don't forget that writing the
  database doesn't get a massive boost from caching anyways, unless
  you play things fast and loose and turn off most of SQLite's safety
  nets.  Assuming you're running in the normal mode where disk writes
  are sync'ed as much as possible, I/O speeds still play a very strong
  part of your total processing time any time INSERTs or UPDATEs are
  involved.

  If the processing just INSERTs records to the database, you get back
  into a situation much like reading the log files... pages tend to go
  one way (out to disk) and caching is less important.  As long as you
  have enough cache to hold your indexes, you're fairly good to go.

  On the other hand, if you do stuff like record event counts,
  incrementing fields with each log file record that is processed, the
  total database cache size becomes a lot more important.  That gets
  back to the question of database size, and if you really need the 64
  bit system or not, and how much it really matters once you get much
  beyond ~6GB.

  There is also the question of what you do with these database files.
  If you do a lot of processing once the database files are formed,
  then total memory becomes important, as we can assume you're pulling
  a lot of pages in and out and doing all kinds of interesting things.
  On the other hand, if all you do is scan the data and generate a
  report or two and that's it, you're back to the issue of limited
  cache use and I/O becoming king.



  The "best" decision really depends on what you're trying to do with
  all this data.  The two important points I'd make, however, are that
  cache doesn't matter unless you re-use the data, and that disk I/O
  tends to be thousands (if not millions) of times slower than CPU
  actions-- and that SQLite is extremely disk I/O intensive.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to