On Mon, Sep 01, 2008 at 03:10:05PM -0400, [EMAIL PROTECTED] scratched on the
wall:
> Does anyone have any benchmarks to share that compare common SQLite
> operations running under a 32 and 64 bit versions of SQLite? This
> question is OS neutral so please feel to share your experience with 32
> and 64 bit versions of Windows or Linux.
>
> Background: Will 64 bit versions of SQLite buy us any significant
> difference in performance? I may have a chance to get our department 64
> bit AMD workstations with 8G, but I need to justify the extra cost
> against a reasonable guess at what the performance improvement may be
> against an equivalent 32 bit AMD 4G workstation. These workstations will
> be processing very large text based log files (> 16G each). We will be
> using Python 2.52 as our SQLite scripting tool.
If money is limited, the first thing I'd be looking at is I/O
systems and fast storage. It doesn't matter how fast your CPU and
application are if they're sitting around waiting for the drives to
process data. I'd be looking at the fastest drives you can find or,
if funds are sufficient, a RAID systems with lots of drives (and lots
of heads) to get as much parallel I/O as you can. This is a case
when eight smaller drives is better than four larger ones, assuming
all else is the same.
If you're processing text based log files, I assume you're just
scanning those top to bottom. As a result, cache size is fairly
unimportant, as you're going to read everything once and then dump it.
Caches systems only provide an advantage if you come back to the data
later. As the file size grows, that all comes down to raw-I/O speeds.
The memory situation is less demanding as memory-mapped I/O,
double-/triple-buffered asyc I/O calls, or one of the other mass-I/O
techniques is very likely going to serve quite well with only 256MB or
less, even for the faster RAID systems.
Where memory and cache size does become very important is the
database size. If you process a 20GB log file into a 1GB database,
the extra address space afforded by the 64 bit system isn't going to
buy you anything, since you can comfortably cache the whole database
within the 32 bit address space. On the other hand, if you're talking
about databases in the 3 to 6GB range and, to a lesser degree,
beyond, the 64 bit system (and the extra RAM) may serve you quite well.
BUT... (and there is always a "but") don't forget that writing the
database doesn't get a massive boost from caching anyways, unless
you play things fast and loose and turn off most of SQLite's safety
nets. Assuming you're running in the normal mode where disk writes
are sync'ed as much as possible, I/O speeds still play a very strong
part of your total processing time any time INSERTs or UPDATEs are
involved.
If the processing just INSERTs records to the database, you get back
into a situation much like reading the log files... pages tend to go
one way (out to disk) and caching is less important. As long as you
have enough cache to hold your indexes, you're fairly good to go.
On the other hand, if you do stuff like record event counts,
incrementing fields with each log file record that is processed, the
total database cache size becomes a lot more important. That gets
back to the question of database size, and if you really need the 64
bit system or not, and how much it really matters once you get much
beyond ~6GB.
There is also the question of what you do with these database files.
If you do a lot of processing once the database files are formed,
then total memory becomes important, as we can assume you're pulling
a lot of pages in and out and doing all kinds of interesting things.
On the other hand, if all you do is scan the data and generate a
report or two and that's it, you're back to the issue of limited
cache use and I/O becoming king.
The "best" decision really depends on what you're trying to do with
all this data. The two important points I'd make, however, are that
cache doesn't matter unless you re-use the data, and that disk I/O
tends to be thousands (if not millions) of times slower than CPU
actions-- and that SQLite is extremely disk I/O intensive.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users