I'll preface this by saying that while I have a large database, it doesn't
require quite the performace you're talking about here.
On Thu, Sep 26, 2002 at 02:05:44PM +0530, Shridhar Daithankar wrote:
> 1) Database load time from flat file using copy is very high
> 2) Creating index takes huge amount of time.
> 3) Any suggsestions for runtime as data load and query will be going in
You're loading all the data in one copy. I find that INSERTs are mostly
limited by indexes. While index lookups are cheap, they are not free and
each index needs to be updated for each row.
I fond using partial indexes to only index the rows you actually use can
help with the loading. It's a bit obscure though.
As for parallel loading, you'll be limited mostly by your I/O bandwidth.
Have you measured it to take sure it's up to speed?
> Now the details. Note that this is a test run only..
> Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
> Database in flat file:
> 125,000,000 records of around 100 bytes each.
> Flat file size 12GB
> Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec.
> Create unique composite index on 2 char and a timestamp field: 25226 sec.
> Database size on disk: 26GB
> Select query: 1.5 sec. for approx. 150 rows.
So you're loading at a rate of 860KB per sec. That's not too fast. How many
indexes are active at that time? Triggers and foreign keys also take their
> Important postgresql.conf settings
> sort_mem = 12000
> shared_buffers = 24000
> fsync=true (Sad but true. Left untouched.. Will that make a difference on
> wal_buffers = 65536
> wal_files = 64
fsync IIRC only affects the WAL buffers now but it may be quite expensive,
especially considering it's running on every transaction commit. Oh, your
WAL files are on a seperate disk from the data?
> Initial flat data load: 250GB of data. This has gone up since last query. It
> was 150GB earlier..
> Ongoing inserts: 5000/sec.
> Number of queries: 4800 queries/hour
> Query response time: 10 sec.
That looks quite acheivable.
> 1) Instead of copying from a single 12GB data file, will a parallel copy from
> say 5 files will speed up the things?
Limited by I/O bandwidth. On linux vmstat can tell you how many blocks are
being loaded and stored per second. Try it. As long as sync() doesn't get
done too often, it should be help.
> Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5
No, it's not. You should be able to do better.
> 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further
> addition to improve create index performance?
Should be fine. Admittedly your indexes are taking rather long to build.
> 3) 5K concurrent inserts with an index on, will this need a additional CPU
> power? Like deploying it on dual RISC CPUs etc?
It shouldn't. Do you have an idea of what your CPU usage is? ps aux should
give you a decent idea.
> 4) Query performance is not a problem. Though 4.8K queries per sec. expected
> response time from each query is 10 sec. But my guess is some serius CPU power
> will be chewed there too..
Should be fine.
> 5)Will upgrading to 7.2.2/7.3 beta help?
Possibly, though it may be wirth it just for the features/bugfixes.
> All in all, in the test, we didn't see the performance where hardware is
> saturated to it's limits. So effectively we are not able to get postgresql
> making use of it. Just pushing WAL and shared buffers does not seem to be the
> If you guys have any suggestions. let me know. I need them all..
Find the bottleneck: CPU, I/O or memory?
> Mysql is almost out because it's creating index for last 17 hours. I don't
> think it will keep up with 5K inserts per sec. with index. SAP DB is under
> evaluation too. But postgresql is most favourite as of now because it works. So
> I need to come up with solutions to problems that will occur in near future..
17 hours! Ouch. Either way, you should be able to do much better. Hope this
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]