Re: [HACKERS] Large databases, performance

2002-10-04 Thread Shridhar Daithankar

On 3 Oct 2002 at 18:53, Manfred Koizar wrote:

 On Thu, 03 Oct 2002 21:47:03 +0530, Shridhar Daithankar
 [EMAIL PROTECTED] wrote:
 I believe that was vacuum analyze only.
 
 Well there is
 
   VACUUM [tablename]; 
 
 and there is
 
   ANALYZE [tablename];
 
 And
 
   VACUUM ANALYZE [tablename];
 
 is VACUUM followed by ANALYZE.

I was using vacuum analyze. 

Good that you pointed that out. Now I will modify the postgresql auto vacuum daemon 
that I wrote to analyze only in the case of excessive inserts. I hope that's lighter 
on performance compared to vacuum analyze..
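
For an insert-only table the distinction matters; as a sketch (the table name
is hypothetical), the lighter maintenance pass would be:

```sql
-- Insert-only table: there are no dead tuples to reclaim, so refreshing
-- the planner's statistics is all that is needed.
ANALYZE call_records;

-- Rows also updated/deleted: the full pass reclaims dead tuples first,
-- then refreshes statistics.
VACUUM ANALYZE call_records;
```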

Bye
 Shridhar

--
Mix's Law:  There is nothing more permanent than a temporary building.  There is 
nothing more permanent than a temporary tax.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Large databases, performance

2002-10-03 Thread Shridhar Daithankar

Hi,

Today we concluded test for database performance. Attached are results and the 
schema, for those who have missed earlier discussion on this.

We have (almost) decided that we will partition the data across machines. The 
theme is: after every short interval, a burst of data will be entered into a 
new table in the database, indexed and vacuumed. The table(s) will be inherited so 
that a query on the base table will fetch results from all the children. The 
application has to consolidate all the data on a per-node basis. If the database is 
not postgresql, the app has to consolidate data across partitions as well.

Now we need to investigate whether selecting on the base table to include children 
would use the indexes created on the child tables.
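
As a sketch of that scheme (table and index names are hypothetical, column
list abbreviated to the three indexed fields):

```sql
-- Parent table holds no data of its own; each burst of data goes into
-- a freshly created child table.
CREATE TABLE calls (esn char(10), min char(10), datetime timestamp);

CREATE TABLE calls_batch_001 () INHERITS (calls);

-- Index each child after its bulk load completes.
CREATE INDEX calls_batch_001_idx ON calls_batch_001 (esn, min, datetime);

-- A select on the parent scans all children; whether the planner can use
-- the per-child indexes here is exactly the open question.
SELECT * FROM calls WHERE esn = '1234567890' AND min = '0987654321';
```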

It's estimated that when the entire data is gathered, the total number of child 
tables would be around 1K-1.1K across all machines. 

This is based on the average rate of data insertion, i.e. 5K records/sec, and the 
total data size, estimated to be 9 billion rows max, i.e. an estimated database 
size of 900GB. Obviously it's impossible to keep the insertion rate on an indexed 
table high as data grows, so partitioning/inheritance looks like the better approach. 

Postgresql is not the final winner yet; Mysql is in close range. I will keep 
you guys posted about the results.

Let me know about any comments..

Bye
 Shridhar

--
Price's Advice: It's all a game -- play it to have fun.




Machine
  Compaq Proliant Server ML 530
  Intel Xeon 2.4 GHz Processor x 4
  4 GB RAM, 5 x 72.8 GB SCSI HDD
  RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0
  Cost - $13,500 ($1,350 for each additional 72GB HDD)

Performance Parameter                  MySQL 3.23.52       MySQL 3.23.52       PostgreSQL 7.2.2
                                       WITHOUT InnoDB      WITH InnoDB for     with built-in
                                       for transactional   transactional       support for
                                       support             support             transactions

Complete Data

Inserts + building a composite index   3738 secs           18720 secs          20628 secs
  40 GB data, 432,000,000 tuples,
  about 100 bytes each, schema on
  'schema' sheet, composite index
  on 3 fields (esn, min, datetime)

Load Speed                             115570 tuples/sec   23076 tuples/sec    20942 tuples/sec

Database Size on Disk                  48 GB               87 GB               111 GB

Average per partition

Inserts + building a composite index   28 secs             130 secs            150 secs
  300MB data, 3,000,000 tuples,
  about 100 bytes each, schema on
  'schema' sheet, composite index
  on 3 fields (esn, min, datetime)

Select Query                           7 secs              7 secs              6 secs
  based on equality match of 2
  fields (esn and min) - 4
  concurrent queries running

Database Size on Disk                  341 MB              619 MB              788 MB

Field Name  Field Type  Nullable  Indexed
type        int         no        no
esn         char (10)   no        yes
min         char (10)   no        yes
datetime    timestamp   no        yes
opc0        char (3)    no        no
opc1        char (3)    no        no
opc2        char (3)    no        no
dpc0        char (3)    no        no
dpc1        char (3)    no        no
dpc2        char (3)    no        no
npa         char (3)    no
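
The schema sheet above translates to roughly this DDL (the table name is
hypothetical; the sheet marks esn, min and datetime as indexed, and the
benchmark used a single composite index over those three, so that is what
is shown):

```sql
CREATE TABLE calls (
    type     int        NOT NULL,
    esn      char(10)   NOT NULL,
    min      char(10)   NOT NULL,
    datetime timestamp  NOT NULL,
    opc0     char(3)    NOT NULL,
    opc1     char(3)    NOT NULL,
    opc2     char(3)    NOT NULL,
    dpc0     char(3)    NOT NULL,
    dpc1     char(3)    NOT NULL,
    dpc2     char(3)    NOT NULL,
    npa      char(3)    NOT NULL
);

-- Composite index on the three query fields.
CREATE INDEX calls_esn_min_dt ON calls (esn, min, datetime);
```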

Re: [HACKERS] Large databases, performance

2002-10-03 Thread Robert Treat

NOTE: Setting follow up to the performance list

Funny that the status quo seems to be: if you need fast selects on data
that has few inserts, pick mysql; otherwise, if you have a lot of
inserts and don't need super fast selects, go with PostgreSQL; yet your
data seems to cut directly against this. 

I'm curious, did you happen to run the select tests while also running
the insert tests? IIRC the older mysql versions have to lock the table
when doing the insert, so select performance goes in the dumper in that
scenario, perhaps that's not an issue with 3.23.52? 

It also seems like the vacuum after each insert is unnecessary, unless
you're also deleting/updating data behind it. Perhaps just running an
ANALYZE on the table would suffice while reducing overhead.

Robert Treat

On Thu, 2002-10-03 at 08:36, Shridhar Daithankar wrote:
 [benchmark table snipped; see original post above]
 



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Large databases, performance

2002-10-03 Thread Manfred Koizar

On Thu, 03 Oct 2002 21:47:03 +0530, Shridhar Daithankar
[EMAIL PROTECTED] wrote:
I believe that was vacuum analyze only.

Well there is

VACUUM [tablename]; 

and there is

ANALYZE [tablename];

And

VACUUM ANALYZE [tablename];

is VACUUM followed by ANALYZE.

Servus
 Manfred




Re: [HACKERS] Large databases, performance

2002-10-03 Thread Shridhar Daithankar
On 3 Oct 2002 at 11:57, Robert Treat wrote:

 NOTE: Setting follow up to the performance list
 
 Funny that the status quo seems to be if you need fast selects on data
 that has few inserts to pick mysql, otherwise if you have a lot of
 inserts and don't need super fast selects go with PostgreSQL; yet your
 data seems to cut directly against this. 

Well, a couple of things..

The number of inserts isn't few; it's 5000/sec, required in the field. Secondly, 
I don't know really, but postgresql seems to be doing pretty well in parallel selects. 
If we use mysql with transaction support, then the numbers are really close..

Maybe it's time to rewrite the famous myth that postgresql is slow. When properly 
tuned or given enough head room, it's almost as fast as mysql..

 I'm curious, did you happen to run the select tests while also running
 the insert tests? IIRC the older mysql versions have to lock the table
 when doing the insert, so select performance goes in the dumper in that
 scenario, perhaps that's not an issue with 3.23.52? 

IMO, even if it locks tables, that shouldn't affect select performance. It would 
be fun to watch when we insert multiple chunks of data and fire queries 
concurrently. I would be surprised if mysql starts slowing down..

 It also seems like the vacuum after each insert is unnecessary, unless
 your also deleting/updating data behind it. Perhaps just running an
 ANALYZE on the table would suffice while reducing overhead.

I believe that was vacuum analyze only. But it still takes a lot of time. The good 
thing is it's not blocking..

Anyway, I don't think such frequent vacuums are going to convince the planner to 
choose an index scan over a sequential scan. I am sure it's already convinced..

Regards,
 Shridhar

---
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901 
---


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org