Can you comment on the tools you are using to do the insertions (Perl, 
Java?), the distribution of the data (all random, all static), and the 
transaction scope (all inserts in one transaction, each insert as a 
single transaction, or some group of inserts per transaction)?
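
For anyone reproducing the load test, here is a minimal sketch of those three transaction scopes, using Python's stdlib sqlite3 purely as a stand-in for the real database (the table name and row shape are made up):

```python
import sqlite3

def load(rows, batch_size=None):
    """Load rows under one of three transaction scopes (sqlite3 stands in
    for the real backend; table/columns are illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (esn TEXT, min TEXT)")
    if batch_size is None:
        # Scope 1: all inserts in a single transaction.
        with conn:
            conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    else:
        # Scope 2/3: batch_size inserts per transaction
        # (batch_size=1 gives one transaction per insert).
        for i in range(0, len(rows), batch_size):
            with conn:
                conn.executemany("INSERT INTO t VALUES (?, ?)",
                                 rows[i:i + batch_size])
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count

rows = [(str(i), str(i)) for i in range(1000)]
print(load(rows))        # one big transaction
print(load(rows, 1))     # one transaction per insert
print(load(rows, 100))   # groups of 100 per transaction
```

The scope chosen usually dominates bulk-load timing, which is why the question matters for interpreting the numbers below.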

I'd be curious what happens when you submit more queries than you have 
processors (you had four concurrent queries and four CPUs), if you care 
to run any additional tests.  Also, I'd report the query time both in 
absolute terms (like you did) and as time/number of concurrent queries. 
This will give you a sense of how the system scales as the workload 
increases.  Personally, I am more concerned about this aspect than the 
load time, since I am going to guess that this is where all the time is 
spent.

Was the original posting on GENERAL or HACKERS?  Is this moving to 
PERFORMANCE for follow-up?  I'd like to follow this discussion and want 
to know if I should join another group.

Thanks,

Charlie

P.S.  Anyone want to comment on their expectations for 'commercial' 
databases handling this load?  I know that we cannot speak about 
specific performance metrics on some products (licensing restrictions), 
but I'd be curious whether folks have seen some of the databases out 
there handle these dataset sizes and respond reasonably.


Shridhar Daithankar wrote:

>Hi,
>
>Today we concluded the database performance test. Attached are the results and 
>the schema, for those who missed the earlier discussion on this.
>
>We have (almost) decided that we will partition the data across machines. The 
>scheme is: after every short interval, a burst of data will be entered into a 
>new table in the database, then indexed and vacuumed. The tables will be 
>inherited, so that a query on the base table will fetch results from all the 
>children. The application has to consolidate all the data on a per-node basis. 
>If the database is not PostgreSQL, the app has to consolidate data across 
>partitions as well.
>
>Now we need to investigate whether selecting on the base table (including 
>children) will use the indexes created on the child tables.
>
>It's estimated that when the entire data set is gathered, the total number of 
>child tables will be around 1K-1.1K across all machines. 
>
>This follows from the average rate of data insertion, i.e. 5K records/sec, and 
>the total data size, estimated at 9 billion rows max, i.e. an estimated 
>database size of 900GB. Obviously it's impossible to keep the insertion rate 
>on an indexed table high as the data grows, so partitioning/inheritance looks 
>like the better approach. 
>
>PostgreSQL is not the final winner as yet; MySQL is in close range. I will keep 
>you guys posted about the results.
>
>Let me know if you have any comments.
>
>Bye
> Shridhar
>
>--
>Price's Advice:        It's all a game -- play it to have fun.
>
>
>  
>
>------------------------------------------------------------------------
>
>Machine:
>  Compaq Proliant ML 530 Server
>  Intel Xeon 2.4 GHz processor x 4
>  4 GB RAM, 5 x 72.8 GB SCSI HDD
>  RAID 0 (striping) hardware setup, Mandrake Linux 9.0
>  Cost: $13,500 ($1,350 for each additional 72 GB HDD)
>
>Performance Parameter                     MySQL 3.23.52      MySQL 3.23.52      PostgreSQL 7.2.2
>                                          without InnoDB     with InnoDB        (built-in support
>                                          (no transactional  (transactional     for transactions)
>                                          support)           support)
>
>Complete data
>  Inserts + building a composite index    3738 secs          18720 secs         20628 secs
>    (40 GB data, 432,000,000 tuples of
>    about 100 bytes each, schema on the
>    'schema' sheet; composite index on
>    3 fields: esn, min, datetime)
>  Load speed                              115570 tuples/sec  23076 tuples/sec   20942 tuples/sec
>  Database size on disk                   48 GB              87 GB              111 GB
>
>Average per partition
>  Inserts + building a composite index    28 secs            130 secs           150 secs
>    (300 MB data, 3,000,000 tuples of
>    about 100 bytes each; same
>    composite index)
>  Select query                            7 secs             7 secs             6 secs
>    (equality match on 2 fields, esn
>    and min; 4 concurrent queries
>    running)
>  Database size on disk                   341 MB             619 MB             788 MB
>
>
>------------------------------------------------------------------------
>
>Field Name     Field Type      Nullable        Indexed
>type           int             no              no
>esn            char (10)       no              yes
>min            char (10)       no              yes
>datetime       timestamp       no              yes
>opc0           char (3)        no              no
>opc1           char (3)        no              no
>opc2           char (3)        no              no
>dpc0           char (3)        no              no
>dpc1           char (3)        no              no
>dpc2           char (3)        no              no
>npa            char (3)        no              no
>nxx            char (3)        no              no
>rest           char (4)        no              no
>field0         int             yes             no
>field1         char (4)        yes             no
>field2         int             yes             no
>field3         char (4)        yes             no
>field4         int             yes             no
>field5         char (4)        yes             no
>field6         int             yes             no
>field7         char (4)        yes             no
>field8         int             yes             no
>field9         char (4)        yes             no
>
>  
>
>------------------------------------------------------------------------
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
>  
>
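
As an aside, the per-partition consolidation Shridhar describes above (needed when the backend lacks table inheritance) amounts to querying each burst table and merging in the application. A rough sketch, with sqlite3 standing in for the real backend and made-up table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per data burst (names are illustrative); with no inherited
# base table, the app must query every partition and merge the results.
for i in range(3):
    conn.execute(f"CREATE TABLE burst_{i} (esn TEXT, min TEXT)")
    conn.execute(f"INSERT INTO burst_{i} VALUES ('esn{i}', '5551212')")

def select_all_partitions(conn, n_partitions, min_value):
    """Run the same equality-match query on each partition and merge."""
    rows = []
    for i in range(n_partitions):
        rows.extend(conn.execute(
            f"SELECT esn, min FROM burst_{i} WHERE min = ?", (min_value,)))
    return rows

print(select_all_partitions(conn, 3, '5551212'))
```

With PostgreSQL inheritance, the database does this merge for you on a SELECT against the base table; the open question is whether that path uses the child-table indexes.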

-- 


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com





---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html
