Thanks for sharing T3 experiences, I'll have to go down this road soon, very helpful. I can spot a tall tale when I hear one, however, keeping your opinions, when pigs fly!
On Mon, Aug 12, 2002 at 06:53:20PM -0800, Dave Morgan wrote: > Hi All, > Well after 6 wekks of testing here is the basic way > to operate SUN T3's as efficiently as possible. > > Be prepared for arguments with High priests from the cult of SAME. > > SUN T3's are fiber attached hardware RAID 5 arrays with > a "modern cache". The hardware engineers argue that if you need more > I/Os/sec just add another array as a "concatenated volume". The theory > being the hardware is intelligent enough to use the cache to increase > throughput. It actually works as they claim. Never did explain why > it wasn't a single point of failure in the end though. > > My hardware was 3 4810's, each with 4 - 8 cpus, each with 4 - 8GB, > 2 - 4 bricks per machine > > First insist that multiple bricks be mounted on at least 2 mount > points. > (D2 and D3). DO NOT USE the forcedirectio option. I don't know why but > I have been unable to take less than a 40% throughput hit with it > turned on. And I don't care what other people say, no matter how > much respect I have for them > > Insist on at least one JBOD for oracle binaries and configs > Insist on at least one JBOD for redo logs (D1) > > This a bare minimum. > > One set of redo on D1 > One set of redo on D2 > Archive logs, Rollback and Temp on D3 > All data files where needed on D2 > > > Next Level up > > Add another JBOD for redo and move redo on to it > Move Rollback and Temp to D2 > > At this point to get more throughput you have to take the > JBOD to raw devices. Or try forcedirectio on these devices :) > > If even better performance is needed, more JBOD, for rollback and redo. > If more disk spaces is needed, get another brick. > > Which leads me to the recent discussion on "proper way to tune" > > Huh????? Why make it so complex? > > Tuning from a blue collar DBA perspective: > > Assess the machine first > > No matter what your ratios or what your waiting for: > > sar to see if the machine is ever pinned > vmstat to see your queues and paging > iostat to see disk activity > top at timed intervals to catch rogue jobs > > read your logs and config files > > Then talk to the users > Is the "system slow" or is it specific jobs?" > > log on run ratio reports and query v$system_event > > Any ratio that is out of range needs to be tuned: > > Especially disk sorts to memory sorts > > For the infamous buffer cache ratio: > < 10% throw memory at it > > 97% take memory away > > For wait states here's a quick drive through for those who look at > the number and say "Yeah but what do they mean" > > Time Wait Total Time Average > Event # Waits Timeout In > Hndrds Time > -------------------------- --------- ------- ---------------------- > SQL*Net more data to client ######### 0 > 680421 .005 > SQL*Net message to client ######### 0 17590 > 0.000 > SQL*Net message from client ######### 0 3953399703 > 35.511 > > - These are all communication to the client. ignore > > db file sequential read 39562523 0 > 12300885 .311 > > - Data read, 0.0003 seconds average wait, ignore. This number will climb > if > - IO is bottlenecked or inappropriate (ie using FTS for joins) > > rdbms ipc message 12440441 ####### 2774129387 > 222.993 > > - Internal machine communication ignore > > db file scattered read 12264223 0 > 6202885 .506 > > - Data read, 0.0005 seconds average wait, ignore. This is higher due to > type of read. > - Increase in this time indicates an IO bottleneck > > log file parallel write 4724477 67 > 2212249 .468 > log file sequential read 2097709 0 > 1712615 .816 > > - Redo logs, with 2 pure raw JBOD I have got this down to about 0.25 > hundredths > > buffer busy waits 1548548 0 > 408235 .264 > > - Memory latch contention 0.0002 seconds ignore > - If Timeout or average increase, need to determine why contention is > increasing > > control file parallel write 669234 0 > 376491 .563 > pmon timer 662092 662074 203382329 > 307.181 > > - Internal waits ignore > > direct path read 573442 0 > 423920 .739 > > - Reads from tempfiles (sorting). Each segment is 10M in this db so > ignore. > > log file sync 551716 15 > 459036 .832 > > - See redo above. > > db file parallel write 201166 0 610793 > 3.036 > > - writing updates and inserts 0.003 seconds ignore > > undo segment extension 100516 100507 27 > 0.000 > > - Don't know what this is, hope it's not critical ;) > > SQL*Net break/reset to client 92904 0 > 9522 .102 > > - Client communication ignore > > LGWR wait for redo copy 92844 7 > 736 .008 > > - Affected by the archiver not keeping up. The alert log and an infamous > ratio > - are better ways of detecting this. > > file open 76910 0 > 1874 .024 > > - note the number of occurances is getting very small, average time is > low, ignore > > direct path write 69706 0 1408596 > 20.208 > > - writes for sorts, even though average time is high, research indicates > that > - the client does not wait for this, so internal ignore > > > SQL*Net message to dblink 48680 0 7 > 0.000 > SQL*Net message from dblink 48680 0 108414 > 2.227 > > - network or machine dependant ignore > > control file sequential read 45198 0 > 31664 .701 > latch free 44849 30305 > 28693 .640 > > - Memory latch contention, notice rate of Timeout to # of waits > - If average time increases, need to determine why contention is > increasing > - QUICKLY > > SQL*Net more data from client 43851 0 229528 > 5.234 > enqueue 19946 19380 5973595 > 299.488 > > - I used to worry but ...., nothing ever happened so ignore > > - And the rest happen so infrequently ignore > file identify 6961 0 > 496 .071 > smon timer 6639 6612 203366288 > 30632.066 > log file single write 2770 0 2998 > 1.082 > rdbms ipc reply 1290 0 2302 > 1.784 > log buffer space 1104 1 5507 > 4.988 > db file single write 924 0 > 162 .175 > log file switch completion 743 8 17454 > 23.491 > refresh controlfile command 722 0 > 413 .572 > pipe get 377 213 81693 > 216.692 > library cache pin 316 0 > 152 .481 > control file single write 231 0 > 164 .710 > library cache load lock 72 6 877 > 26.069 > single-task message 68 0 > 49 .721 > switch logfile command 45 0 815 > 18.111 > process startup 16 0 94 > 5.875 > SQL*Net more data from dblink 16 0 0 > 0.000 > row cache lock 5 0 0 > 0.000 > db file parallel read 3 0 7 > 2.333 > instance state change 2 0 0 > 0.000 > Null event 1 1 410 > 410.000 > reliable message 1 0 0 > 0.000 > sort segment request 1 1 103 > 103.000 > > > And the more you study your database the more you will understand of the > above :) > > After you are aware of your systems problems, fix your config files and > file positions and then chase down SQL issues. > > >From your users, capture the SQL run explain plan > Run top, catch processes that use a full cpu for more than 30 seconds > Capture the sql, run explain plan > > > > I have always ogled women. When I got married, (well started going out) > I explained to my wife that I was making sure I had the best. > > But really, she's a good wife, > I'm even allowed to have opinions. If she approves of them I'm allowed > to have them :) > > > The digest hit 983K on Friday, if I'm kind, 100K was content. > > >From the titles I see that there were > performance problems with partitioned tables and bitmap indices? > > I can't help those who won't help themselves. > And I don't receive HTML email. > > TTFN > > Off to figure out the relationship between multiblock_read_count and > those index_optimizer thingies > > Dave > > -- > Dave Morgan > [EMAIL PROTECTED] > 403 399 2442 > -- > Please see the official ORACLE-L FAQ: http://www.orafaq.com > -- > Author: Dave Morgan > INET: [EMAIL PROTECTED] > > Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 > San Diego, California -- Public Internet access / Mailing Lists > -------------------------------------------------------------------- > To REMOVE yourself from this mailing list, send an E-Mail message > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > the message BODY, include a line containing: UNSUB ORACLE-L > (or the name of mailing list you want to be removed from). You may > also send the HELP command for other information (like subscribing). -- =============================================================== Ray Stell [EMAIL PROTECTED] (540) 231-4109 KE4TJC 28^D -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Ray Stell INET: [EMAIL PROTECTED] Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).