Re: T3's, NetApps, Tuning, Wife's Opinion and other fun

Ray Stell Tue, 13 Aug 2002 07:50:11 -0700


Thanks for sharing T3 experiences, I'll have to go down this road soon,
very helpful.  I can spot a tall tale when I hear one, however, keeping
your opinions, when pigs fly!




On Mon, Aug 12, 2002 at 06:53:20PM -0800, Dave Morgan wrote:
> Hi All,
>       Well after 6 wekks of testing here is the basic way
>       to operate SUN T3's  as efficiently as possible.
> 
>       Be prepared for arguments with High priests from the cult of SAME.
> 
>       SUN T3's are fiber attached hardware RAID 5 arrays with
>       a "modern cache". The hardware engineers argue that if you need more
>       I/Os/sec just add another array as a "concatenated volume". The theory
>       being the hardware is intelligent enough to use the cache to increase 
>       throughput. It actually works as they claim. Never did explain why
>       it wasn't a single point of failure in the end though.
> 
>       My hardware was 3 4810's, each with 4 - 8 cpus, each with 4 - 8GB, 
>       2 - 4 bricks per machine
>  
>       First insist that multiple bricks be mounted on at least 2 mount
> points.
>       (D2 and D3). DO NOT USE the forcedirectio option. I don't know why but 
>       I have been unable to take less than a 40% throughput hit with it
>       turned on. And I don't care what other people say, no matter how 
>       much respect I have for them
> 
>       Insist on at least one JBOD for oracle binaries and configs
>       Insist on at least one JBOD for redo logs (D1)
> 
>       This a bare minimum. 
> 
>       One set of redo on D1
>       One set of redo on D2
>       Archive logs, Rollback and Temp on D3
>       All data files where needed on D2
> 
>       
>       Next Level up
> 
>       Add another JBOD for redo and move redo on to it
>       Move Rollback and Temp to D2
> 
>       At this point to get more throughput you have to take the
>       JBOD to raw devices. Or try forcedirectio on these devices :)
> 
>       If even better performance is needed, more JBOD, for rollback and redo.
>       If more disk spaces is needed, get another brick.
> 
>       Which leads me to the recent discussion on "proper way to tune"
> 
>       Huh????? Why make it so complex?
> 
> Tuning from a blue collar DBA perspective:
> 
>       Assess the machine first
> 
>       No matter what your ratios or what your waiting for:
> 
>       sar to see if the machine is ever pinned
>       vmstat to see your queues and paging 
>       iostat to see disk activity
>       top at timed intervals to catch rogue jobs
> 
>       read your logs and config files
> 
>       Then talk to the users
>       Is the "system slow" or is it specific jobs?"
> 
>       log on run ratio reports and query v$system_event
> 
>       Any ratio that is out of range needs to be tuned:
> 
>       Especially disk sorts to memory sorts
> 
>       For the infamous buffer cache ratio:
>               < 10% throw memory at it
>               > 97% take memory away  
>       
> For wait states here's a quick drive through for those who look at 
> the number and say "Yeah but what do they mean"
>  
> Time Wait                                                      Total Time     Average
> Event                                 # Waits  Timeout   In
> Hndrds        Time
> -------------------------- --------- ------- ----------------------
> SQL*Net more data to client           #########       0    
> 680421        .005
> SQL*Net message to client             #########       0      17590      
> 0.000
> SQL*Net message from client           #########       0  3953399703    
> 35.511
> 
> - These are all communication to the client. ignore 
> 
> db file sequential read                39562523       0  
> 12300885        .311
> 
> - Data read, 0.0003 seconds average wait, ignore. This number will climb
> if
> - IO is bottlenecked or inappropriate (ie using FTS for joins) 
> 
> rdbms ipc message                      12440441 #######  2774129387   
> 222.993
> 
> - Internal machine communication ignore
> 
> db file scattered read                 12264223       0   
> 6202885        .506
> 
> - Data read, 0.0005 seconds average wait, ignore. This is higher due to
> type of read.
> - Increase in this time indicates an IO bottleneck
> 
> log file parallel write                 4724477      67   
> 2212249        .468
> log file sequential read                2097709       0   
> 1712615        .816
> 
> - Redo logs, with 2 pure raw JBOD I have got this down to about 0.25
> hundredths
> 
> buffer busy waits                       1548548       0    
> 408235        .264
> 
> - Memory latch contention 0.0002 seconds ignore
> - If Timeout or average increase, need to determine why contention is
> increasing
> 
> control file parallel write              669234       0    
> 376491        .563
> pmon timer                               662092  662074   203382329   
> 307.181
> 
> - Internal waits ignore
> 
> direct path read                         573442       0    
> 423920        .739
> 
> - Reads from tempfiles (sorting). Each segment is 10M in this db so
> ignore.
> 
> log file sync                            551716      15    
> 459036        .832
> 
> - See redo above. 
> 
> db file parallel write                   201166       0     610793      
> 3.036
> 
> - writing updates and inserts 0.003 seconds ignore
>  
> undo segment extension                   100516  100507         27      
> 0.000
> 
> - Don't know what this is, hope it's not critical ;)
> 
> SQL*Net break/reset to client             92904       0      
> 9522        .102
> 
> - Client communication ignore
> 
> LGWR wait for redo copy                   92844       7       
> 736        .008
> 
> - Affected by the archiver not keeping up. The alert log and an infamous
> ratio
> - are better ways of detecting this.
> 
> file open                                 76910       0      
> 1874        .024
> 
> - note the number of occurances is getting very small, average time is
> low, ignore
> 
> direct path write                         69706       0     1408596    
> 20.208
> 
> - writes for sorts, even though average time is high, research indicates
> that
> - the client does not wait for this, so internal ignore
> 
> 
> SQL*Net message to dblink                 48680       0          7      
> 0.000
> SQL*Net message from dblink               48680       0     108414      
> 2.227
> 
> - network or machine dependant ignore
> 
> control file sequential read              45198       0     
> 31664        .701
> latch free                                44849   30305     
> 28693        .640
> 
> - Memory latch contention, notice rate of Timeout to # of waits  
> - If average time increases, need to determine why contention is
> increasing
> - QUICKLY
> 
> SQL*Net more data from client             43851       0     229528      
> 5.234
> enqueue                                   19946   19380     5973595   
> 299.488
> 
> - I used to worry but ...., nothing ever happened so ignore
> 
> - And the rest happen so infrequently ignore
> file identify                              6961       0       
> 496        .071
> smon timer                                 6639    6612   203366288 
> 30632.066
> log file single write                      2770       0       2998      
> 1.082
> rdbms ipc reply                            1290       0       2302      
> 1.784
> log buffer space                           1104       1       5507      
> 4.988
> db file single write                        924       0       
> 162        .175
> log file switch completion                  743       8       17454    
> 23.491
> refresh controlfile command                 722       0       
> 413        .572
> pipe get                                    377     213       81693   
> 216.692
> library cache pin                           316       0       
> 152        .481
> control file single write                   231       0       
> 164        .710
> library cache load lock                      72       6        877    
> 26.069
> single-task message                          68       0        
> 49        .721
> switch logfile command                       45       0         815    
> 18.111
> process startup                              16       0         94      
> 5.875
> SQL*Net more data from dblink                16       0          0      
> 0.000
> row cache lock                                5       0          0      
> 0.000
> db file parallel read                         3       0          7      
> 2.333
> instance state change                         2       0          0      
> 0.000
> Null event                                    1       1         410   
> 410.000
> reliable message                              1       0          0      
> 0.000
> sort segment request                          1       1         103   
> 103.000
> 
>               
> And the more you study your database the more you will understand of the
> above :)
> 
> After you are aware of your systems problems, fix your config files and
> file positions and then chase down SQL issues.
> 
> >From your users, capture the SQL run explain plan
> Run top, catch processes that use a full cpu for more than 30 seconds
> Capture the sql, run explain plan
> 
> 
> 
> I have always ogled women. When I got married, (well started going out)
> I explained to my wife that I was making sure I had the best.
> 
> But really, she's a good wife,
> I'm even allowed to have opinions. If she approves of them I'm allowed
> to have them :)
> 
> 
> The digest hit 983K on Friday, if I'm kind, 100K was content.
> 
> >From the titles I see that there were
> performance problems with partitioned tables and bitmap indices?
> 
> I can't help those who won't help themselves.
> And I don't receive HTML email.
> 
> TTFN
> 
> Off to figure out the relationship between multiblock_read_count and
> those index_optimizer thingies
> 
> Dave
>       
> -- 
> Dave Morgan
> [EMAIL PROTECTED]
> 403 399 2442
> -- 
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> -- 
> Author: Dave Morgan
>   INET: [EMAIL PROTECTED]
> 
> Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
> San Diego, California        -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from).  You may
> also send the HELP command for other information (like subscribing).

-- 
===============================================================
Ray Stell   [EMAIL PROTECTED]     (540) 231-4109     KE4TJC    28^D
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Ray Stell
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Re: T3's, NetApps, Tuning, Wife's Opinion and other fun

Reply via email to