[firebird-support] Re[2]: [FB 2.1] Firebird engine seems to slow down on high load without utilizing hardware

2016-04-13 Thread Alexey Kovyazin (ak) a...@ib-aid.com [firebird-support]

Patrick,
Definitely, you need to compare real-life loads.
--
Regards,
Alexey Kovyazin
IBSurgeon
Wednesday, 13 April 2016, 04:47 +03:00, from "thetr...@yahoo.com
[firebird-support]" <firebird-support@yahoogroups.com>:

> 
>
>Hey Alexey,
>thank you for your input. I think what you say is correct, and we reviewed our 
>disk setup again.
>We are using mechanical disks, so it's hard to compare them with SSD 
>performance.
>But they should provide enough IOPS for our load.
>
>Unfortunately we can't just switch to a single SSD, since we would lose the 
>replication and failover that the SAN provides, which is a critical requirement 
>for us. I'm afraid we have to stick with it for now, until we have some facts 
>that prove the SAN setup is our limiting factor. And the data does not show 
>that to me currently.
>
>On a side note, we do own a server with an SSD setup, but in tests we couldn't 
>get a noticeable performance gain from the higher IOPS this way. (The tests 
>were generic and not real-world load, unfortunately.)
>
>Best Regards,
>Patrick
>
>---In firebird-support@yahoogroups.com,  wrote :
>
>Hi Patrick,
>
>If you say that the problem occurred recently, I would suggest checking the
>health of the SAN disks.
>
>However, these values
>>> Average system IOPS under load read: 100
>>> Average system IOPS under load write: 550
>>> Backup Restore IOPS read: 1700
>>> Backup Restore IOPS write: 250
>are really, really low.
>1700 IOPS for a database with a 4k page size means about 6.8 MB/s (in the case
>of random reads).
>
>I suggest installing a single SSD drive and checking how it performs.
>Typical SSD IOPS look like this:
>  Random Read 4KB (QD=32) :   283.050 MB/s [ 69104.0 IOPS]
>  Random Write 4KB (QD=32) :   213.837 MB/s [ 52206.2 IOPS]
>
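As a rough check of the figures above, IOPS can be converted to throughput by multiplying by the block size. This is only a minimal sketch: the function name is made up for illustration, and it assumes 4096-byte blocks and decimal megabytes, which is what the SSD benchmark lines appear to use.

```python
def iops_to_mb_per_s(iops: float, block_bytes: int = 4096) -> float:
    """Convert an IOPS figure to MB/s (decimal megabytes) for a given block size."""
    return iops * block_bytes / 1_000_000


# Figures quoted in this thread
for label, iops in [
    ("SAN, backup/restore read", 1700),
    ("SSD, random read QD=32", 69104.0),
    ("SSD, random write QD=32", 52206.2),
]:
    print(f"{label}: {iops_to_mb_per_s(iops):.3f} MB/s")
```

With 4096-byte pages, 1700 IOPS comes to roughly 7 MB/s. The 6.8 figure corresponds to counting a 4k page as 4000 bytes. Either way, it is well over an order of magnitude below the SSD numbers.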
>
>From our optimization practice, we have found that if you need to optimize
>only a single instance of the database, the most cost-effective way is to
>upgrade to SSD first, and only then fix other problems.
>
>Regards,
>Alexey Kovyazin
>IBSurgeon HQbird  www.ib-aid.com
>
>
>
>>> 
>>>Hi,
>>>recently we had some strange performance issues with our Firebird DB server.
>>>Under high load, our server started to slow down. Select and update SQL query
>>>times went up by more than 500% on average, reaching unreasonably high
>>>execution times in the worst case (several minutes instead of < 1 sec).
>>>
>>>The OIT/OAT/Next Transaction values stayed within 1000 of each other the
>>>whole time.
>>>We were not able to measure any limiting hardware factor.
>>>In fact, this system was running with only 8 cores at about 70% CPU usage
>>>under max. load.
>>>We decided that this might be our problem, since we had experienced a similar
>>>problem at about 80% CPU load in the past.
>>>So we upgraded the hardware. As expected, the CPU load dropped to ~35% usage
>>>in the max. load scenario.
>>>But this did not solve the problem.
>>>Same story for the hard disk system. The usage is not even near its max.
>>>capacity.
>>>
>>>We also can't see any impact on the hard disk.
>>>We're kind of stuck, because we have no idea what could be a potential
>>>bottleneck in the system.
>>>Since the hardware doesn't show a limit, there has to be something else, most
>>>likely related to the Firebird engine, that's limiting our system.
>>>We would be very grateful if anyone could give us hints on where to search
>>>further, or if someone has similar experiences to share with us.
>>>
>>>
>>>Operating System: Windows Server 2003
>>>Firebird: 2.1.5 Classic
>>>Dedicated database server (VMWare)
>>>
>>>CPU: 16 cores, each 2.4 GHz
>>>RAM: 32 GB
>>>About 14 GB are used by the OS and Firebird processes under max. load.
>>>HDD: SAN Storage System
>>>
>>>Average system IOPS under load read: 100
>>>Average system IOPS under load write: 550
>>>Backup Restore IOPS read: 1700
>>>Backup Restore IOPS write: 250
>>>SAN IOPS Limit (max): 3000
>>>
>>>Firebird Config Settings, based on defaults
>>>DefaultDbCachePages = 1024
>>>LockMemSize = 134247728
>>>LockHashSlots = 20011
>>>
>>>Database
>>>Size: about 45 GB
>>>450 to 550 concurrent connections
>>>Daily average of 65 transactions / second (peak should be higher)
>>>
>>>FB_LOCK_PRINT (without any params) while the system was slowing down (~4 days
>>>uptime).
>>>I have to note that Firebird was not able to print the complete output (the
>>>stats were not cropped by me).
>>>
>>>LOCK_HEADER BLOCK
>>>Version: 16, Active owner:      0, Length: 134247728, Used: 82169316
>>>Semmask: 0x0, Flags: 0x0001
>>>Enqs: 4211018659, Converts: 10050437, Rejects: 9115488, Blocks: 105409192
>>>Deadlock scans:   1049, Deadlocks:      0, Scan interval:  10
>>>Acquires: 4723416170, Acquire blocks: 640857597, Spin count:   0
>>>Mutex wait: 13.6%
>>>Hash slots: 15077, Hash lengths (min/avg/max):    3/  12/  25
>>>Remove node:      0, Insert queue:     36, Insert prior: 74815332
>>>Owners (456): forward: 131316, backward: 14899392
>>>Free owners (9): forward: 39711576, backward: 49867232
>>>Free locks (42409): forward: 65924212, backward: 23319052
>>>
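The "Mutex wait" percentage in the header above appears to be the share of lock table acquisitions that had to block. A minimal sketch of that arithmetic, assuming the value is simply "Acquire blocks" divided by "Acquires" (the function name is hypothetical and not part of any Firebird tool):

```python
def mutex_wait_percent(acquires: int, acquire_blocks: int) -> float:
    """Percentage of lock table acquisitions that had to wait (assumed formula)."""
    if acquires == 0:
        return 0.0
    return 100.0 * acquire_blocks / acquires


# Values taken from the LOCK_HEADER BLOCK above
print(f"Mutex wait: {mutex_wait_percent(4723416170, 640857597):.1f}%")
```

Under that assumption the result matches the printed 13.6%, meaning roughly one in seven attempts to enter the lock table had to wait for another process.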
>>>With best Regards,
>>>
>>>Patrick Friessnegg
>>>Synesc GmbH
>

[firebird-support] Re[2]: [FB 2.1] Firebird engine seems to slow down on high load without utilizing hardware

2016-04-13 Thread Alexey Kovyazin (ak) a...@ib-aid.com [firebird-support]

Hi,
You wrote:
>The thing is, sure, these numbers look really low. But the system never uses
>them. The monitoring of the SAN shows that these loads are never reached.
You are confusing cause and effect.
Monitoring shows low numbers because spinning drives cannot deliver random reads 
fast enough.
>From Sinatica, every 20 minutes a peak in GC for ~15,000 transactions
Sinatica uses really strange terms to describe transaction behaviour, and 
since it is an abandoned tool, nobody can explain how they correspond to the 
real situation.
--
Regards,
Alexey Kovyazin
IBSurgeon
Wednesday, 13 April 2016, 05:08 +03:00, from "thetr...@yahoo.com
[firebird-support]" <firebird-support@yahoogroups.com>:

> 
>Hey Thomas,
>thanks for your extensive reply.
>Unfortunately we're still bound to some old 32-bit UDF functionality which we 
>can't get in 64-bit. 
>I think you know about the catch with SuperClassic on a 32-bit server: the 2 GB 
>RAM limit :)
>It's not impossible, but also not really a fast route for us. But it is 
>certainly another reason to talk about making the switch to 2.5.
>
>We ran some disk IO benchmarks (with AS SSD) today, and in times of SSDs the 
>results are kind of depressing :D
>The thing is, sure, these numbers look really low. But the system never uses 
>them. The monitoring of the SAN shows that these loads are never reached. The 
>single 4k read is worrying me, but I lean towards thinking that our 500 
>processes are more like the 64-thread test. But even then, we only measured 
>100 IOPS reading on the live system.
>
>Sequential read speed: ~450 MB/s
>Sequential write speed: ~500 MB/s
>4k read: 196 IOPS
>4k write: 1376 IOPS
>4k-64 thread read: 15945 IOPS
>4k-64 thread write: 7361 IOPS
>
>
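One way to read the benchmark numbers above is to convert them to an average per-request latency. The sketch below assumes Little's law (outstanding requests = throughput x latency) and that the 64-thread test keeps 64 requests in flight at all times; the helper name is made up for illustration.

```python
def avg_latency_ms(iops: float, outstanding: int = 1) -> float:
    """Average time per request in ms, by Little's law: latency = outstanding / throughput."""
    return 1000.0 * outstanding / iops


# 4k read results from the AS SSD run quoted above
print(f"single-threaded: {avg_latency_ms(196):.1f} ms per read")
print(f"64 threads:      {avg_latency_ms(15945, outstanding=64):.1f} ms per read")
```

Under these assumptions both cases come out at a few milliseconds per read, so the higher 64-thread figure would reflect parallelism in the storage rather than faster individual reads.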
>Garbage info still needs to be collected.
>But first signs show that this could indeed be a potential problem.
>From Sinatica, every 20 minutes there is a peak in GC for ~15,000 transactions. 
>The server clears this in a relatively small amount of time (I think < 1 
>minute), since it's really only a single peak in the graph every time.
>When the GC stops increasing and the server starts to collect it, we see an 
>increase in concurrently running transactions (= transactions stay open longer 
>and are processed more slowly).
>
>We don't have data from the live system yet to see if this behaviour kind of 
>"snowballs" when there is really high load on the server.
>
>Best Regards,
>
>---In firebird-support@yahoogroups.com,  wrote :
>
>Hi Patrick,
>
>>> Hi Thomas, nice to get a response from you. We already met in ~2010 in Linz
>>> at your office :)
>>> (ex. SEM GmbH, later Playmonitor GmbH)
>>
>I know. XING (Big Brother) is watching you. Nice to see that you are still 
>running with Firebird. ;-)
>
>
>>> First, sorry for posting a mixed set of information. The config settings I
>>> posted are the current settings.
>>> But the lock table header was from last Saturday (the day of the total system
>>> crash). We have changed the hash slot value since then, but it didn't help.
>>> The new table looks like:
>>> 
>>> 
>>> LOCK_HEADER BLOCK
>>> Version: 16, Active owner:  0, Length: 134247728, Used: 55790260
>>> Semmask: 0x0, Flags: 0x0001
>>> Enqs: 1806423519, Converts: 4553851, Rejects: 5134185, Blocks: 56585419
>>> Deadlock scans: 82, Deadlocks:  0, Scan interval:  10
>>> Acquires: 2058846891, Acquire blocks: 321584126, Spin count:   0
>>> Mutex wait: 15.6%
>>> Hash slots: 20011, Hash lengths (min/avg/max):0/   7/  18
>>> Remove node:  0, Insert queue:  0, Insert prior:  0
>>> Owners (297): forward: 385160, backward: 38086352
>>> Free owners (43): forward: 52978748, backward: 20505128
>>> Free locks (41802): forward: 180712, backward: 3620136
>>> Free requests (-1097572396): forward: 46948676, backward: 13681252
>>> Lock Ordering: Enabled
>>> 
>>> 
>>> The Min/Avg/Max hash lengths look better now, but as you mentioned, the Mutex
>>> wait is worrying us too.
>>> We have 2 direct questions about that.
>>> 
>>> 
>>> 1) What are the negative effects of increasing Hash-Slots (too high)?
>>
>It somehow defines the initial size of a hash table which is used for lock(ed) 
>object lookup by a key (= hash value), ideally with constant O(1) run-time 
>complexity. If the hash table is too small, due to a too small value for hash 
>slots, it starts to degenerate into a linked/linear list per hash slot. Worst 
>case resulting in O(n) complexity for lookups. The above 20011 setting shows 
>an AVG hash length which looks fine.
>
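The effect of the hash slot count described above can be illustrated with a toy chained hash table. This is only a sketch: it uses Python's built-in hash rather than Firebird's lock manager hashing, and the lock count of 180,000 is a made-up number, since the real number of locks is not given in the thread.

```python
from collections import Counter


def chain_stats(keys, num_slots):
    """Return (min, avg, max) chain length when keys are hashed into num_slots buckets."""
    counts = Counter(hash(k) % num_slots for k in keys)
    lengths = [counts.get(slot, 0) for slot in range(num_slots)]
    return min(lengths), sum(lengths) / num_slots, max(lengths)


# Hypothetical lock count; not taken from the lock table output.
locks = range(180_000)
for slots in (15077, 20011):
    lo, avg, hi = chain_stats(locks, slots)
    print(f"{slots} slots: min/avg/max chain length = {lo}/{avg:.1f}/{hi}")
```

For the same number of keys, more slots give shorter chains, so each lookup walks fewer entries while the lock table is held.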
>As you might know, Classic having a dedicated process per connection model 
>somehow needs a (global) mechanism to synchronize/protect shared data 
>structures across these processes via IPC. This is what the lock manager and 
>the lock table is used for.
>
>>> 2) As far as we know, we can't influence Mutex wait directly (it's just
>>> informational). But do you think that's the reason the underlying hardware is
>>> not utilized?
>>
>I don't think you are disk IO bound. Means, I'm not convinced that faster IO