Re: [firebird-support] Problem with FB database that freezes
On 24-7-2015 00:23, conver...@gmail.com [firebird-support] wrote:

Thanks for your insightful response. FWIW, I would like to mention that on the same server we have another database (same size, ~7 GB) that no one connects to; it's a restore of the production database from January this year. This database works perfectly even when the production database is down. We only try a few test connections on it, though. Below is some of the requested information, captured at a time when the production database performance is normal.

Firebird.conf:
DefaultDbCachePages = 1024
#FileSystemCacheThreshold = 65536 (commented out)
#FileSystemCacheSize = 0 (commented out)

Server environment:
CPU utilization: 11%
Memory utilization: 11 GB (out of 32)
Note: even when the DB performance is down, these values are in the same range or even lower. No swapping.

gstat output (normal performance):
Database header page information:
Flags 0
Checksum 12345
Generation 19572161
Page size 16384
ODS version 11.2
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857

The large transaction gap indicates that you have long-running transactions, which can lead to performance problems due to garbage accumulation.

Bumped transaction 1
Sequence number 0
Next attachment ID 50438
Implementation ID 26
Shadow count 0
Page buffers 3000

This might be a bit high for Classic. It means that each connection can take 47 MB in cached pages. However, with 32 GB available, that might not be that relevant.

Next header page 0
Database dialect 1
Creation date Jul 7, 2015 7:00:57
Attributes no reserve

As already noted by Thomas: don't use no reserve (from the gstat manual: "All pages will be filled to 100% and will be most useful on read-only databases. No space is reserved in each page for updates and/or deletions.")

Variable header data:
Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
Sweep interval: 0
*END*

Note: we sweep the database manually each night.
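To make the gap arithmetic in the header above concrete, here is a minimal Python sketch that pulls the four transaction counters out of `gstat -h` text and computes the gaps discussed in this thread (Next - OAT, OAT - OIT, OST - OIT). It assumes the header has been captured as plain text with one "Name value" pair per line, as laid out above; `parse_counters` and `transaction_gaps` are hypothetical helper names, not part of any Firebird tool.

```python
import re

# Header labels as printed by gstat -h in this thread (assumed one field per line).
FIELDS = {
    "oit": "Oldest transaction",
    "oat": "Oldest active",
    "ost": "Oldest snapshot",
    "next": "Next transaction",
}


def parse_counters(header_text: str) -> dict:
    """Extract the transaction counters from gstat -h output (hypothetical helper)."""
    counters = {}
    for key, label in FIELDS.items():
        match = re.search(rf"^\s*{label}\s+(\d+)\s*$", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {label}")
        counters[key] = int(match.group(1))
    return counters


def transaction_gaps(c: dict) -> dict:
    """Next-OAT hints at a long-running transaction; OST-OIT at a stuck OIT."""
    return {
        "next_minus_oat": c["next"] - c["oat"],
        "oat_minus_oit": c["oat"] - c["oit"],
        "ost_minus_oit": c["ost"] - c["oit"],
    }


sample = """\
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857
"""
print(transaction_gaps(parse_counters(sample)))
# {'next_minus_oat': 567562, 'oat_minus_oit': 243487, 'ost_minus_oit': 141783}
```

Running it periodically against fresh `gstat -h` output would show whether the Next - OAT gap keeps growing between sweeps.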
fb_lock_print output (normal performance):
LOCK_HEADER BLOCK
Version: 145, Active owner: 0, Length: 28311552, Used: 27588104
Flags: 0x0001
Enqs: 69364533, Converts: 192066, Rejects: 36029, Blocks: 282250
Deadlock scans: 7, Deadlocks: 0, Scan interval: 10
Acquires: 77720068, Acquire blocks: 2159883, Spin count: 0
Mutex wait: 2.8%
Hash slots: 1009, Hash lengths (min/avg/max): 51/ 66/ 81
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (145): forward: 441288, backward: 98120
Free owners (11): forward: 24695928, backward: 23070064
Free locks (2963): forward: 22024, backward: 27499760
Free requests (42905): forward: 22145288, backward: 25253392
Lock Ordering: Enabled

You need to increase the value of LockHashSlots in firebird.conf, as the hash lengths are rather long.

Firebird.log (IBMCASA is the server's host name):
The log is literally FULL of 10053 and 10054 error entries like the following:

IBMCASA Thu Jul 23 10:27:27 2015
Unable to complete network request to host IBMCASA.
Error writing data to the connection.

IBMCASA Thu Jul 23 10:27:29 2015
Unable to complete network request to host IBMCASA.
Error reading data from the connection.

IBMCASA Thu Jul 23 10:27:30 2015
INET/inet_error: read errno = 10054

According to the log, these errors have been happening every second, or every few seconds/minutes, since March 8, 2014, and are still happening as I write this. Each day these errors stop at 11:49 PM, when the last users stop working on the client apps, and start again every morning at 6:00 AM, when the first client apps connect to the database.

Error 10054 is "connection reset by peer": the connection was terminated without a proper connection close being signalled to the server. This might indicate a problem in the application: connections not being closed properly, or applications being closed/killed before the connection could be closed properly.
Combined with error 10053, it might mean that you are also using events and that the server tries to notify a client of an event when the client is no longer there.

It would still be interesting to see the values when there is a performance problem.

Mark
--
Mark Rotteveel
Re: [firebird-support] Problem with FB database that freezes
Hi there,

Can the connection errors in the log contribute to the increasing gap between the OAT and NT?

Regards,
-Eduardo
Re: [firebird-support] Problem with FB database that freezes
[snip]

As you are using FB TraceManager, it is a simple mouse-click via the context menu in the parsed gstat output to locate the OAT in the monitoring tables.

Thanks Thomas. I've located the OAT in the monitoring tables as you mention. Can you please elaborate a bit on how to find the client application or process that started this transaction? Is this something FBTM can help us with?

Regards,
-Eduardo
Re: [firebird-support] Problem with FB database that freezes
Take a look at the HDD usage. Is the HDD being used at around 100% when the slowdown appears?

On 24/07/2015 16:00, conver...@gmail.com [firebird-support] wrote:
[snip]
Re: [firebird-support] Problem with FB database that freezes
Thanks Thomas. Today, about 3 hours ago, we had a performance problem; we had to reboot the Windows server to solve it. Prior to the restart, the OAT-NT gap was 120479. After the restart it was 33. Interestingly enough, the gap is currently 354354 as I write this. That's almost three times the gap we had at the time of the restart, yet performance has been normal since we restarted.

Thanks everyone for the help; any other pointers are much welcomed.

Best Regards,
-Eduardo
Re: [firebird-support] Problem with FB database that freezes
Hi,

The OAT is an active transaction. You can see it alive if you analyze a MON$ snapshot (you can use the trial of our FBMonLogger). So the disconnects are not related to the Next-OAT gap, because this gap is caused by some open transaction which you can easily identify.

However, disconnects can lead to forced rollbacks, and as a result the OIT could get stuck, which leads to an increased OIT-OST gap, which is also not good.

For more details about OIT, OST and OAT, read this presentation: http://www.slideshare.net/ibsurgeon/3-how-transactionswork

Regards,
Alexey Kovyazin
IBSurgeon
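Alexey's distinction between the counters can be illustrated with a deliberately simplified model: a map from transaction number to state, standing in for the transaction inventory. This sketches the definitions only, not Firebird's internals; the `TxState` names are hypothetical labels, and the model assumes, as Alexey describes, that a transaction rolled back after a broken connection stays "interesting" and so pins the OIT.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"
    LIMBO = "limbo"


def oldest_interesting(states: dict, next_tx: int) -> int:
    """OIT (simplified): oldest transaction not known to be committed."""
    candidates = [tx for tx, st in states.items() if st is not TxState.COMMITTED]
    return min(candidates, default=next_tx)


def oldest_active(states: dict, next_tx: int) -> int:
    """OAT (simplified): oldest transaction that is still running."""
    candidates = [tx for tx, st in states.items() if st is TxState.ACTIVE]
    return min(candidates, default=next_tx)


# A forced rollback at 101 (e.g. after a 10054 disconnect) pins the OIT,
# while the OAT is set by whichever transaction is still open (103).
states = {
    100: TxState.COMMITTED,
    101: TxState.ROLLED_BACK,
    102: TxState.COMMITTED,
    103: TxState.ACTIVE,
    104: TxState.COMMITTED,
}
next_tx = 105
print(oldest_interesting(states, next_tx), oldest_active(states, next_tx))  # 101 103
```

In the model, finishing transaction 103 moves the OAT up to the next transaction number while the OIT stays at 101; in Firebird, clearing such rolled-back transactions and advancing the OIT is the job of a sweep.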
RE: [firebird-support] Problem with FB database that freezes
gstat output (normal performance):
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857

The large difference between Oldest Active and Next Transaction suggests that your problem is with garbage collection, which is creating a growing backlog of pages that need to be read to find the correct record versions.

You have a process which is not committing its transactions. You need to find it using the MON$TRANSACTIONS and MON$ATTACHMENTS tables.

Sean
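Picking up Eduardo's follow-up question (how to get from the OAT to the client application), here is a sketch of the lookup Sean describes. The SQL joins MON$TRANSACTIONS to MON$ATTACHMENTS so each open transaction comes with its user, remote address and client process; the Python around it only picks the oldest row. It assumes the rows have already been fetched with whatever Firebird driver you use (driver code not shown) and mapped to the hypothetical `OpenTransaction` record. MON$REMOTE_PROCESS may be empty if the client library does not report it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

# Query to run with any Firebird client or driver. No MON$STATE filter is
# applied, on the assumption that an open but idle transaction is exactly
# the kind that holds the OAT back. CURRENT_CONNECTION excludes the
# monitoring query's own transaction.
OPEN_TRANSACTIONS_SQL = """
SELECT t.MON$TRANSACTION_ID, t.MON$TIMESTAMP,
       a.MON$ATTACHMENT_ID, a.MON$USER, a.MON$REMOTE_ADDRESS,
       a.MON$REMOTE_PROCESS, a.MON$REMOTE_PID
FROM MON$TRANSACTIONS t
JOIN MON$ATTACHMENTS a ON a.MON$ATTACHMENT_ID = t.MON$ATTACHMENT_ID
WHERE t.MON$ATTACHMENT_ID <> CURRENT_CONNECTION
ORDER BY t.MON$TRANSACTION_ID
"""


@dataclass
class OpenTransaction:
    """One result row of the query above (hypothetical field names)."""
    transaction_id: int
    started: datetime
    attachment_id: int
    user: str
    remote_address: Optional[str]
    remote_process: Optional[str]
    remote_pid: Optional[int]


def oldest_open(rows: Sequence[OpenTransaction]) -> Optional[OpenTransaction]:
    """The open transaction with the lowest number, i.e. the likely OAT holder."""
    return min(rows, key=lambda r: r.transaction_id, default=None)


def describe(row: OpenTransaction) -> str:
    """Summarize who owns the transaction and since when it has been open."""
    return (f"tx {row.transaction_id} started {row.started:%Y-%m-%d %H:%M} by "
            f"{row.user} from {row.remote_address} "
            f"({row.remote_process}, pid {row.remote_pid})")
```

The transaction number returned should match the "Oldest active" value from `gstat -h` taken at about the same time; the attachment's remote process and PID then point at the client that needs to commit.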
Re: [firebird-support] Problem with FB database that freezes
On 23-7-2015 21:07, conver...@gmail.com [firebird-support] wrote:

Hi there, we have a 7 GB Firebird 2.5 database running on FB Classic 2.5.2 64-bit on a Windows Server. This is a 64-bit Windows server with 24 CPU cores and 32 GB of RAM. When the client connection count surpasses 200, the database starts freezing, i.e. opening a connection with IBExpert can take up to 5 minutes, and you can see the "Loading Views...", "Loading Tables..." etc. events happening in slow motion in the IBExpert window.

What are the values of DefaultDbCachePages, FileSystemCacheThreshold, and FileSystemCacheSize in firebird.conf? What is the output of gstat -h and fb_lock_print for your database at the time of the problem? Does firebird.log have anything interesting around this time? Is CPU utilization high or low, and is the server swapping (running out of physical memory)?

We instruct the users to close client apps, and when the connection count drops to 150 or so, the database breathes again and performance goes back to normal.

What is the output of gstat -h and fb_lock_print after normal performance has been restored?

Is there a limit of client connections for Firebird?

Not for Classic, except when you reach OS limits (memory, threads, processes, etc.). (There is a limit of 1024 for SuperServer and SuperClassic in 2.5.3 and earlier, 2048 since 2.5.4.)

Right now we are puzzled; any pointers/recommendations are much appreciated.

Consider upgrading to 2.5.4; a lot of bugs have been fixed since 2.5.2 (including security problems). I don't know whether it will actually fix your problem, though.

Mark
--
Mark Rotteveel
Re: [firebird-support] Problem with FB database that freezes
Hi again,

Hi Eduardo,

[snip]

Firebird.conf:
DefaultDbCachePages = 1024
#FileSystemCacheThreshold = 65536 (commented out)
#FileSystemCacheSize = 0 (commented out)

Server environment:
CPU utilization: 11%
Memory utilization: 11 GB (out of 32)
Note: even when the DB performance is down, these values are in the same range or even lower. No swapping.

gstat output (normal performance):
Database header page information:
Flags 0
Checksum 12345
Generation 19572161
Page size 16384
ODS version 11.2
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857
Bumped transaction 1
Sequence number 0
Next attachment ID 50438
Implementation ID 26
Shadow count 0
Page buffers 3000

Possibly a bit high for Classic, but might be ok. You might reduce that to e.g. 2048 and spend more RAM on the temporary storage module for ordering/GROUP BY results etc.

Next header page 0
Database dialect 1
Creation date Jul 7, 2015 7:00:57
Attributes no reserve

Btw, it is not a good idea to run a read/write production database with the no reserve option.

--
With regards,
Thomas Steinmaurer
http://www.upscene.com/

Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
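Because Classic runs each connection as a separate process with its own page cache, the per-connection figure from the header multiplies with the connection count. A back-of-the-envelope sketch, assuming every one of the 200 connections mentioned in the thread eventually fills its cache (so this is an upper bound) and using the 16384-byte page size from the gstat header; `classic_cache_bytes` is a hypothetical helper.

```python
PAGE_SIZE = 16384  # bytes, from the gstat header in this thread
GIB = 1024 ** 3


def classic_cache_bytes(page_buffers: int, page_size: int, connections: int) -> int:
    """Upper bound on page-cache memory for Classic: one private cache per connection."""
    return page_buffers * page_size * connections


for buffers in (3000, 2048):
    total = classic_cache_bytes(buffers, PAGE_SIZE, 200)
    print(buffers, round(total / GIB, 2))
# 3000 9.16
# 2048 6.25
```

On a 32 GB server that is roughly 3 GB of headroom won back by going from 3000 to 2048 buffers at 200 connections, which is the RAM Thomas suggests giving to sorting instead.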
Re: [firebird-support] Problem with FB database that freezes
Hi Eduardo,

[snip]

Firebird.conf:
DefaultDbCachePages = 1024
#FileSystemCacheThreshold = 65536 (commented out)
#FileSystemCacheSize = 0 (commented out)

Server environment:
CPU utilization: 11%
Memory utilization: 11 GB (out of 32)
Note: even when the DB performance is down, these values are in the same range or even lower. No swapping.

gstat output (normal performance):
Database header page information:
Flags 0
Checksum 12345
Generation 19572161
Page size 16384
ODS version 11.2
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857
Bumped transaction 1
Sequence number 0
Next attachment ID 50438
Implementation ID 26
Shadow count 0
Page buffers 3000

Possibly a bit high for Classic, but might be ok. You might reduce that to e.g. 2048 and spend more RAM on the temporary storage module for ordering/GROUP BY results etc.

Next header page 0
Database dialect 1
Creation date Jul 7, 2015 7:00:57
Attributes no reserve
Variable header data:
Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
Sweep interval: 0
*END*

You have a long-running active transaction, because the gap between Next Transaction and Oldest Active is quite high. As Sean mentioned, use the monitoring tables to identify the long-running process. As you are using FB TraceManager, it is a simple mouse-click via the context menu in the parsed gstat output to locate the OAT in the monitoring tables.

Note: we sweep the database manually each night.

Do the transaction counters move on with that?
fb_lock_print output (normal performance):
LOCK_HEADER BLOCK
Version: 145, Active owner: 0, Length: 28311552, Used: 27588104
Flags: 0x0001
Enqs: 69364533, Converts: 192066, Rejects: 36029, Blocks: 282250
Deadlock scans: 7, Deadlocks: 0, Scan interval: 10
Acquires: 77720068, Acquire blocks: 2159883, Spin count: 0
Mutex wait: 2.8%
Hash slots: 1009, Hash lengths (min/avg/max): 51/ 66/ 81
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (145): forward: 441288, backward: 98120
Free owners (11): forward: 24695928, backward: 23070064
Free locks (2963): forward: 22024, backward: 27499760
Free requests (42905): forward: 22145288, backward: 25253392
Lock Ordering: Enabled

You have very high hash length values (51-81) while running with the default hash slots value of 1009. Increase it to at least 10009, or even higher, in firebird.conf. It is recommended that the value be a prime number. The average hash length should not be larger than ~20.

--
With regards,
Thomas Steinmaurer
http://www.upscene.com/

Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
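Thomas's rule of thumb (prime slot count, at least 10009, average hash length below ~20) can be turned into a quick estimate. A sketch, assuming the number of chained lock blocks is roughly the average hash length times the current slot count and stays the same after the change; at 200+ connections the lock table will be busier than in this normal-performance snapshot, so headroom above the result is sensible. The value would go into firebird.conf as LockHashSlots; the helper names are hypothetical.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; fine for slot counts in the tens of thousands."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def suggest_lock_hash_slots(avg_len: float, slots: int,
                            target_avg: float = 20, floor: int = 10009) -> int:
    """Prime slot count that brings the estimated average chain length under target."""
    entries = avg_len * slots  # rough number of lock blocks spread over the chains
    needed = math.ceil(entries / target_avg)
    return next_prime(max(needed, floor))


slots = suggest_lock_hash_slots(66, 1009)
print(slots, round(66 * 1009 / slots, 2))  # 10009 6.65
```

With the figures above, about 66,600 chained entries would spread to an average chain length of roughly 6.7 over 10009 slots, well under the ~20 Thomas mentions.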
Re: [firebird-support] Problem with FB database that freezes
Hi Mark,

Thanks for your insightful response. FWIW, I would like to mention that on the same server we have another database (same size, ~7 GB) that no one connects to; it's a restore of the production database from January this year. This database works perfectly even when the production database is down. We only try a few test connections on it, though.

Below is some of the requested information, captured at a time when the production database performance is normal. I'd ask you to please read it to the end. You might have nailed something.

Thanks again. Hope to hear from you soon,
-Eduardo

Firebird.conf:
DefaultDbCachePages = 1024
#FileSystemCacheThreshold = 65536 (commented out)
#FileSystemCacheSize = 0 (commented out)

Server environment:
CPU utilization: 11%
Memory utilization: 11 GB (out of 32)
Note: even when the DB performance is down, these values are in the same range or even lower. No swapping.

gstat output (normal performance):
Database header page information:
Flags 0
Checksum 12345
Generation 19572161
Page size 16384
ODS version 11.2
Oldest transaction 18709808
Oldest active 18953295
Oldest snapshot 18851591
Next transaction 19520857
Bumped transaction 1
Sequence number 0
Next attachment ID 50438
Implementation ID 26
Shadow count 0
Page buffers 3000
Next header page 0
Database dialect 1
Creation date Jul 7, 2015 7:00:57
Attributes no reserve
Variable header data:
Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
Sweep interval: 0
*END*

Note: we sweep the database manually each night.
fb_lock_print output (normal performance):
LOCK_HEADER BLOCK
Version: 145, Active owner: 0, Length: 28311552, Used: 27588104
Flags: 0x0001
Enqs: 69364533, Converts: 192066, Rejects: 36029, Blocks: 282250
Deadlock scans: 7, Deadlocks: 0, Scan interval: 10
Acquires: 77720068, Acquire blocks: 2159883, Spin count: 0
Mutex wait: 2.8%
Hash slots: 1009, Hash lengths (min/avg/max): 51/ 66/ 81
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (145): forward: 441288, backward: 98120
Free owners (11): forward: 24695928, backward: 23070064
Free locks (2963): forward: 22024, backward: 27499760
Free requests (42905): forward: 22145288, backward: 25253392
Lock Ordering: Enabled

Firebird.log (IBMCASA is the server's host name):
The log is literally FULL of 10053 and 10054 error entries like the following:

IBMCASA Thu Jul 23 10:27:27 2015
Unable to complete network request to host IBMCASA.
Error writing data to the connection.

IBMCASA Thu Jul 23 10:27:29 2015
Unable to complete network request to host IBMCASA.
Error reading data from the connection.

IBMCASA Thu Jul 23 10:27:30 2015
INET/inet_error: read errno = 10054

According to the log, these errors have been happening every second, or every few seconds/minutes, since March 8, 2014, and are still happening as I write this. Each day these errors stop at 11:49 PM, when the last users stop working on the client apps, and start again every morning at 6:00 AM, when the first client apps connect to the database.