Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread Mark Rotteveel m...@lawinegevaar.nl [firebird-support]
On 24-7-2015 00:23, conver...@gmail.com [firebird-support] wrote:
 Thanks for your insightful response. FWIW, I would like to mention that,
 in the same server, we have another database (same size ~7 GB) no one
 connects to, it's a restore of the production database from January this
 year. This database works perfectly even when the production database is
 down. We try only a few test connections though.

 Below is some of the requested information, at a time when the
 production database performance is normal.

 Firebird.conf:
 -

 DefaultDbCachePages = 1024
 #FileSystemCacheThreshold = 65536 (commented out)
 #FileSystemCacheSize = 0 (commented out)


 Server environment:
 --

 CPU utilization: 11%
 Memory utilization: 11 GB (out of 32)

 Note.- Even when the DB performance is down, these values are in the same
 range or even lower. No swapping.

 gstat output (normal performance):
 -

 Database header page information:
   Flags   0
   Checksum  12345
   Generation  19572161
   Page size  16384
   ODS version  11.2
   Oldest transaction 18709808
   Oldest active  18953295
   Oldest snapshot  18851591
   Next transaction 19520857

The large transaction gap indicates that you have long running 
transactions, which can lead to performance problems due to garbage 
accumulation.
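To put numbers on that gap, here is a minimal sketch in Python that computes the differences from the four counters in the header above. The function names and the labels for the gaps are my own and purely illustrative; only the counter values come from the gstat output.

```python
import re

# Counter lines copied from the gstat -h output quoted above.
GSTAT_HEADER = """\
Oldest transaction 18709808
Oldest active  18953295
Oldest snapshot  18851591
Next transaction 19520857
"""

COUNTER_RE = re.compile(
    r"\s*(Oldest transaction|Oldest active|Oldest snapshot|Next transaction)"
    r"\s+(\d+)\s*$"
)


def parse_counters(text):
    """Return {label: int} for the transaction counters found in the text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def transaction_gaps(counters):
    """Compute the gaps that are usually looked at (labels are illustrative)."""
    next_tx = counters["Next transaction"]
    return {
        "Next - Oldest active (OAT)": next_tx - counters["Oldest active"],
        "Next - Oldest transaction (OIT)": next_tx - counters["Oldest transaction"],
        "Oldest snapshot (OST) - OIT": (
            counters["Oldest snapshot"] - counters["Oldest transaction"]
        ),
    }


gaps = transaction_gaps(parse_counters(GSTAT_HEADER))
for name, value in gaps.items():
    print(f"{name}: {value}")
```

Running the same calculation on gstat -h output taken at intervals would show whether the Next - OAT gap keeps growing during the day.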

   Bumped transaction 1
   Sequence number  0
   Next attachment ID 50438
   Implementation ID 26
   Shadow count  0
   Page buffers  3000

This might be a bit high for Classic. This means that each connection 
can take 47 MB in cached pages. However with 32 GB available, that might 
not be that relevant.
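The arithmetic behind that figure, as a small Python sketch. The page size and page buffers come from the header above; the connection count of 200 is the level at which the freezes were reported, and the total is only an upper bound that assumes every Classic connection fills its own cache.

```python
PAGE_SIZE = 16384     # bytes, "Page size" in the gstat header
PAGE_BUFFERS = 3000   # pages, "Page buffers" in the gstat header
CONNECTIONS = 200     # connection count at which the freezes were reported

MIB = 1024 ** 2
GIB = 1024 ** 3

# With Classic, each connection has its own page cache.
per_connection_bytes = PAGE_SIZE * PAGE_BUFFERS

# Upper bound: assumes every connection fills its cache completely.
total_bytes = per_connection_bytes * CONNECTIONS

print(f"per connection: {per_connection_bytes / MIB:.1f} MiB")
print(f"{CONNECTIONS} connections: {total_bytes / GIB:.1f} GiB")
```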

   Next header page 0
   Database dialect 1
   Creation date  Jul 7, 2015 7:00:57
   Attributes  no reserve

As already noted by Thomas: don't use no reserve (from the gstat 
manual: All pages will be filled to 100% and will be most useful on 
read-only databases. No space is reserved in each page for updates 
and/or deletions.)

  Variable header data:
   Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
   Sweep interval:  0
   *END*

 Note.- We sweep the database manually each night.

 fb_lock_print output (normal performance):
 

 LOCK_HEADER BLOCK
   Version: 145, Active owner:  0, Length: 28311552, Used: 27588104
   Flags: 0x0001
   Enqs: 69364533, Converts: 192066, Rejects:  36029, Blocks: 282250
   Deadlock scans:  7, Deadlocks:  0, Scan interval:  10
   Acquires: 77720068, Acquire blocks: 2159883, Spin count:   0
   Mutex wait: 2.8%
   Hash slots: 1009, Hash lengths (min/avg/max):   51/ 66/  81
   Remove node:  0, Insert queue:  0, Insert prior:  0
   Owners (145): forward: 441288, backward:  98120
   Free owners (11): forward: 24695928, backward: 23070064
   Free locks (2963): forward:  22024, backward: 27499760
   Free requests (42905): forward: 22145288, backward: 25253392
   Lock Ordering: Enabled

You need to increase the value of LockHashSlots in firebird.conf as the 
hash length is rather long.

 Firebird.log (IBMCASA is the server's host name)
 --

 The log is literally FULL of 10053 and 10054 error entries like the
 following:

 IBMCASA Thu Jul 23 10:27:27 2015
   Unable to complete network request to host IBMCASA.
   Error writing data to the connection.


 IBMCASA Thu Jul 23 10:27:29 2015
   Unable to complete network request to host IBMCASA.
   Error reading data from the connection.


 IBMCASA Thu Jul 23 10:27:30 2015
   INET/inet_error: read errno = 10054


 According to the log, these errors seem to have been happening every second
 or every few seconds/minutes since March 8 2014, and they continue even as
 I'm writing this. Each day, the errors stop at 11:49 PM when the last
 users stop working on the client apps, then they start again every
 morning at 6:00 AM when the first client apps connect to the database.

Error 10054 is "connection reset by peer": the connection was terminated 
without properly signalling a connection close to the server. This might 
indicate a problem in the application: not properly closing connections, 
or applications being closed/killed before the connection could be closed 
properly. Combined with error 10053 (connection aborted), it might mean 
that you are also using events and that the server tries to notify a 
client of an event when the client is no longer there.
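To see how the read and write errors are distributed, the log can be tallied with a few lines of Python. This is only a sketch: it assumes every entry is an unindented "host timestamp" line followed by indented message lines, as in the excerpt above, and the function name is made up for illustration.

```python
from collections import Counter

# Entries copied from the firebird.log excerpt above.
SAMPLE_LOG = """\
IBMCASA Thu Jul 23 10:27:27 2015
   Unable to complete network request to host IBMCASA.
   Error writing data to the connection.


IBMCASA Thu Jul 23 10:27:29 2015
   Unable to complete network request to host IBMCASA.
   Error reading data from the connection.


IBMCASA Thu Jul 23 10:27:30 2015
   INET/inet_error: read errno = 10054
"""


def count_entries(log_text):
    """Count log entries, keyed by the last message line of each entry."""
    counts = Counter()
    last_message = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator between entries
        if not line[0].isspace():
            # Unindented line: header of a new entry, so flush the previous one.
            if last_message is not None:
                counts[last_message] += 1
            last_message = None
        else:
            last_message = line.strip()
    if last_message is not None:
        counts[last_message] += 1
    return counts


counts = count_entries(SAMPLE_LOG)
for message, number in counts.most_common():
    print(number, message)
```

For the real file, pass the contents of firebird.log to count_entries() instead of SAMPLE_LOG.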

It would still be interesting to see the values when there is a 
performance problem.

Mark
-- 
Mark Rotteveel


Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread conver...@gmail.com [firebird-support]
Hi there,
 

 Can the connection errors in the log contribute to the increasing gap between 
the OAT and NT?
 

 Regards,
 

 -Eduardo



Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread conver...@gmail.com [firebird-support]
[snip]
 As you are using 
 FB TraceManager, it is a simple mouse-click via the context-menu in the 
 parsed gstat output to locate the OAT in the monitoring tables.
 

 Thanks Thomas. I've located the OAT in the monitoring tables as you mention. 
Can you please elaborate a bit on how to find the client application or process 
that started this transaction? Is this something FBTM can help us with?
 

 Regards,
 

 -Eduardo




Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread Fabiano Kureck - Desenvolvimento SCI fabi...@sci10.com.br [firebird-support]
Take a look at the HDD usage. Is the HDD being used at around 100% when 
the slowness appears?


On 24/07/2015 16:00, conver...@gmail.com [firebird-support] wrote:


Thanks Thomas. Today we had a performance problem about 3 hours ago, 
we had to reboot the Windows server to solve it. Prior to the restart, 
the OAT-NT gap was 120479. After the restart it was 33.



Interestingly enough, currently the gap is 354354 as I write this. 
That's almost three times the gap we had at the time of the restart, 
and performance has been normal since we restarted.



Thanks everyone for the help; any other pointers are very welcome.


Best Regards,


-Eduardo






Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread conver...@gmail.com [firebird-support]
Thanks Thomas. Today we had a performance problem about 3 hours ago, we had to 
reboot the Windows server to solve it. Prior to the restart, the OAT-NT gap was 
120479. After the restart it was 33.
 

 Interestingly enough, currently the gap is 354354 as I write this. That's 
almost three times the gap we had at the time of the restart, and performance 
has been normal since we restarted. 
 

 Thanks everyone for the help; any other pointers are very welcome.
 

 Best Regards,
 

 -Eduardo


Re: [firebird-support] Problem with FB database that freezes

2015-07-24 Thread Alexey Kovyazin a...@ib-aid.com [firebird-support]

Hi,


The OAT is an active transaction. You can see it alive if you analyze a 
MON$ snapshot (you can use the trial version of our FBMonLogger).


So the disconnects are not related to the Next-OAT gap, because this gap 
is caused by some open transaction, which you can easily identify.


However, disconnects can lead to a forced rollback. As a result, the OIT 
could get stuck, which increases the OIT-OST gap, and that is also 
not good.


For more details about OIT, OST and OAT, read this presentation:
http://www.slideshare.net/ibsurgeon/3-how-transactionswork

Regards,
Alexey Kovyazin
IBSurgeon





Hi there,


Can the connection errors in the log contribute to the increasing gap 
between the OAT and NT?



Regards,


-Eduardo






RE: [firebird-support] Problem with FB database that freezes

2015-07-23 Thread 'Leyne, Sean' s...@broadviewsoftware.com [firebird-support]


 gstat output (normal performance):
 -

  Oldest transaction 18709808
  Oldest active  18953295
  Oldest snapshot  18851591
  Next transaction 19520857

The large difference between Oldest Active and Next Transaction suggests that your 
problem is with garbage collection: old record versions cannot be removed, which 
creates a growing backlog of pages that need to be read to find the correct record 
versions.

You have a process which is not committing its transactions. You need to find it 
using the MON$TRANSACTIONS and MON$ATTACHMENTS tables.
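One possible way to do that lookup, sketched in Python around a plain SQL query. The query joins the two monitoring tables and lists the active transactions with the lowest IDs together with the user, remote address and remote process of the owning connection. The function name and the limit parameter are illustrative, and the cursor is assumed to come from whatever DB-API 2.0 Firebird driver you use; nothing here is specific to one driver.

```python
# Active transactions (MON$STATE = 1), oldest first, with the attachment
# that owns each of them.
OLDEST_ACTIVE_SQL = """
SELECT FIRST {limit}
       t.MON$TRANSACTION_ID,
       t.MON$TIMESTAMP,
       a.MON$ATTACHMENT_ID,
       a.MON$USER,
       a.MON$REMOTE_ADDRESS,
       a.MON$REMOTE_PID,
       a.MON$REMOTE_PROCESS
FROM MON$TRANSACTIONS t
JOIN MON$ATTACHMENTS a
  ON a.MON$ATTACHMENT_ID = t.MON$ATTACHMENT_ID
WHERE t.MON$STATE = 1
ORDER BY t.MON$TRANSACTION_ID
"""


def oldest_active_transactions(cursor, limit=5):
    """Run the query on a DB-API cursor and return the rows.

    `cursor` is assumed to be a cursor on the production database; `limit`
    is forced to an int before it is placed in the statement text.
    """
    cursor.execute(OLDEST_ACTIVE_SQL.format(limit=int(limit)))
    return cursor.fetchall()
```

The first row returned should correspond to the Oldest active value reported by gstat -h, and MON$REMOTE_ADDRESS / MON$REMOTE_PROCESS point at the client machine and executable holding it open. The SQL text can also be pasted directly into isql or IBExpert with the limit filled in.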


Sean



Re: [firebird-support] Problem with FB database that freezes

2015-07-23 Thread Mark Rotteveel m...@lawinegevaar.nl [firebird-support]
On 23-7-2015 21:07, conver...@gmail.com [firebird-support] wrote:
 Hi there, we have a 7 GB Firebird 2.5 database, running on FB Classic
 2.5.2 64-bit on a Windows Server. This is a 64-bit Windows server with
 24 CPU cores and 32 GB of RAM.

 When the client connection count surpasses 200, the database starts
 freezing, i.e., opening a connection with IBExpert can take up to 5
 minutes, and you can see the "Loading Views...", "Loading Tables..."
 etc. events happening in slow motion in the IBExpert window.

What are the values of DefaultDbCachePages, FileSystemCacheThreshold, and 
FileSystemCacheSize in firebird.conf?

What is the output of gstat -h and fb_lock_print for your database at 
the time of the problem? Does the firebird.log have anything interesting 
around this time?

Is CPU utilization high or low, is the server swapping (running out of 
physical memory)?

 We instruct the users to close client apps and when the connection count
 drops to 150 or so, the database breathes in and the performance goes
 back to normal.

What is the output of gstat -h and fb_lock_print after normal 
performance has been restored?

 Is there a limit of client connections for Firebird?

Not for Classic, except when you reach OS limits (memory, threads, 
processes, etc.). There is a limit of 1024 connections for SuperServer 
and SuperClassic in 2.5.3 and earlier, and 2048 since 2.5.4.


 Right now we are puzzled, any pointers / recommendations are much
 appreciated.

Consider upgrading to 2.5.4; a lot of bugs have been fixed since 2.5.2 
(including security problems). I don't know whether it will actually fix 
your problem though.

Mark
-- 
Mark Rotteveel


Re: [firebird-support] Problem with FB database that freezes

2015-07-23 Thread Thomas Steinmaurer t...@iblogmanager.com [firebird-support]
Hi again,

 Hi Eduardo,

 [snip]

 Firebird.conf:

 -

 DefaultDbCachePages = 1024

 #FileSystemCacheThreshold = 65536 (commented out)

 #FileSystemCacheSize = 0 (commented out)


 Server environment:

 --

 CPU utilization: 11%

 Memory utilization: 11 GB (out of 32)


 Note.- Even when the DB performance is down, these values are in the same
 range or even lower. No swapping.


 gstat output (normal performance):

 -

 Database header page information:
Flags   0
Checksum  12345
Generation  19572161
Page size  16384
ODS version  11.2
Oldest transaction 18709808
Oldest active  18953295
Oldest snapshot  18851591
Next transaction 19520857
Bumped transaction 1
Sequence number  0
Next attachment ID 50438
Implementation ID 26
Shadow count  0
Page buffers  3000

 Possibly a bit high for Classic, but might be ok. You might reduce
 that to e.g. 2048 and spend more RAM on the temporary storage module for
 ordering/group by results etc.


Next header page 0
Database dialect 1
Creation date  Jul 7, 2015 7:00:57
Attributes  no reserve

Btw, it is not a good idea to run a read/write production database with 
the no reserve option.



-- 
With regards,
Thomas Steinmaurer
http://www.upscene.com/

Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.


Re: [firebird-support] Problem with FB database that freezes

2015-07-23 Thread Thomas Steinmaurer t...@iblogmanager.com [firebird-support]
Hi Eduardo,

[snip]

 Firebird.conf:

 -

 DefaultDbCachePages = 1024

 #FileSystemCacheThreshold = 65536 (commented out)

 #FileSystemCacheSize = 0 (commented out)


 Server environment:

 --

 CPU utilization: 11%

 Memory utilization: 11 GB (out of 32)


 Note.- Even when the DB performance is down, these values are in the same
 range or even lower. No swapping.


 gstat output (normal performance):

 -

 Database header page information:
   Flags   0
   Checksum  12345
   Generation  19572161
   Page size  16384
   ODS version  11.2
   Oldest transaction 18709808
   Oldest active  18953295
   Oldest snapshot  18851591
   Next transaction 19520857
   Bumped transaction 1
   Sequence number  0
   Next attachment ID 50438
   Implementation ID 26
   Shadow count  0
   Page buffers  3000

Possibly a bit high for Classic, but might be ok. You might reduce 
that to e.g. 2048 and spend more RAM on the temporary storage module for 
ordering/group by results etc.


   Next header page 0
   Database dialect 1
   Creation date  Jul 7, 2015 7:00:57
   Attributes  no reserve

  Variable header data:
   Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
   Sweep interval:  0
   *END*

You have a long-running active transaction, because the gap between Next 
Transaction and Oldest Active is quite high. As Sean mentioned, use the 
monitoring tables to identify the long-running process. As you are using 
FB TraceManager, it is a simple mouse-click via the context-menu in the 
parsed gstat output to locate the OAT in the monitoring tables.


 Note.- We sweep the database manually each night.

Do the transaction counters move forward after the sweep?



 fb_lock_print output (normal performance):

 

 LOCK_HEADER BLOCK
   Version: 145, Active owner:  0, Length: 28311552, Used: 27588104
   Flags: 0x0001
   Enqs: 69364533, Converts: 192066, Rejects:  36029, Blocks: 282250
   Deadlock scans:  7, Deadlocks:  0, Scan interval:  10
   Acquires: 77720068, Acquire blocks: 2159883, Spin count:   0
   Mutex wait: 2.8%
   Hash slots: 1009, Hash lengths (min/avg/max):   51/  66/  81
   Remove node:  0, Insert queue:  0, Insert prior:  0
   Owners (145): forward: 441288, backward:  98120
   Free owners (11): forward: 24695928, backward: 23070064
   Free locks (2963): forward:  22024, backward: 27499760
   Free requests (42905): forward: 22145288, backward: 25253392
   Lock Ordering: Enabled

You have very high hash length values (51-81) with the default 
hash slots value of 1009. Increase LockHashSlots to at least 10009, or even 
higher, in firebird.conf. It is recommended that the value is a prime number. 
The average hash length should not be larger than ~20.
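As a rough cross-check of that recommendation, here is a Python sketch that estimates the smallest prime slot count that would bring the average hash length down to about 20. It assumes that slots times average length approximates the number of lock entries in the table, which is my simplification and not something fb_lock_print reports directly; the helper names are illustrative.

```python
import math

HASH_SLOTS = 1009        # "Hash slots" in the fb_lock_print header
AVG_HASH_LENGTH = 66     # avg from "Hash lengths (min/avg/max)"
TARGET_AVG_LENGTH = 20   # rule of thumb from the paragraph above
RECOMMENDED_SLOTS = 10009


def is_prime(n):
    """Trial division; fine for numbers of this size."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


# Assumption: slots * average length approximates the number of lock entries.
estimated_locks = HASH_SLOTS * AVG_HASH_LENGTH
min_slots = math.ceil(estimated_locks / TARGET_AVG_LENGTH)
suggested_minimum = next_prime(min_slots)

print("estimated lock entries:", estimated_locks)
print("smallest prime slot count for the target:", suggested_minimum)
print("avg length with the recommended value:",
      round(estimated_locks / RECOMMENDED_SLOTS, 1))
```

The computed minimum is only a lower bound for the load seen at the time of this fb_lock_print run, which is why a value with more headroom, such as the 10009 suggested above, is the safer choice.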


-- 
With regards,
Thomas Steinmaurer
http://www.upscene.com/

Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.


 Firebird.log (IBMCASA is the server's host name)

 --

 The log is literally FULL of 10053 and 10054 error entries like the
 following:


 IBMCASA Thu Jul 23 10:27:27 2015
   Unable to complete network request to host IBMCASA.
   Error writing data to the connection.


 IBMCASA Thu Jul 23 10:27:29 2015
   Unable to complete network request to host IBMCASA.
   Error reading data from the connection.


 IBMCASA Thu Jul 23 10:27:30 2015
   INET/inet_error: read errno = 10054


 According to the log, these errors seem to have been happening every second
 or every few seconds/minutes since March 8 2014, and they continue even as
 I'm writing this. Each day, the errors stop at 11:49 PM when the last
 users stop working on the client apps, then they start again every
 morning at 6:00 AM when the first client apps connect to the database.



 





Re: [firebird-support] Problem with FB database that freezes

2015-07-23 Thread conver...@gmail.com [firebird-support]
Hi Mark,
 

 Thanks for your insightful response. FWIW, I would like to mention that, in 
the same server, we have another database (same size ~7 GB) no one connects to, 
it's a restore of the production database from January this year. This database 
works perfectly even when the production database is down. We try only a few 
test connections though.
 

 Below is some of the requested information, at a time when the production 
database performance is normal.
 

 I beg you to please read it until the end. You might have nailed something.
 

 Thanks again. Hope to hear from you soon,
 

 -Eduardo
 

 Firebird.conf:
 -
 DefaultDbCachePages = 1024
 #FileSystemCacheThreshold = 65536 (commented out)
 #FileSystemCacheSize = 0 (commented out)
 

 Server environment:
 --
 CPU utilization: 11%
 Memory utilization: 11 GB (out of 32)
 

 Note.- Even when the DB performance is down, these values are in the same range 
or even lower. No swapping.
 

 gstat output (normal performance):
 -
 Database header page information:
 Flags   0
 Checksum  12345
 Generation  19572161
 Page size  16384
 ODS version  11.2
 Oldest transaction 18709808
 Oldest active  18953295
 Oldest snapshot  18851591
 Next transaction 19520857
 Bumped transaction 1
 Sequence number  0
 Next attachment ID 50438
 Implementation ID 26
 Shadow count  0
 Page buffers  3000
 Next header page 0
 Database dialect 1
 Creation date  Jul 7, 2015 7:00:57
 Attributes  no reserve
 Variable header data:
 Database backup GUID: {BF8D26E0-970E-431A-7FAD-E2D9BDB2E4DA}
 Sweep interval:  0
 *END*
 

 Note.- We sweep the database manually each night.
 

 fb_lock_print output (normal performance):
 
 LOCK_HEADER BLOCK
 Version: 145, Active owner:  0, Length: 28311552, Used: 27588104
 Flags: 0x0001
 Enqs: 69364533, Converts: 192066, Rejects:  36029, Blocks: 282250
 Deadlock scans:  7, Deadlocks:  0, Scan interval:  10
 Acquires: 77720068, Acquire blocks: 2159883, Spin count:   0
 Mutex wait: 2.8%
 Hash slots: 1009, Hash lengths (min/avg/max):   51/  66/  81
 Remove node:  0, Insert queue:  0, Insert prior:  0
 Owners (145): forward: 441288, backward:  98120
 Free owners (11): forward: 24695928, backward: 23070064
 Free locks (2963): forward:  22024, backward: 27499760
 Free requests (42905): forward: 22145288, backward: 25253392
 Lock Ordering: Enabled
 

 

 Firebird.log (IBMCASA is the server's host name)
 --
 The log is literally FULL of 10053 and 10054 error entries like the following:
 

 IBMCASA Thu Jul 23 10:27:27 2015
 Unable to complete network request to host IBMCASA.
 Error writing data to the connection.
 
IBMCASA Thu Jul 23 10:27:29 2015
 Unable to complete network request to host IBMCASA.
 Error reading data from the connection.
 
IBMCASA Thu Jul 23 10:27:30 2015
 INET/inet_error: read errno = 10054
 

 According to the log, these errors seem to have been happening every second or every 
few seconds/minutes since March 8 2014, and they continue even as I'm writing 
this. Each day, the errors stop at 11:49 PM when the last users stop working 
on the client apps, then they start again every morning at 6:00 AM when the 
first client apps connect to the database.