Re: [firebird-support] Firebird Embedded corruptions
Hi,

Thanks for this and sorry for my slow response. Please see my comments below.

Best Regards
//Jan Flyborg

2014-09-19 21:13 GMT+02:00 Ann Harrison aharri...@ibphoenix.com [firebird-support] firebird-support@yahoogroups.com:

> On Mon, Sep 15, 2014 at 7:41 AM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:
>
>> I just made another posting where I tried to describe three different examples of things we have seen.
>
> The first was a wrong page type, which sounds like a bug that was fixed in a newer version in code that's common to all Firebird architectures. In your case, the bad page was in an index (7). If you can find the index with the bad page and recreate it, all will be well. Just as an FYI, the page types are:
>
>   0 - undefined, normally an uninitialized page and indicates a bad page pointer elsewhere
>   1 - Database header page
>   2 - Page inventory page
>   3 - Transaction inventory page
>   4 - Pointer page
>   5 - Data page
>   6 - Index root page - contains information about each index on the table, one per table
>   7 - Index (B-tree) page
>   8 - Blob data page
>   9 - Generator pages

That sounds very good and it seems like an upgrade to 2.5.3 will make sure that we do not see this again.

> The second problem (CCH_precedence: block marked. file: cch.cpp line: 4390) is more concerning - I don't remember having read a bug about it. CCH is the cache handler. A mark is the sign that a page is about to be changed. When Firebird is forced to write a page either as part of a commit or to free space in the cache, it must write out any pages that the page depends on first. That's a little obscure. Suppose that the page you're about to write has a record with a back version, and the back version is on a different page. To keep the database consistent, the page with the back version must be on disk before the page that includes a record that points to the back version. Firebird keeps a list of precedence relationships and CCH goes through them before writing a page.
>
> I think the error means that someone is currently writing to a page that's on the precedence list. That should never happen. It's interesting that the problem occurred during an alter index operation. However, the database should be fine on disk and usable after you restart Firebird. Page marks are entirely in memory. It's quite possible that I missed a bug report and this problem was fixed in a later version.

If it is of any help to you, I was wrong in my original posting when I said we were using 2.5.1 (I mean that the line numbers in the exception might lead you to draw the wrong conclusion, since I gave you the wrong version). We are currently using 2.5.2 and nothing else.

> The third problem is two records in a referencing table lack mates in the referenced table, despite a referential constraint. I have no idea how that happened, but it should be reasonably easy to fix in your database.

In another posting (later than yours) Fabiano says that these errors are connected to bad memory chips, so in the future we will instruct users who have this problem to run memtest86 overnight to check that the memory is physically OK. These constraint problems are actually the most common ones that we see.

> The first problem is what I would call a physical corruption - the internal structure of the database is corrupt. The second is an in-memory corruption - the disk database is OK, but the in-memory version is damaged. The third is logical corruption - the database is physically intact, but does not conform to the data rules.
>
>> Typically we fix our problems with a gfix -mend and then doing a backup restore cycle. Usually some tables then still have problems (typically foreign keys that refer to non-existing primary keys), so if possible we then remove the faulty records and then it works again.

The problem is that these are not my databases. I normally have no access to them since they are running in standalone installations at our customers' sites. Recently we have bundled our own homemade tool for repairing databases that our customers can use when they are experiencing problems (basically a graphical frontend for gfix), but sometimes this is not enough and the databases have to be sent to us.

> Gfix is pretty old and somewhat crude. IBFirstAid might give you better help on physical corruptions. Checking that there is no non-conforming data before creating constraints may help with logical corruption.

Yes, that would probably be a better choice for us, but we cannot bundle IBFirstAid together with our application. I will however download it and try it on files that get sent to us.

> Good luck (and my apologies for the late response)

No need for any apologies. I am very grateful for you taking your time to help us.

Another thing, what do you say about the posting above where
Re: [firebird-support] Firebird Embedded corruptions
Jan,

> Yes that would probably be a better choice for us, but we cannot bundle IBFirstAid together with our application. Will however download it and try it on files that get sent to us.

Actually, the current version of FirstAID available on our web site is a full version, but it requests a license every time you run a recovery (not a diagnosis). So you can (from both a licensing and a technical point of view) include the FirstAID executables in your application, so that users can run it on demand. Don't hesitate to contact our supp...@ib-aid.com for more details.

Regards,
Alexey Kovyazin
IBSurgeon
Re: [firebird-support] Firebird Embedded corruptions
On Tue, Sep 23, 2014 at 10:49 AM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:

>> The first was a wrong page type, which sounds like a bug that was fixed in a newer version in code that's common to all Firebird architectures. In your case, the bad page was in an index (7). If you can find the index with the bad page and recreate it, all will be well.
>
> That sounds very good and it seems like an upgrade to 2.5.3 will make sure that we do not see this again.

Anytime your users get an error of the form "wrong page type, expected 7 encountered n", you can probably work with them to identify and rebuild the bad index.

>> The second problem (CCH_precedence: block marked. file: cch.cpp line: 4390) is more concerning
>
> If that is of any help for you, I was wrong in my original posting when I said we were using 2.5.1 (I mean that the line numbers in the exception might lead you to draw the wrong conclusion when I gave you the wrong version). We are currently using 2.5.2 and nothing else.

I follow bug reports but not religiously. So I searched for one that includes "block marked" and "modify RDB$INDICES" and found #4467, which is marked as "will not fix" and described as a user error. User errors should not cause internal cache manager problems, so I'm somewhat bemused. It was reported in 2.5.2, so it may well be your problem.

>> The third problem is two records in a referencing table lack mates in the referenced table, despite a referential constraint. I have no idea how that happened, but it should be reasonably easy to fix in your database.
>
> In another posting (later than yours) Fabiano is saying that these errors are connected to bad memory chips and in the future we will instruct our users who are having this problem to run memtest86 overnight to check that the memory is physically OK. These constraints problems are actually the most common that we see.

Clever memory problem to corrupt just the key or the constraint check. Certainly it's worth checking that the memory is OK. I'd also check that the referencing key looks generally sound. Do you add referential constraints to existing databases? A problem with broken constraints is that the error doesn't leave traces, so a reproducible case would be very helpful, but very hard to produce.

>> Gfix is pretty old and somewhat crude. IBFirstAid might give you better help on physical corruptions. Checking that there is no non-conforming data before creating constraints may help with logical corruption.
>
> Yes that would probably be a better choice for us, but we cannot bundle IBFirstAid together with our application. Will however download it and try it on files that get sent to us.

The analysis tool is free - maybe your users could download it themselves to look for evidence. But it's not going to help with broken referential constraints or mangled cache precedence.

> Another thing, what do you say about the posting above where the theory is that Volume Shadow Copy is interfering with the database? Have you heard about that before?

I'm quite sure that Volume Shadow Copy won't make good copies of an active database or any other file that's open for random writes. Whether it could corrupt the original is an open question. Lots of people claim to have seen instances where copying a database corrupts the original.

> And another last comment. We have bundled Firebird with very many installations of our product and it might be the case that what we are seeing are very rare problems, that no one else has experienced before. Do you think we should post bug reports every time we see an exception or a problem that you have not already been made aware of?

Search the tracker (http://tracker.firebirdsql.org/browse) first to see if the problem has been reported. Then you might mention it on the support list to see if there's something that looks like a user error, so you won't annoy the developers with stuff that the volunteers on this list could resolve. But if you're getting errors with source file and line numbers, the chances are good that you've found a bug.

Firebird is used pretty widely and quite heavily in many installations. However, the embedded form probably gets less stress in the world than any of the other architectures, so you may be stressing something unusual. No development group, open or closed source, can fix bugs it doesn't know about. Thank you for working with Firebird on these problems.

Good luck,
Ann
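The advice about recognizing "wrong page type" errors can be sketched in code. The following is a minimal illustration, not part of Firebird or its .NET driver: it parses the message text quoted in this thread (`page 3819 is of wrong type (expected 7, found 3)`) and maps the numbers to the page-type names Ann listed. The function name and the returned dictionary keys are hypothetical, and the regular expression assumes the message wording stays as shown in the thread.

```python
import re

# Page-type numbers as listed by Ann Harrison in this thread.
PAGE_TYPES = {
    0: "undefined",
    1: "database header page",
    2: "page inventory page",
    3: "transaction inventory page",
    4: "pointer page",
    5: "data page",
    6: "index root page",
    7: "index (B-tree) page",
    8: "blob data page",
    9: "generator page",
}

# Assumed message shape, copied from the error text quoted in the thread.
_WRONG_TYPE = re.compile(
    r"page (\d+) is of wrong type \(expected (\d+), found (\d+)\)"
)


def explain_wrong_page_type(message):
    """Return a small summary dict for a 'wrong page type' error, or None."""
    match = _WRONG_TYPE.search(message)
    if match is None:
        return None
    page, expected, found = (int(group) for group in match.groups())
    return {
        "page": page,
        "expected": PAGE_TYPES.get(expected, "unknown"),
        "found": PAGE_TYPES.get(found, "unknown"),
        # Expected type 7 means the engine was reading an index, so
        # rebuilding the affected index is the first thing to try.
        "index_page_expected": expected == 7,
    }
```

A support tool could call `explain_wrong_page_type()` on logged exception text and, when `index_page_expected` is true, point the user toward rebuilding indexes rather than a full repair.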
Re: [firebird-support] Firebird Embedded corruptions
Hi,

I just saw CCH mentioned and figured I'd pitch in - http://tracker.firebirdsql.org/browse/CORE-4467 . I was basically told it was most probably bad hardware, but the error happens only under major load (in short, a transaction does a million or so updates and in the meantime a few others are trying to do the same and getting a lock conflict). I haven't had a case of that workload happening since I updated to 2.5.3 - partly luck, partly because I don't want to take the chance - so I can't give you a status on it. I can say I don't seem to get corruptions under sane workloads (I haven't touched the hardware, only updated a kernel or two in the meantime).

2014-09-19 22:28 GMT+03:00 'Fabiano - Desenvolvimento SCI' fabi...@sci10.com.br [firebird-support] firebird-support@yahoogroups.com:

> Ann, about the third problem:
>
>> The third problem is two records in a referencing table lack mates in the referenced table, despite a referential constraint. I have no idea how that happened, but it should be reasonably easy to fix in your database.
>
> I have seen this happen twice, and it was related to bad RAM. I think that when Firebird writes to memory, the faulty memory alters the contents, so when the transaction commits you get a different value. We struggled with one such case this week. The solution is to replace the RAM in the server where Firebird is running.
>
> From: firebird-support@yahoogroups.com [mailto: firebird-support@yahoogroups.com]
> Sent: Friday, September 19, 2014 16:14
> To: firebird-support@yahoogroups.com
> Subject: Re: [firebird-support] Firebird Embedded corruptions
>
>> On Mon, Sep 15, 2014 at 7:41 AM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:
>>
>>> I just made another posting where I tried to describe three different examples of things we have seen.
>>
>> The first was a wrong page type, which sounds like a bug that was fixed in a newer version in code that's common to all Firebird architectures. In your case, the bad page was in an index (7). If you can find the index with the bad page and recreate it, all will be well. Just as an FYI, the page types are:
>>
>>   0 - undefined, normally an uninitialized page and indicates a bad page pointer elsewhere
>>   1 - Database header page
>>   2 - Page inventory page
>>   3 - Transaction inventory page
>>   4 - Pointer page
>>   5 - Data page
>>   6 - Index root page - contains information about each index on the table, one per table
>>   7 - Index (B-tree) page
>>   8 - Blob data page
>>   9 - Generator pages
>>
>> The second problem (CCH_precedence: block marked. file: cch.cpp line: 4390) is more concerning - I don't remember having read a bug about it. CCH is the cache handler. A mark is the sign that a page is about to be changed. When Firebird is forced to write a page either as part of a commit or to free space in the cache, it must write out any pages that the page depends on first. That's a little obscure. Suppose that the page you're about to write has a record with a back version, and the back version is on a different page. To keep the database consistent, the page with the back version must be on disk before the page that includes a record that points to the back version. Firebird keeps a list of precedence relationships and CCH goes through them before writing a page.
>>
>> I think the error means that someone is currently writing to a page that's on the precedence list. That should never happen. It's interesting that the problem occurred during an alter index operation. However, the database should be fine on disk and usable after you restart Firebird. Page marks are entirely in memory. It's quite possible that I missed a bug report and this problem was fixed in a later version.
>>
>> The third problem is two records in a referencing table lack mates in the referenced table, despite a referential constraint. I have no idea how that happened, but it should be reasonably easy to fix in your database.
>>
>> The first problem is what I would call a physical corruption - the internal structure of the database is corrupt. The second is an in-memory corruption - the disk database is OK, but the in-memory version is damaged. The third is logical corruption - the database is physically intact, but does not conform to the data rules.
>>
>>> Typically we fix our problems with a gfix -mend and then doing a backup restore cycle. Usually some tables then still have problems (typically foreign keys that refer to non-existing primary keys), so if possible we then remove the faulty records and then it works again.
>>
>> Gfix is pretty old and somewhat crude. IBFirstAid might give you better help on physical corruptions. Checking that there is no non-conforming data before creating constraints may help with logical corruption.
>>
>> Good luck (and my apologies for the late response)
>>
>> Ann
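The suggestion to check for non-conforming data before creating constraints boils down to an "orphan" query: child rows whose foreign key value has no matching parent row. Below is a minimal sketch of that query. It runs against an in-memory SQLite database purely as a runnable stand-in (the same `NOT EXISTS` form is standard SQL). The table and column names are taken from the NHibernate query quoted in this thread, but which columns the failing constraints actually cover is an assumption, and the `find_orphans` helper is hypothetical.

```python
import sqlite3

# Standard-SQL orphan check. Identifiers are interpolated directly, so they
# must come from trusted code, never from user input.
ORPHAN_QUERY = """
SELECT c.{fk}
FROM {child} c
WHERE c.{fk} IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM {parent} p WHERE p.{pk} = c.{fk})
ORDER BY c.{fk}
"""


def find_orphans(conn, child, fk, parent, pk):
    """Return foreign key values in `child` that have no row in `parent`."""
    sql = ORPHAN_QUERY.format(child=child, fk=fk, parent=parent, pk=pk)
    return [row[0] for row in conn.execute(sql)]


# Stand-in data: SQLite instead of Firebird, with assumed column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RECORDING_SEQUENCE (RECORDING_SEQUENCE_ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE FILE_SEQ (ID INTEGER PRIMARY KEY, SEQ_ID INTEGER)")
conn.executemany("INSERT INTO RECORDING_SEQUENCE VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO FILE_SEQ (SEQ_ID) VALUES (?)", [(1,), (2,), (7,), (9,)])

orphans = find_orphans(
    conn, "FILE_SEQ", "SEQ_ID", "RECORDING_SEQUENCE", "RECORDING_SEQUENCE_ID"
)
```

With the stand-in data, `orphans` holds the two values (7 and 9) that would make a foreign key constraint fail on restore. Running the equivalent query before adding a constraint, or before a backup/restore cycle, shows which records need attention.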
Re: [firebird-support] Firebird Embedded corruptions
On Mon, Sep 15, 2014 at 7:41 AM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:

> I just made another posting where I tried to describe three different examples of things we have seen.

The first was a wrong page type, which sounds like a bug that was fixed in a newer version in code that's common to all Firebird architectures. In your case, the bad page was in an index (7). If you can find the index with the bad page and recreate it, all will be well. Just as an FYI, the page types are:

  0 - undefined, normally an uninitialized page and indicates a bad page pointer elsewhere
  1 - Database header page
  2 - Page inventory page
  3 - Transaction inventory page
  4 - Pointer page
  5 - Data page
  6 - Index root page - contains information about each index on the table, one per table
  7 - Index (B-tree) page
  8 - Blob data page
  9 - Generator pages

The second problem (CCH_precedence: block marked. file: cch.cpp line: 4390) is more concerning - I don't remember having read a bug about it. CCH is the cache handler. A mark is the sign that a page is about to be changed. When Firebird is forced to write a page either as part of a commit or to free space in the cache, it must write out any pages that the page depends on first. That's a little obscure. Suppose that the page you're about to write has a record with a back version, and the back version is on a different page. To keep the database consistent, the page with the back version must be on disk before the page that includes a record that points to the back version. Firebird keeps a list of precedence relationships and CCH goes through them before writing a page.

I think the error means that someone is currently writing to a page that's on the precedence list. That should never happen. It's interesting that the problem occurred during an alter index operation. However, the database should be fine on disk and usable after you restart Firebird. Page marks are entirely in memory. It's quite possible that I missed a bug report and this problem was fixed in a later version.

The third problem is two records in a referencing table lack mates in the referenced table, despite a referential constraint. I have no idea how that happened, but it should be reasonably easy to fix in your database.

The first problem is what I would call a physical corruption - the internal structure of the database is corrupt. The second is an in-memory corruption - the disk database is OK, but the in-memory version is damaged. The third is logical corruption - the database is physically intact, but does not conform to the data rules.

> Typically we fix our problems with a gfix -mend and then doing a backup restore cycle. Usually some tables then still have problems (typically foreign keys that refer to non-existing primary keys), so if possible we then remove the faulty records and then it works again.

Gfix is pretty old and somewhat crude. IBFirstAid might give you better help on physical corruptions. Checking that there is no non-conforming data before creating constraints may help with logical corruption.

Good luck (and my apologies for the late response)

Ann
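The precedence rule Ann describes (a page may only be written after every page it depends on is on disk) can be illustrated with a toy model. This is only a sketch of the idea and bears no relation to Firebird's actual cch.cpp code: the class, its attributes, and the exception it raises are all hypothetical, and it assumes the precedence relationships contain no cycles.

```python
class ToyPageCache:
    """Toy model of precedence-ordered page writes (not Firebird's code)."""

    def __init__(self):
        self.must_precede = {}  # page -> pages that must reach disk first
        self.dirty = set()      # pages changed in memory, not yet written
        self.marked = set()     # pages being modified right now
        self.disk_log = []      # order in which pages were written

    def modify(self, page, depends_on=()):
        """Record a change to `page` that relies on `depends_on` pages."""
        self.dirty.add(page)
        self.must_precede.setdefault(page, set()).update(depends_on)

    def write(self, page):
        """Write `page`, first flushing every dirty page it depends on."""
        for dep in sorted(self.must_precede.get(page, ())):
            if dep in self.marked:
                # Toy analogue of the "block marked" consistency check:
                # a page on the precedence list is being modified.
                raise RuntimeError(f"page {dep} is marked during precedence walk")
            if dep in self.dirty:
                self.write(dep)
        if page in self.dirty:
            self.disk_log.append(page)
            self.dirty.discard(page)
            self.must_precede.pop(page, None)


# Page 10 holds a record whose back version lives on page 20,
# so page 20 has to reach disk before page 10.
cache = ToyPageCache()
cache.modify(20)
cache.modify(10, depends_on=[20])
cache.write(10)
```

After `cache.write(10)`, `cache.disk_log` lists page 20 before page 10, which is the ordering that keeps the on-disk pointer from referring to a back version that was never written.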
Re: [firebird-support] Firebird Embedded corruptions
Hi,

First a sincere thanks to all of you for your answers. We have all kinds of different corruptions and maybe they do not all have the same root cause. Here I will give you three typical examples.

*Example 1*

This user complained that the system had stopped. Upon further investigation the following exception was found in our logs, and when we received the database it was indeed corrupted.

    Exception while executing job: NHibernate.Exceptions.GenericADOException: Error executing Enumerable() query[SQL: select sequencevo0_.recording_sequence_id as col_0_0_ from Recording_Sequence sequencevo0_ where not (exists (select filesequen1_.SEQ_ID from File_Seq filesequen1_, Recording_Sequence sequencevo2_ where filesequen1_.SEQ_ID=sequencevo2_.recording_sequence_id and filesequen1_.SEQ_ID=sequencevo0_.recording_sequence_id)) and sequencevo0_.StopTime@p0]
    ---> FirebirdSql.Data.FirebirdClient.FbException: database file appears corrupt (C:\PROGRAMDATA\AXIS COMMUNICATIONS\AXIS CAMERA STATION SERVER\ACS.FDB)
    wrong page type
    page 3819 is of wrong type (expected 7, found 3)
    ---> FirebirdSql.Data.Common.IscException: Exception of type 'FirebirdSql.Data.Common.IscException' was thrown.
       at FirebirdSql.Data.Client.Native.FesDatabase.ParseStatusVector(IntPtr[] statusVector)
       at FirebirdSql.Data.Client.Native.FesStatement.Fetch()
       at FirebirdSql.Data.FirebirdClient.FbCommand.Fetch()

*Example 2*

Here is another example of a corruption.

    FirebirdSql.Data.FirebirdClient.FbException (0x80004005): unsuccessful metadata update
    MODIFY RDB$INDICES failed
    internal gds software consistency check (CCH_precedence: block marked (212), file: cch.cpp line: 4390)
    ---> unsuccessful metadata update
    MODIFY RDB$INDICES failed
    internal gds software consistency check (CCH_precedence: block marked (212), file: cch.cpp line: 4390)
       at FirebirdSql.Data.FirebirdClient.FbCommand.ExecuteNonQuery()

*Example 3*

Here is a file that got sent in from a user who complained that his system was no longer working. For this file it looks like one table in the database has two records with foreign keys that refer to non-existing primary keys in another table (for which we have constraints), so how this data got through a transaction is somewhat of a mystery to us:

    $ gfix -v -user SYSDBA -password masterkey acs_system_2014-06-17_16-17-33.737.fdb
    $ gbak -b -g -user SYSDBA -password masterkey acs_system_2014-06-17_16-17-33.737.fdb out.fbk
    $ gbak -c -user SYSDBA -password masterkey out.fbk restored.fdb
    gbak:cannot commit index FKA6F4437CE96F23CD
    gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437CE96F23CD on table FILE_SEQ
    gbak: ERROR:Foreign key reference target does not exist
    gbak:cannot commit index FKA6F4437C58FFBFFC
    gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437C58FFBFFC on table FILE_SEQ
    gbak: ERROR:Foreign key reference target does not exist
    gbak:Database is not online due to failure to activate one or more indices.
    gbak:Run gfix -online to bring database online without active indices.

If anyone is interested I can provide you with more details or even complete database files for further investigation. We have loads of corruptions like these three.

Best Regards
//Jan Flyborg

2014-09-13 22:19 GMT+02:00 Alexey Kovyazin a...@ib-aid.com [firebird-support] firebird-support@yahoogroups.com:

> Hi Jan,
>
> You did not tell what kind of corruption you had (please provide full text of error). There are plenty of them, as well as reasons. You also could use our tool FirstAID (Direct) to analyze database on low level and see where are the problems.
>
> Regards,
> Alexey Kovyazin
> IBSurgeon (www.ib-aid.com)
>
>> Hi,
>>
>> We have shipped Firebird Embedded bundled together with our product for a few years now and the system is currently in production at several thousand of our customers' sites. Currently we are using Firebird Embedded 2.5.1 with the latest .NET driver and a stack consisting of Castle Active Record on top of NHibernate, and the system is running on the latest versions of Windows.
>>
>> All is well and Firebird has served us well so far, with the exception of database corruptions that get reported from a new set of customers every week. For some of them it is possible to instruct the customer on how to repair the databases themselves, but some of the databases are unfortunately so heavily corrupted that they need to be sent to us for repair (which is tedious work that steals time from other tasks). Most of the corruptions are normally found in the tables that get the most writes, but I guess that is only natural.
>>
>> We are now at the planning stage for the next major release of our product and we are thus rethinking if Firebird really is a good choice, because of this.
>>
>> Lots of effort has gone into solving this problem on our side, so I think the normal prerequisites have already been put into place (e.g. using forced writes and so forth), but our system needs to be up and running 24x7, which
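Because the restore log in Example 3 names each failing constraint, a repair tool could extract those names automatically and decide which tables to inspect. A minimal sketch follows, assuming the gbak messages keep the wording shown in this thread. The function name is hypothetical and nothing here calls gbak itself; it only scans captured output text.

```python
import re

# Message shape copied from the gbak output quoted in Example 3.
_FK_ERROR = re.compile(r"violation of FOREIGN KEY constraint (\S+) on table (\S+)")


def failed_foreign_keys(gbak_output):
    """Return (constraint, table) pairs reported by a failed gbak restore."""
    seen = []
    for constraint, table in _FK_ERROR.findall(gbak_output):
        # Keep first-seen order and drop repeats of the same constraint.
        if (constraint, table) not in seen:
            seen.append((constraint, table))
    return seen


sample_log = """\
gbak:cannot commit index FKA6F4437CE96F23CD
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437CE96F23CD on table FILE_SEQ
gbak: ERROR:Foreign key reference target does not exist
gbak:cannot commit index FKA6F4437C58FFBFFC
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437C58FFBFFC on table FILE_SEQ
gbak: ERROR:Foreign key reference target does not exist
"""

failures = failed_foreign_keys(sample_log)
```

For the log above, `failures` contains both constraint names paired with `FILE_SEQ`, which is enough to tell a support engineer where to look for records with missing parents.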
Re: [firebird-support] Firebird Embedded corruptions
Hi,

Thanks for your answers. Please see my comments inline below.

Best Regards
//Jan Flyborg

2014-09-13 22:31 GMT+02:00 Svein Erling Tysvær svein.erling.tysv...@kreftregisteret.no [firebird-support] firebird-support@yahoogroups.com:

> Hei Jan!

Hej!

> The one thing I try to avoid, is running DDL (CREATE, ALTER, DROP table|trigger|stored procedure) on a database in use. Maybe I'm overly careful, but not all too long ago, a colleague caused some problems when he did ALTER TABLE MyTable DROP MyField; while he simultaneously had another transaction having uncommitted changes to MyField in one record.

We never do that. All our database upgrades take place just after our middleware has started and before we give any access to the other parts of the system, so this is probably not the explanation for our problems, since there is always just one transaction running when we are modifying the data model.

> I think (but have no experience), that possible reasons for corruption could include file system backups of the database while it is in use (exclude the database file(s) from such backups, rather use gbak for the backup, and include the resulting file in the system backup),

Since we target a market consisting of normal end users, it is hard for us to exclude our files from the backups that our customers are performing. We can instruct them to do that, but we can never be sure that they follow our instructions. Also, I can understand that the Firebird database files could become corrupted if you performed a (non-consistent) backup of them and then read this backup back into the production system, but we are seeing these corruptions on database files that have never been restored from a backup.

> and possibly anti-viruses preventing Firebird from doing its work (though I would expect this to result in the database being inaccessible, not corrupted).

Yes, I agree. The file locking would probably not corrupt the database file, but I am by no means an expert.

> Another thing that's only affecting Fb 2.5.1, is that this version has an error relating to compound indices (requiring backup/restore or rebuilding such indices if upgrading to 2.5.2). Though I doubt this error would cause data corruptions involving more than the index.

We are going to upgrade in any case as soon as possible, so we will see if the problem disappears then.

> Others will be able to give you a more thorough answer, despite having used Firebird since its inception (0.9.4), I've very little experience with corruptions (undoubtedly related to only working on a handful of databases with about 20 simultaneous users).

That sounds promising to us. Everyone else seems to have good success with Firebird.

> HTH,
> Set
Re: [firebird-support] Firebird Embedded corruptions
Hi,

Thanks again.

2014-09-14 19:56 GMT+02:00 Ann Harrison aharri...@ibphoenix.com [firebird-support] firebird-support@yahoogroups.com:

> On Sat, Sep 13, 2014 at 12:22 PM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:
>
>> Lots of effort has gone into solving this problem on our side, so I think the normal prerequisites have already been put into place (e.g. using forced writes and so forth), but our system needs to be up and running 24x7, which means that it is not possible to schedule periodic backup/restore cycles and my personal theory is that Firebird embedded gets corrupted over time if you are not doing this regularly.
>
> Nice theory, but if the database is physically corrupt, you can't back it up, and if it's logically corrupt, you can't restore it. I think it's worth looking elsewhere for the problem.

Yes, you are correct. I can see that now.

>> So I have a few questions that I would appreciate if someone could answer:
>>
>> 1. Is it feasible to run Firebird Embedded 24x7 in a setup where there are no scheduled backup/restore cycles. If not, how often should this be performed to ensure that the database does not get corrupted.
>
> It should be possible to run Firebird Embedded 24x7. Without knowing what you're seeing as corruptions, it's very hard to guess why they're occurring. What errors are your customers seeing? What do they (and you) do to correct the errors?

I just made another posting where I tried to describe three different examples of things we have seen. It would be really nice if you could take a look at this.

>> 2. Most of our customers are not using a UPS. From my experiments I have not managed to create a corrupted database by turning off the power while doing a large set of writes (in a session running in VirtualBox). Could someone please confirm that this is indeed safe when you are running with synchronized writes turned on?
>
> A hard shutdown should not corrupt a database that has forced writes enabled. It might corrupt the file system, but again, without knowing what the errors and problem are, it's hard to guess.
>
>> 3. Are there any operations on a live database that should be avoided to minimize the risk of corruptions?
>
> Dropping tables and altering tables to drop fields are pretty dangerous operations, but even if that is what's happening, the development group should be given a reproducible case that corrupts databases.

As explained in a previous post, we never do that with other transactions running.

>> 4. Just read a discussion about whether it is needed or not to call fb_shutdown to stop Firebird Embedded. Could this be the reason why we are getting corruptions? Should we change our service to perform this call when it is stopped?
>>
>> 5. I have also seen discussions of turning off automatic sweeps of the database (and doing them manually instead). Is this a likely source of corruptions for our setup?
>
> No. Sweeping the database is very much like backing it up without creating the backup file. When a sweep starts during heavy database usage, it can reduce performance but not corrupt the database.
>
> So, question back to you: what errors are you seeing and how have you fixed them?

Typically we fix our problems with a gfix -mend and then a backup/restore cycle. Usually some tables then still have problems (typically foreign keys that refer to non-existing primary keys), so if possible we then remove the faulty records and then it works again. However, some databases are so heavily corrupted that this strategy would give us an empty database, and if that is the case we have to tell the customer to start all over again with an empty file.

Best Regards
//Jan Flyborg
Re: [firebird-support] Firebird Embedded corruptions
On Sat, Sep 13, 2014 at 12:22 PM, Jan Flyborg jan.pers...@gmail.com [firebird-support] firebird-support@yahoogroups.com wrote:

> We have shipped Firebird Embedded bundled together with our product for a few years now and the system is currently in production at several thousand of our customers' sites... All is well and Firebird has served us well so far with the exception of database corruptions that get reported from a new set of customers every week. We are now at the planning stage for the next major release of our product and we are thus rethinking if Firebird really is a good choice, because of this.

I can understand that.

> Lots of effort has gone into solving this problem on our side, so I think the normal prerequisites have already been put into place (e.g. using forced writes and so forth), but our system needs to be up and running 24x7, which means that it is not possible to schedule periodic backup/restore cycles and my personal theory is that Firebird Embedded gets corrupted over time if you are not doing this regularly.

Nice theory, but if the database is physically corrupt, you can't back it up, and if it's logically corrupt, you can't restore it. I think it's worth looking elsewhere for the problem.

> So I have a few questions that I would appreciate if someone could answer:
>
> 1. Is it feasible to run Firebird Embedded 24x7 in a setup where there are no scheduled backup/restore cycles? If not, how often should this be performed to ensure that the database does not get corrupted?

It should be possible to run Firebird Embedded 24x7. Without knowing what you're seeing as corruptions, it's very hard to guess why they're occurring. What errors are your customers seeing? What do they (and you) do to correct the errors?

> 2. Most of our customers are not using a UPS. From my experiments I have not managed to create a corrupted database by turning off the power while doing a large set of writes (in a session running in VirtualBox). Could someone please confirm that this is indeed safe when you are running with synchronized writes turned on?

A hard shutdown should not corrupt a database that has forced writes enabled. It might corrupt the file system, but again, without knowing what the errors and problems are, it's hard to guess.

> 3. Are there any operations on a live database that should be avoided to minimize the risk of corruptions?

Dropping tables and altering tables to drop fields are pretty dangerous operations, but even if that is what's happening, the development group should be given a reproducible case that corrupts databases.

> 4. Just read a discussion about whether it is needed or not to call fb_shutdown to stop Firebird Embedded. Could this be the reason why we are getting corruptions? Should we change our service to perform this call when it is stopped?
>
> 5. I have also seen discussions of turning off automatic sweeps of the database (and doing them manually instead). Is this a likely source of corruptions for our setup?

No. Sweeping the database is very much like backing it up without creating the backup file. When a sweep starts during heavy database usage, it can reduce performance but not corrupt the database.

So, question back to you: what errors are you seeing and how have you fixed them?

Good luck,
Ann
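Since the answer to question 2 depends on forced writes really being enabled on each deployed database, it may be worth verifying that in the field. The following is only a minimal sketch, assuming the `gstat` utility is available alongside the embedded engine and that its `-h` header output contains an `Attributes` line listing `force write` when forced writes are on. The function names, the database path argument and the sample header text are hypothetical and written for illustration, not captured from a real system.

```python
import subprocess


def forced_writes_enabled(gstat_header: str) -> bool:
    """Return True if the 'Attributes' line of `gstat -h` output lists 'force write'."""
    for line in gstat_header.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("attributes"):
            # Everything after the label is the list of attributes.
            attrs = stripped[len("attributes"):].lower()
            return "force write" in attrs
    # No Attributes line found: treat as "not confirmed".
    return False


def read_header(database_path: str, gstat: str = "gstat") -> str:
    """Run `gstat -h` on a database file and return its text output.

    Assumes the gstat executable is on PATH (or pass its full path).
    """
    result = subprocess.run(
        [gstat, "-h", database_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hand-written, abbreviated sample of header output (illustrative only).
SAMPLE_HEADER = """\
Database header page information:
\tFlags\t\t\t0
\tPage size\t\t4096
\tODS version\t\t11.2
\tAttributes\t\tforce write
"""

if __name__ == "__main__":
    print(forced_writes_enabled(SAMPLE_HEADER))
```

In a real check one would call `forced_writes_enabled(read_header(path))` for each customer database. If the attribute is missing, `gfix -write sync <database>` is the usual way to turn forced writes back on.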
Re: [firebird-support] Firebird Embedded corruptions
Hi Jan,

You did not say what kind of corruption you had (please provide the full text of the error). There are plenty of them, and as many possible causes. You could also use our tool FirstAID (Direct) to analyze the database at a low level and see where the problems are.

Regards,
Alexey Kovyazin
IBSurgeon (www.ib-aid.com)

> Hi,
>
> We have shipped Firebird Embedded bundled together with our product for a few years now and the system is currently in production at several thousand of our customers' sites. Currently we are using Firebird Embedded 2.5.1 with the latest .NET driver and a stack consisting of Castle Active Record on top of NHibernate, and the system is running on the latest versions of Windows.
>
> All is well and Firebird has served us well so far with the exception of database corruptions that get reported from a new set of customers every week. For some of them it is possible to instruct the customer on how to repair the databases themselves, but some of the databases are unfortunately so heavily corrupted that they need to be sent to us for repair (which is tedious work that steals time from other tasks). Most of the corruptions are normally found in the tables that get the most writes, but I guess that is only natural.
>
> We are now at the planning stage for the next major release of our product and we are thus rethinking if Firebird really is a good choice, because of this.
>
> Lots of effort has gone into solving this problem on our side, so I think the normal prerequisites have already been put into place (e.g. using forced writes and so forth), but our system needs to be up and running 24x7, which means that it is not possible to schedule periodic backup/restore cycles and my personal theory is that Firebird Embedded gets corrupted over time if you are not doing this regularly.
>
> So I have a few questions that I would appreciate if someone could answer:
>
> 1. Is it feasible to run Firebird Embedded 24x7 in a setup where there are no scheduled backup/restore cycles? If not, how often should this be performed to ensure that the database does not get corrupted?
>
> 2. Most of our customers are not using a UPS. From my experiments I have not managed to create a corrupted database by turning off the power while doing a large set of writes (in a session running in VirtualBox). Could someone please confirm that this is indeed safe when you are running with synchronized writes turned on?
>
> 3. Are there any operations on a live database that should be avoided to minimize the risk of corruptions?
>
> 4. Just read a discussion about whether it is needed or not to call fb_shutdown to stop Firebird Embedded. Could this be the reason why we are getting corruptions? Should we change our service to perform this call when it is stopped?
>
> 5. I have also seen discussions of turning off automatic sweeps of the database (and doing them manually instead). Is this a likely source of corruptions for our setup?
>
> Thanks in advance. Maybe there are no certain answers to my questions, but any pointers in the right direction would be very much appreciated. Firebird has been a real workhorse for us and we would rather like to keep it.
>
> Best Regards
> //Jan Flyborg