Re: [firebird-support] Firebird Embedded corruptions

2014-09-23 Thread Jan Flyborg jan.pers...@gmail.com [firebird-support]
Hi,

Thanks for this and sorry for my slow response. Please see my comments
below.

Best Regards
//Jan Flyborg

2014-09-19 21:13 GMT+02:00 Ann Harrison aharri...@ibphoenix.com
[firebird-support]
firebird-support@yahoogroups.com:



 On Mon, Sep 15, 2014 at 7:41 AM, Jan Flyborg jan.pers...@gmail.com
 [firebird-support] firebird-support@yahoogroups.com wrote:




 I just made another posting where I tried to describe three different
 examples of things we have seen.


 The first was a wrong page type, which sounds like a bug that was fixed in
 a newer version in code that's common to all Firebird architectures.  In
 your case, the bad page was in an index (7).  If you can find the index
 with the bad page and recreate it, all will be well.

 Just as an FYI, the page types are:
  0 - Undefined, normally an uninitialized page; indicates a bad page
      pointer elsewhere
  1 - Database header page
  2 - Page inventory page
  3 - Transaction inventory page
  4 - Pointer page
  5 - Data page
  6 - Index root page; contains information about each index on the
      table, one per table
  7 - Index (B-tree) page
  8 - Blob data page
  9 - Generator pages


That sounds very good and it seems like an upgrade to 2.5.3 will make sure
that we do not see this again.



 The second problem (CCH_precedence: block marked.  file: cch.cpp line:
 4390) is more concerning - I don't remember having read a bug about it.
 CCH is the cache handler.  A mark is the sign that a page is about to be
 changed.   When Firebird is forced to write a page either as part of a
 commit or to free space in the cache, it must write out any pages that the
 page depends on first.  That's a little obscure.  Suppose that the page
 you're about to write has a record with a back version, and the back
 version is on a different page.  To keep the database consistent, the page
 with the back version must be on disk before the page that includes a
 record that points to the back version.  Firebird keeps a list of
 precedence relationships and CCH goes through them before writing a page.
 I think the error means that someone is currently writing  to a page that's
 on the precedence list.  That should never happen.  It's interesting that
 the problem occurred during an alter index operation.  However, the
 database should be fine on disk and usable after you restart Firebird.
 Page marks are entirely in memory.  It's quite possible that I missed a bug
 report and this problem was fixed in a later version.


If it is of any help to you: I was wrong in my original posting when I
said we were using 2.5.1 (I mention it because the line numbers in the
exception might lead you to the wrong conclusion if you assume the wrong
version). We are currently using 2.5.2 and nothing else.



 The third problem is two records in a referencing table lack mates in the
 referenced table, despite a referential constraint.  I have no idea how
 that happened, but it should be reasonably easy to fix in your database.


In another posting (later than yours), Fabiano says that these errors are
connected to bad memory chips, so in the future we will instruct our users
who are having this problem to run memtest86 overnight to check that the
memory is physically OK. These constraint problems are actually the most
common ones we see.




The first problem is what I would call a physical corruption - the
 internal structure of the database is corrupt.  The second is an in-memory
 corruption - the disk database is OK, but the in-memory version is
 damaged.  The third is logical corruption - the database is physically
 intact, but does not conform to the data rules.



 Typically we fix our problems with a gfix -mend followed by a backup/restore
 cycle. Usually some tables then still have problems (typically foreign keys
 that refer to non-existing primary keys), so if possible we then remove the
 faulty records, and then it works again.


The problem is that these are not our databases. I normally have no access
to them since they are running in standalone installations at our customers'
sites. Recently we started bundling our own homemade tool for repairing
databases that our customers can use when they are experiencing problems
(basically a graphical frontend for gfix), but sometimes this is not enough
and the databases have to be sent to us.



 Gfix is pretty old and somewhat crude.  IBFirstAid might give you better
 help on physical corruptions.  Checking that there is no non-conforming
 data before creating constraints may help with logical corruption.


Yes, that would probably be a better choice for us, but we cannot bundle
IBFirstAid together with our application. I will, however, download it and
try it on the files that get sent to us.



 Good luck (and my apologies for the late response)


No need for any apologies. I am very grateful for you taking your time to
help us.

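Another thing, what do you say about the posting above where the theory is
that Volume Shadow Copy is interfering with the database? Have you heard
about that before?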

Re: [firebird-support] Firebird Embedded corruptions

2014-09-23 Thread Alexey Kovyazin a...@ib-aid.com [firebird-support]
Jan,

 Yes, that would probably be a better choice for us, but we cannot
 bundle IBFirstAid together with our application. I will, however, download
 it and try it on the files that get sent to us.

Actually, the current version of FirstAID available on our web site
is the full version, but it requests a license every time you run a
recovery (not a diagnosis).
So you can (from both the licensing and the technical points of view)
include the FirstAID executables in your application, so users will be able
to use it on demand.
Don't hesitate to contact us at supp...@ib-aid.com for more details.

Regards,
Alexey Kovyazin
IBSurgeon











Re: [firebird-support] Firebird Embedded corruptions

2014-09-23 Thread Ann Harrison aharri...@ibphoenix.com [firebird-support]
On Tue, Sep 23, 2014 at 10:49 AM, Jan Flyborg jan.pers...@gmail.com
[firebird-support] firebird-support@yahoogroups.com wrote:



 The first was a wrong page type, which sounds like a bug that was fixed in
 a newer version in code that's common to all Firebird architectures.  In
 your case, the bad page was in an index (7).  If you can find the index
 with the bad page and recreate it, all will be well.

 That sounds very good and it seems like an upgrade to 2.5.3 will make
 sure that we do not see this again.


Any time your users get an error of the form "wrong page type, expected 7,
encountered n", you can probably work with them to identify and rebuild the
bad index.
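A minimal sketch of how a support tool might triage the message Ann describes before involving the user. It assumes the error text has the same shape as the one quoted in Jan's Example 1 ("page 3819 is of wrong type (expected 7, found 3)"); the pattern and the function name `triage_wrong_page_type` are hypothetical, and nothing here touches the database.

```python
import re

# Pattern modelled on the message text quoted in this thread, e.g.
# "page 3819 is of wrong type (expected 7, found 3)". Assumed, not an official format.
WRONG_TYPE = re.compile(r"page (\d+) is of wrong type \(expected (\d+), found (\d+)\)")

INDEX_BTREE_PAGE = 7  # page type 7 = index (B-tree) page


def triage_wrong_page_type(message):
    """Return (page, expected, found, suggest_index_rebuild), or None if no match."""
    m = WRONG_TYPE.search(message)
    if m is None:
        return None
    page, expected, found = (int(g) for g in m.groups())
    # When an index page was expected, rebuilding the affected index is the
    # first thing to try; other expected types point at other structures.
    return page, expected, found, expected == INDEX_BTREE_PAGE


print(triage_wrong_page_type(
    "wrong page type\npage 3819 is of wrong type (expected 7, found 3)"
))
```

This only decides whether an index rebuild is the likely remedy; working out which index owns the page still has to be done separately.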




 The second problem (CCH_precedence: block marked.  file: cch.cpp line:
 4390) is more concerning

 If it is of any help to you: I was wrong in my original posting when I
 said we were using 2.5.1 (I mention it because the line numbers in the
 exception might lead you to the wrong conclusion if you assume the wrong
 version). We are currently using 2.5.2 and nothing else.


I follow bug reports, but not religiously.  So I searched for one that
includes "block marked" and "MODIFY RDB$INDICES" and found #4467, which is
marked as "will not fix" and described as a user error.  User errors should
not cause internal cache manager problems, so I'm somewhat bemused.  It was
reported in 2.5.2, so it may well be your problem.




 The third problem is two records in a referencing table lack mates in the
 referenced table, despite a referential constraint.  I have no idea how
 that happened, but it should be reasonably easy to fix in your database.


 In another posting (later than yours), Fabiano says that these errors
 are connected to bad memory chips, so in the future we will instruct our
 users who are having this problem to run memtest86 overnight to check that
 the memory is physically OK. These constraint problems are actually the
 most common ones we see.


Clever memory problem to corrupt just the key or the constraint check.
Certainly it's worth checking that the memory is OK.  I'd also check that
the referencing key looks generally sound.  Do you add referential
constraints to existing databases?  A problem with broken constraints is
that the error doesn't leave traces, so a reproducible case would be very
helpful, but very hard to produce.




 Gfix is pretty old and somewhat crude.  IBFirstAid might give you better
 help on physical corruptions.  Checking that there is no non-conforming
 data before creating constraints may help with logical corruption.


 Yes, that would probably be a better choice for us, but we cannot bundle
 IBFirstAid together with our application. I will, however, download it and
 try it on the files that get sent to us.


The analysis tool is free - maybe your users could download it themselves
to look for evidence.  But it's not going to help with broken referential
constraints or mangled cache precedence.



 Another thing, what do you say about the posting above where the theory is
 that Volume Shadow Copy is interfering with the database? Have you heard
 about that before?


I'm quite sure that Volume Shadow Copy won't make good copies of an active
database or any other file that's open for random writes.  Whether it could
corrupt the original is an open question.  Lots of people claim to have
seen instances where copying a database corrupts the original.


 And one last comment. We have bundled Firebird with a great many
 installations of our product, and it might be the case that what we are
 seeing are very rare problems that no one else has experienced before. Do
 you think we should post bug reports every time we see an exception or a
 problem that you have not already been made aware of?


Search the tracker (http://tracker.firebirdsql.org/browse) first to see if
the problem has been reported.  Then you might mention it on the support
list to see if there's something that looks like a user error, so you won't
annoy the developers with stuff that the volunteers on this list could
resolve.  But if you're getting errors with source file and line numbers, the
chances are good that you've found a bug.  Firebird is used pretty widely
and quite heavily in many installations.  However, the embedded form
probably gets less stress in the world than any of the other architectures,
so you may be stressing something unusual.  No development group, open or
closed source, can fix bugs it doesn't know about.

Thank you for working with Firebird on these problems.

Good luck,

Ann


Re: [firebird-support] Firebird Embedded corruptions

2014-09-20 Thread Ivan Arabadzhiev intelru...@yahoo.com [firebird-support]
Hi, I just saw CCH mentioned and figured I'd pitch in:
http://tracker.firebirdsql.org/browse/CORE-4467. I was basically told it
was most probably bad hardware, but the error happens only under major loads
(in short: a transaction does a million or so updates, and in the meantime
a few others are trying to do the same and getting a lock conflict).
I haven't had a case of that workload happening since I updated to 2.5.3,
partly through luck and partly because I don't want to take the chance, so I
can't give you a status on it. I can say I don't seem to get corruptions
under sane workloads (I haven't touched the hardware, though I have updated a
kernel or two in the meantime).

2014-09-19 22:28 GMT+03:00 'Fabiano - Desenvolvimento SCI'
fabi...@sci10.com.br [firebird-support] firebird-support@yahoogroups.com:



  Ann, about the third problem:

 “The third problem is two records in a referencing table lack mates in
 the referenced table, despite a referential constraint.  I have no idea how
 that happened, but it should be reasonably easy to fix in your database.”

 I have seen this happen twice, and it was related to bad RAM. My theory is
 that when Firebird writes to memory, the faulty memory changes the contents,
 so when the transaction commits you get a different value. We struggled
 with one such case this week. The solution was to replace the RAM in the
 server where Firebird is running.








Re: [firebird-support] Firebird Embedded corruptions

2014-09-19 Thread Ann Harrison aharri...@ibphoenix.com [firebird-support]
On Mon, Sep 15, 2014 at 7:41 AM, Jan Flyborg jan.pers...@gmail.com
[firebird-support] firebird-support@yahoogroups.com wrote:




 I just made another posting where I tried to describe three different
 examples of things we have seen.


The first was a wrong page type, which sounds like a bug that was fixed in
a newer version in code that's common to all Firebird architectures.  In
your case, the bad page was in an index (7).  If you can find the index
with the bad page and recreate it, all will be well.

Just as an FYI, the page types are:
 0 - Undefined, normally an uninitialized page; indicates a bad page
     pointer elsewhere
 1 - Database header page
 2 - Page inventory page
 3 - Transaction inventory page
 4 - Pointer page
 5 - Data page
 6 - Index root page; contains information about each index on the
     table, one per table
 7 - Index (B-tree) page
 8 - Blob data page
 9 - Generator pages
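The same list as a lookup table, as a sketch for turning the two numbers in a "wrong page type" error into words. The codes and descriptions are copied from the list above; the names `PAGE_TYPES` and `describe` are hypothetical.

```python
# Page type codes as listed above (Firebird 2.5).
PAGE_TYPES = {
    0: "undefined page (uninitialized; suggests a bad page pointer elsewhere)",
    1: "database header page",
    2: "page inventory page",
    3: "transaction inventory page",
    4: "pointer page",
    5: "data page",
    6: "index root page",
    7: "index (B-tree) page",
    8: "blob data page",
    9: "generator page",
}


def describe(expected, found):
    """Render an (expected, found) page type pair in words, tolerating unknown codes."""
    def name(code):
        return PAGE_TYPES.get(code, f"unknown type {code}")
    return f"expected {name(expected)}, found {name(found)}"


print(describe(7, 3))
# expected index (B-tree) page, found transaction inventory page
```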

The second problem (CCH_precedence: block marked.  file: cch.cpp line:
4390) is more concerning - I don't remember having read a bug about it.
CCH is the cache handler.  A mark is the sign that a page is about to be
changed.   When Firebird is forced to write a page either as part of a
commit or to free space in the cache, it must write out any pages that the
page depends on first.  That's a little obscure.  Suppose that the page
you're about to write has a record with a back version, and the back
version is on a different page.  To keep the database consistent, the page
with the back version must be on disk before the page that includes a
record that points to the back version.  Firebird keeps a list of
precedence relationships and CCH goes through them before writing a page.
I think the error means that someone is currently writing  to a page that's
on the precedence list.  That should never happen.  It's interesting that
the problem occurred during an alter index operation.  However, the
database should be fine on disk and usable after you restart Firebird.
Page marks are entirely in memory.  It's quite possible that I missed a bug
report and this problem was fixed in a later version.
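A toy model of the careful-write rule described above: before a page is written, every page it depends on is written first, and finding a page in that chain that is currently marked (being changed) is treated as an internal error. This is a sketch for intuition only, assuming a simple in-memory dependency map; it is not Firebird's cch.cpp logic, and `ToyCache`, `PrecedenceError` and their methods are hypothetical names.

```python
class PrecedenceError(RuntimeError):
    """Stands in for an internal consistency check such as 'block marked'."""


class ToyCache:
    def __init__(self):
        self.depends_on = {}   # page -> set of pages that must reach disk first
        self.dirty = set()     # pages changed in memory but not yet written
        self.marked = set()    # pages someone is in the middle of changing
        self.disk_log = []     # order in which pages were "written"

    def add_precedence(self, page, must_precede):
        # e.g. `page` holds a record whose back version lives on `must_precede`
        self.depends_on.setdefault(page, set()).add(must_precede)

    def write(self, page, _visiting=None):
        visiting = set() if _visiting is None else _visiting
        if page in visiting:
            # Guard so the toy cannot recurse forever on a circular dependency.
            raise PrecedenceError(f"precedence cycle at page {page}")
        visiting.add(page)
        # Write everything this page depends on first (depth-first).
        for dep in sorted(self.depends_on.get(page, ())):
            if dep in self.marked:
                # A page in the chain is being modified right now: the
                # situation described above as something that should never happen.
                raise PrecedenceError(f"block marked: page {dep}")
            if dep in self.dirty:
                self.write(dep, visiting)
        if page in self.dirty:
            self.disk_log.append(page)
            self.dirty.discard(page)
            self.depends_on.pop(page, None)  # its constraints are now satisfied
        visiting.discard(page)


cache = ToyCache()
cache.dirty |= {10, 20}
cache.add_precedence(10, 20)   # page 20 holds the back version for page 10
cache.write(10)
print(cache.disk_log)          # [20, 10]
```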

The third problem is two records in a referencing table lack mates in the
referenced table, despite a referential constraint.  I have no idea how
that happened, but it should be reasonably easy to fix in your database.

The first problem is what I would call a physical corruption - the
internal structure of the database is corrupt.  The second is an in-memory
corruption - the disk database is OK, but the in-memory version is
damaged.  The third is logical corruption - the database is physically
intact, but does not conform to the data rules.
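One way to turn these three categories into a first-pass triage rule for incoming reports, as a sketch. The substrings come from the three error messages in Jan's examples, and the suggested next steps paraphrase advice given in this thread; the table, the function name `classify` and the idea of matching on substrings are assumptions that would need extending for other errors.

```python
# Hypothetical triage table: substring seen in the error text -> category.
CATEGORY_HINTS = [
    ("wrong page type", "physical"),          # Example 1: on-disk structure damaged
    ("CCH_precedence", "in-memory"),          # Example 2: cache state damaged
    ("violation of FOREIGN KEY", "logical"),  # Example 3: data breaks the rules
]

# First response for each category, paraphrasing the advice in this thread.
NEXT_STEP = {
    "physical": "locate and repair the damaged structure (e.g. rebuild the bad index)",
    "in-memory": "restart Firebird; the on-disk database should be fine",
    "logical": "find and fix or remove the rows that break the constraint",
    "unknown": "collect the full error text and ask on the support list",
}


def classify(error_text):
    """Return (category, suggested next step) for an error message."""
    lowered = error_text.lower()
    for needle, category in CATEGORY_HINTS:
        if needle.lower() in lowered:
            return category, NEXT_STEP[category]
    return "unknown", NEXT_STEP["unknown"]


category, step = classify(
    "internal gds software consistency check "
    "(CCH_precedence: block marked (212), file: cch.cpp line: 4390)"
)
print(category)  # in-memory
```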



 Typically we fix our problems with a gfix -mend followed by a backup/restore
 cycle. Usually some tables then still have problems (typically foreign keys
 that refer to non-existing primary keys), so if possible we then remove the
 faulty records, and then it works again.


Gfix is pretty old and somewhat crude.  IBFirstAid might give you better
help on physical corruptions.  Checking that there is no non-conforming
data before creating constraints may help with logical corruption.
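A sketch of that pre-check: look for referencing rows whose key has no match before adding (or re-activating) a foreign key. To keep it runnable anywhere, it uses Python's built-in sqlite3 instead of Firebird. The NOT EXISTS query is plain SQL and should carry over, but the tables `parent` and `child` and the column `parent_id` are made up for illustration.

```python
import sqlite3

# Stand-in schema: `parent` is the referenced table, `child` the referencing one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child  VALUES (10, 1), (11, 2), (12, 3), (13, 4), (14, NULL);
""")

# Rows that would violate FOREIGN KEY (parent_id) REFERENCES parent (id).
# A NULL key does not violate a foreign key, so those rows are excluded.
ORPHANS_SQL = """
    SELECT c.id, c.parent_id
    FROM child c
    WHERE c.parent_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM parent p WHERE p.id = c.parent_id)
    ORDER BY c.id
"""

orphans = conn.execute(ORPHANS_SQL).fetchall()
print(orphans)  # [(12, 3), (13, 4)]
```

Running the same SELECT against the real tables before `ALTER TABLE ... ADD CONSTRAINT` shows exactly which rows need fixing or deleting, which is also the list needed after a restore fails with the foreign key errors shown later in this thread.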

Good luck (and my apologies for the late response)

Ann


Re: [firebird-support] Firebird Embedded corruptions

2014-09-15 Thread Jan Flyborg jan.pers...@gmail.com [firebird-support]
Hi,

First a sincere thanks to all of you for your answers.

We have all kinds of different corruptions, and they may not all have the
same root cause. Here I will give you three typical examples.

*Example 1*
This user complained that the system had stopped. Upon further
investigation the following exception was found in our logs and when we
received the database it was indeed corrupted.

Exception while executing job: NHibernate.Exceptions.GenericADOException:
Error executing Enumerable() query[SQL: select
sequencevo0_.recording_sequence_id as col_0_0_ from Recording_Sequence
sequencevo0_ where  not (exists (select filesequen1_.SEQ_ID from File_Seq
filesequen1_, Recording_Sequence sequencevo2_ where
filesequen1_.SEQ_ID=sequencevo2_.recording_sequence_id and
filesequen1_.SEQ_ID=sequencevo0_.recording_sequence_id)) and
sequencevo0_.StopTime@p0] ---
FirebirdSql.Data.FirebirdClient.FbException: database file appears corrupt
(C:\PROGRAMDATA\AXIS COMMUNICATIONS\AXIS CAMERA STATION SERVER\ACS.FDB)

wrong page type

page 3819 is of wrong type (expected 7, found 3) ---
FirebirdSql.Data.Common.IscException: Exception of type
'FirebirdSql.Data.Common.IscException' was thrown.

   at FirebirdSql.Data.Client.Native.FesDatabase.ParseStatusVector(IntPtr[]
statusVector)

   at FirebirdSql.Data.Client.Native.FesStatement.Fetch()

   at FirebirdSql.Data.FirebirdClient.FbCommand.Fetch()

*Example 2*
Here is another example of a corruption.

FirebirdSql.Data.FirebirdClient.FbException (0x80004005): unsuccessful
metadata update
MODIFY RDB$INDICES failed
internal gds software consistency check (CCH_precedence: block marked
(212), file: cch.cpp line: 4390) --- unsuccessful metadata update
MODIFY RDB$INDICES failed
internal gds software consistency check (CCH_precedence: block marked
(212), file: cch.cpp line: 4390)
at FirebirdSql.Data.FirebirdClient.FbCommand.ExecuteNonQuery()

*Example 3*
Here is a file that was sent in by a user who complained that his system
was no longer working. In this file, one table in the database appears to
have two records with foreign keys that refer to non-existing primary keys
in another table (which we have constraints for), so how this data got into
the database is something of a mystery to us:

$ gfix -v -user SYSDBA -password masterkey acs_system_2014-06-17_16-17-33.737.fdb

$ gbak -b -g -user SYSDBA -password masterkey acs_system_2014-06-17_16-17-33.737.fdb out.fbk

$ gbak -c -user SYSDBA -password masterkey out.fbk restored.fdb
gbak:cannot commit index FKA6F4437CE96F23CD
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437CE96F23CD on table FILE_SEQ
gbak: ERROR:Foreign key reference target does not exist
gbak:cannot commit index FKA6F4437C58FFBFFC
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437C58FFBFFC on table FILE_SEQ
gbak: ERROR:Foreign key reference target does not exist
gbak:Database is not online due to failure to activate one or more indices.
gbak:Run gfix -online to bring database online without active indices.
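When a restore fails like this, the constraint and table names can be pulled out of the gbak output so the offending rows can be located. A sketch, assuming the output lines look like the ones above. Object names may appear with or without surrounding double quotes, and the message may be wrapped across lines, so the pattern tolerates both. The name `failed_foreign_keys` is hypothetical.

```python
import re

# Accepts names with or without double quotes, and any whitespace (including
# a line break) between the words of the message.
FK_VIOLATION = re.compile(
    r'violation of FOREIGN KEY constraint\s+"?([^"\s]+)"?\s+on\s+table\s+"?([^"\s]+)"?'
)


def failed_foreign_keys(gbak_output):
    """Return unique (constraint, table) pairs, in order of first appearance."""
    seen = []
    for pair in FK_VIOLATION.findall(gbak_output):
        if pair not in seen:
            seen.append(pair)
    return seen


sample = """gbak:cannot commit index FKA6F4437CE96F23CD
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437CE96F23CD on table FILE_SEQ
gbak: ERROR:Foreign key reference target does not exist
gbak:cannot commit index FKA6F4437C58FFBFFC
gbak: ERROR:violation of FOREIGN KEY constraint FKA6F4437C58FFBFFC on table FILE_SEQ"""

print(failed_foreign_keys(sample))
# [('FKA6F4437CE96F23CD', 'FILE_SEQ'), ('FKA6F4437C58FFBFFC', 'FILE_SEQ')]
```

The constraint names can then be looked up in the system tables RDB$RELATION_CONSTRAINTS and RDB$REF_CONSTRAINTS to find the referenced table.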

If anyone is interested I can provide you with more details or even
complete database files for further investigation. We have loads of
corruptions like these three.

Best Regards
//Jan Flyborg


2014-09-13 22:19 GMT+02:00 Alexey Kovyazin a...@ib-aid.com [firebird-support]
firebird-support@yahoogroups.com:



 Hi Jan,

 You did not say what kind of corruption you had (please provide the full
 text of the error). There are plenty of kinds, and plenty of reasons.
 You could also use our tool FirstAID (Direct) to analyze the database at a
 low level and see where the problems are.

 Regards,
 Alexey Kovyazin
 IBSurgeon (www.ib-aid.com)




   Hi,

  We have shipped Firebird Embedded bundled together with our product for a
 few years now, and the system is currently in production at several thousand
 of our customers' sites. Currently we are using Firebird Embedded 2.5.1
 with the latest .NET driver and a stack consisting of Castle Active Record
 on top of NHibernate, and the system is running on the latest versions of
 Windows.

 All is well, and Firebird has served us well so far, with the exception of
 database corruptions that get reported by a new set of customers every
 week. For some of them it is possible to instruct the customer on how to
 repair the databases themselves, but some of the databases are
 unfortunately so heavily corrupted that they need to be sent to us for
 repair (which is tedious work that steals time from other tasks). Most
 of the corruptions are found in the tables that get the most writes, but
 I guess that is only natural.

 We are now at the planning stage for the next major release of our product,
 and we are thus rethinking whether Firebird really is a good choice because
 of this.

  Lots of effort has gone into solving this problem on our side, so I think
 the normal prerequisites have already been put into place (e.g. using forced
 writes and so forth), but our system needs to be up and running 24x7, which
 means that it is not possible to schedule periodic backup/restore cycles,
 and my personal theory is that Firebird embedded gets corrupted over time
 if you are not doing this regularly.

Re: [firebird-support] Firebird Embedded corruptions

2014-09-15 Thread Jan Flyborg jan.pers...@gmail.com [firebird-support]
Hi,

Thanks for your answers. Please see my comments inline below.

Best Regards
//Jan Flyborg


2014-09-13 22:31 GMT+02:00 Svein Erling Tysvær
svein.erling.tysv...@kreftregisteret.no [firebird-support] 
firebird-support@yahoogroups.com:

  Hi Jan!

 Hi!


 The one thing I try to avoid is running DDL (CREATE, ALTER, DROP
 table|trigger|stored procedure) on a database in use. Maybe I'm overly
 careful, but not too long ago, a colleague caused some problems when he
 did

 ALTER TABLE MyTable DROP MyField;

 while he simultaneously had another transaction with uncommitted changes
 to MyField in one record.


We never do that. All our database upgrades take place just after our
middleware has started and before we give any access to the other parts of
the system, so this is probably not the explanation for our problems, since
there is always just one transaction running when we are modifying the data
model.


 I think (but have no experience of it) that possible reasons for corruption
 could include file system backups of the database while it is in use
 (exclude the database file(s) from such backups; rather, use gbak for the
 backup, and include the resulting file in the system backup),


Since we target a market consisting of ordinary end users, it is hard for us
to exclude our files from the backups that our customers perform. We can
instruct them to do that, but we can never be sure that they follow our
instructions. Also, I can understand that the Firebird database files could
become corrupted if you performed a (non-consistent) backup of them and
then restored that backup into the production system, but we are seeing
these corruptions on database files that have not been backed up.


 and possibly antivirus software preventing Firebird from doing its work
 (though I would expect this to result in the database being inaccessible,
 not corrupted).


Yes I agree. The file locking would probably not corrupt the database file,
but I am by no means any expert.


 Another thing, affecting only Fb 2.5.1, is that this version has an
 error relating to compound indices (requiring a backup/restore or rebuilding
 such indices when upgrading to 2.5.2). Though I doubt this error would cause
 data corruption involving more than the index.


We are going to upgrade as soon as possible in any case, so we will see if
the problem disappears then.


 Others will be able to give you a more thorough answer; despite having
 used Firebird since its inception (0.9.4), I have very little experience
 with corruptions (undoubtedly related to only working on a handful of
 databases with about 20 simultaneous users).

That sounds promising to us. Everyone else seems to have good success with
Firebird.


 HTH,
 Set

  



Re: [firebird-support] Firebird Embedded corruptions

2014-09-15 Thread Jan Flyborg jan.pers...@gmail.com [firebird-support]
Hi,

Thanks again.

2014-09-14 19:56 GMT+02:00 Ann Harrison aharri...@ibphoenix.com
[firebird-support] firebird-support@yahoogroups.com:



 On Sat, Sep 13, 2014 at 12:22 PM, Jan Flyborg jan.pers...@gmail.com
 [firebird-support] firebird-support@yahoogroups.com wrote:



 Lots of effort has gone into solving this problem on our side, so I think
 the normal prerequisites have already been put into place (e.g. using forced
 writes and so forth), but our system needs to be up and running 24x7, which
 means that it is not possible to schedule periodic backup/restore cycles,
 and my personal theory is that Firebird embedded gets corrupted over time
 if you are not doing this regularly.


 Nice theory, but if the database is physically corrupt, you can't back it
 up, and if it's logically corrupt, you can't restore it.  I think it's
 worth looking elsewhere for the problem.


Yes you are correct. I can see that now.



 So I have a few questions that I would appreciate it if someone could
 answer:

 1. Is it feasible to run Firebird Embedded 24x7 in a setup where there
 are no scheduled backup/restore cycles? If not, how often should this be
 performed to ensure that the database does not get corrupted?


 It should be possible to run Firebird Embedded 24x7.  Without knowing what
 you're seeing as corruptions, it's very hard to guess why they're
 occurring.  What errors are your customers seeing?  What do they (and you)
 do to correct the errors?


I just made another posting where I tried to describe three different
examples of things we have seen. It would be really nice if you could take
a look at this.



 2. Most of our customers are not using a UPS. In my experiments I have
 not managed to create a corrupted database by turning off the power while
 doing a large set of writes (in a session running in VirtualBox). Could
 someone please confirm that this is indeed safe when you are running with
 synchronized writes turned on?


 A hard shutdown should not corrupt a database that has forced writes
 enabled.  It might corrupt the file system, but again, without knowing what
 the errors and problem are, it's hard to guess.


 3. Are there any operations on a live database that should be avoided to
 minimize the risk of corruptions?


 Dropping tables and altering tables to drop fields are pretty dangerous
 operations, but even if that is what's happening, the development group
 should be given a reproducible case that corrupts databases.


As explained in a previous post, we never do that with other transactions
running.


 4. I just read a discussion about whether or not it is necessary to call
 fb_shutdown to stop Firebird Embedded. Could this be the reason why we are
 getting corruptions? Should we change our service to perform this call when
 it is stopped?

 5. I have also seen discussions about turning off automatic sweeps of the
 database (and doing them manually instead). Is this a likely source of
 corruptions for our setup?


 No. Sweeping the database is very much like backing it up without creating
 the backup file.  When a sweep starts during heavy database usage, it can
 reduce performance but not corrupt the database.
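For question 5, the decision that drives an automatic sweep is a simple threshold. As we understand Firebird 2.5, an automatic sweep starts when the gap between the oldest snapshot transaction (OST) and the oldest interesting transaction (OIT) exceeds the database's sweep interval. The interval defaults to 20000 and can be changed, or set to 0 to disable automatic sweeps, with `gfix -housekeeping`. Both counters appear in the header page output of `gstat -h`. The sketch below only models that rule. Treat the exact comparison as an assumption to check against the documentation for your version; `autosweep_due` is a hypothetical name.

```python
DEFAULT_SWEEP_INTERVAL = 20000  # Firebird's default; 0 disables automatic sweeps


def autosweep_due(oldest_interesting, oldest_snapshot,
                  sweep_interval=DEFAULT_SWEEP_INTERVAL):
    """Model of the auto-sweep trigger: (OST - OIT) > sweep interval."""
    if sweep_interval == 0:
        return False  # automatic sweeping switched off
    return oldest_snapshot - oldest_interesting > sweep_interval


print(autosweep_due(100, 15000))                    # False: gap 14900 is under 20000
print(autosweep_due(100, 20200))                    # True: gap 20100 exceeds 20000
print(autosweep_due(100, 90000, sweep_interval=0))  # False: disabled
```

As the answer above says, the setting decides when the performance cost of a sweep is paid; it does not affect whether the database stays consistent.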

 So, question back to you:  what errors are you seeing and how have you
 fixed them?


Typically we fix our problems with a gfix -mend followed by a backup/restore
cycle. Usually some tables then still have problems (typically foreign keys
that refer to non-existing primary keys), so if possible we then remove the
faulty records, and then it works again.

However, some databases are so heavily corrupted that this strategy would
give us an empty database, and if that is the case we have to tell the
customer to start all over again with an empty file.

Best Regards
//Jan Flyborg


Re: [firebird-support] Firebird Embedded corruptions

2014-09-14 Thread Ann Harrison aharri...@ibphoenix.com [firebird-support]
On Sat, Sep 13, 2014 at 12:22 PM, Jan Flyborg jan.pers...@gmail.com
[firebird-support] firebird-support@yahoogroups.com wrote:



 We have shipped Firebird Embedded bundled together with our product for a
 few years now and the system is currently in production at several thousand
 of our customer's sites...

 All is well and Firebird has served us well so far, with the exception of
 database corruptions that get reported by a new set of customers every
 week. We are now at the planning stage for the next major release of
 our product and we are thus rethinking if Firebird really is a good choice,
 because of this.

I can understand that.




 Lots of effort has gone into solving this problem on our side, so I think
 the normal prerequisites have already been put into place (e.g. using forced
 writes and so forth), but our system needs to be up and running 24x7, which
 means that it is not possible to schedule periodic backup/restore cycles
 and my personal theory is that Firebird embedded gets corrupted over time
 if you are not doing this regularly.


Nice theory, but if the database is physically corrupt, you can't back it
up, and if it's logically corrupt, you can't restore it.  I think it's
worth looking elsewhere for the problem.
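Ann's rule of thumb can be turned into a quick check: a backup that fails points toward physical damage, and a restore that fails points toward logical damage. A minimal Python sketch, assuming gbak is on PATH, that it returns a non-zero exit status on failure, and that no credentials are needed (as with an embedded connection); the function name and messages are hypothetical. The command runner is injectable so the logic can be tried without Firebird installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def backup_restore_check(db: str, fbk: str, restored: str,
                         run: Runner = run_command) -> str:
    """Back up db to fbk, restore into a new file, and classify failures."""
    # gbak -b reads the records of every table; a failure here suggests
    # physical damage (assumes a non-zero exit status on error).
    if run(["gbak", "-b", "-v", db, fbk]) != 0:
        return "backup failed: possible physical corruption"
    # gbak -c creates a new database from the backup and re-activates
    # indexes and constraints; a failure here suggests logical damage.
    if run(["gbak", "-c", "-v", fbk, restored]) != 0:
        return "restore failed: possible logical corruption"
    return "ok"
```

Restoring into a separate file keeps the original database untouched if the restore fails.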

 So I have a few questions that I would appreciate if someone could
 answer:

 1. Is it feasible to run Firebird Embedded 24x7 in a setup where there are
 no scheduled backup/restore cycles? If not, how often should this be
 performed to ensure that the database does not get corrupted.


It should be possible to run Firebird Embedded 24x7.  Without knowing what
you're seeing as corruptions, it's very hard to guess why they're
occurring.  What errors are your customers seeing?  What do they (and you)
do to correct the errors?


 2. Most of our customers are not using a UPS. From my experiments I have
 not managed to create a corrupted database by turning off the power while
 doing a large set of writes (in a session running in VirtualBox). Could
 someone please confirm that this is indeed safe when you are running with
 synchronized writes turned on?


A hard shutdown should not corrupt a database that has forced writes
enabled.  It might corrupt the file system, but again, without knowing what
the errors and problems are, it's hard to guess.
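Because that answer depends on forced writes actually being enabled, a service might reapply the setting at install or start-up. A minimal Python sketch, assuming the stock gfix tool is on PATH; `-write sync` is gfix's switch for forced writes, while the helper names and the path are hypothetical.

```python
import subprocess


def forced_writes_command(db_path: str, enabled: bool = True) -> list:
    """Build the gfix call that switches forced (synchronous) writes on or off."""
    return ["gfix", "-write", "sync" if enabled else "async", db_path]


def ensure_forced_writes(db_path: str) -> None:
    """Run gfix to enable forced writes; raises CalledProcessError on failure."""
    subprocess.run(forced_writes_command(db_path), check=True)


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(" ".join(forced_writes_command("app.fdb")))  # hypothetical path
```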


 3. Are there any operations on a live database that should be avoided to
 minimize the risk of corruptions?


Dropping tables and altering tables to drop fields are pretty dangerous
operations, but even if that is what's happening, the development group
should be given a reproducible case that corrupts databases.


 4. I just read a discussion about whether or not it is necessary to call
 fb_shutdown to stop Firebird Embedded. Could this be the reason why we are
 getting corruptions? Should we change our service to perform this call when
 it is stopped?

 5. I have also seen discussions of turning off automatic sweeps of the
 database (and doing them manually instead). Is this a likely source of
 corruptions for our setup?


No. Sweeping the database is very much like backing it up without creating
the backup file.  When a sweep starts during heavy database usage, it can
reduce performance but not corrupt the database.
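Switching from automatic to manual sweeping is a gfix setting rather than a code change. A minimal Python sketch, assuming gfix is on PATH: a sweep interval of 0 (`-housekeeping 0`) disables automatic sweeps, and `-sweep` runs one on demand, for example from a scheduled task during quiet hours. The helper names and the path are hypothetical.

```python
def disable_auto_sweep_command(db_path: str) -> list:
    """gfix call that sets the sweep interval to 0, turning off automatic sweeps."""
    return ["gfix", "-housekeeping", "0", db_path]


def manual_sweep_command(db_path: str) -> list:
    """gfix call that runs a sweep immediately."""
    return ["gfix", "-sweep", db_path]


if __name__ == "__main__":
    db = "app.fdb"  # hypothetical path
    for cmd in (disable_auto_sweep_command(db), manual_sweep_command(db)):
        print(" ".join(cmd))  # dry run; pass cmd to subprocess.run to execute
```

Per the answer above, this is a performance choice, not a fix for corruption.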

So, question back to you:  what errors are you seeing and how have you
fixed them?

Good luck,

Ann





Re: [firebird-support] Firebird Embedded corruptions

2014-09-13 Thread Alexey Kovyazin a...@ib-aid.com [firebird-support]

Hi Jan,

You did not say what kind of corruption you had (please provide the full 
text of the error). There are many kinds of corruption, with many different causes.
You could also use our tool FirstAID (Direct) to analyze the database at a low 
level and see where the problems are.


Regards,
Alexey Kovyazin
IBSurgeon (www.ib-aid.com)




Hi,

We have shipped Firebird Embedded bundled together with our product 
for a few years now and the system is currently in production at 
several thousand of our customers' sites. Currently we are using 
Firebird Embedded 2.5.1 with the latest .NET-driver and a stack 
consisting of Castle Active Record on top of NHibernate and the system 
is running on the latest versions of Windows.


All is well and Firebird has served us well so far, with the exception 
of database corruptions that get reported by a new set of customers 
every week. For some of them it is possible to instruct the customer 
on how to repair the databases themselves, but some of the databases 
are unfortunately so heavily corrupted that they need to be sent to us 
for repair (which is tedious work that takes time away from other 
tasks). Most of the corruptions are found in the tables that 
get the most writes, but I guess that is only natural.


We are now at the planning stage for the next major release of our 
product and we are thus rethinking if Firebird really is a good 
choice, because of this.


Lots of effort has gone into solving this problem on our side, so I 
think the normal prerequisites have already been put into place (e.g. 
using forced writes and so forth), but our system needs to be up and 
running 24x7, which means that it is not possible to schedule periodic 
backup/restore cycles and my personal theory is that Firebird embedded 
gets corrupted over time if you are not doing this regularly.


So I have a few questions that I would appreciate if someone 
could answer:


1. Is it feasible to run Firebird Embedded 24x7 in a setup where there 
are no scheduled backup/restore cycles? If not, how often should this 
be performed to ensure that the database does not get corrupted.


2. Most of our customers are not using a UPS. From my experiments I 
have not managed to create a corrupted database by turning off the 
power while doing a large set of writes (in a session running in 
VirtualBox). Could someone please confirm that this is indeed safe 
when you are running with synchronized writes turned on?


3. Are there any operations on a live database that should be avoided 
to minimize the risk of corruptions?


4. I just read a discussion about whether or not it is necessary to call 
fb_shutdown to stop Firebird Embedded. Could this be the reason why we 
are getting corruptions? Should we change our service to perform this 
call when it is stopped?


5. I have also seen discussions of turning off automatic sweeps of the 
database (and doing them manually instead). Is this a likely source of 
corruptions for our setup?


Thanks in advance. Maybe there are no definite answers to my questions, 
but any pointers in the right direction would be much appreciated. 
Firebird has been a real workhorse for us and we would rather keep it.


Best Regards
//Jan Flyborg