Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-22 Thread Dieter Maurer
Chris Withers wrote at 2005-11-21 16:33 +:
 ...
here's a line from one of our event logs:

2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri
(347 conflicts since startup at 2005-11-08T17:56:20)

What is this telling me?

It is incredibly stupid.

The message above only tells you that (at the given time)
a request for /some_uri resulted in a ConflictError
and that since startup (at the given time) 347 conflicts occurred.

Unfortunately, it does not tell you

  *  what object caused the conflict

  *  whether it is a read or a write conflict
     (read conflicts are very rare since MVCC introduction,
     but they may still happen)

  *  for write conflicts: which versions of the object participated

A long time ago, I posted an
extension making this additional information available (it
is all present in the exception instance; Zope is just too stupid
to read and log it).
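
To make the point concrete, here is a minimal sketch of pulling that detail out of the exception. It is not the extension mentioned above. It assumes the exception exposes `oid`, `class_name` and `serials` attributes, as ZODB's ConflictError is believed to; check the names against your ZODB version. `describe_conflict` and `log_conflict` are hypothetical helpers.

```python
import logging

logger = logging.getLogger("conflicts")


def describe_conflict(exc):
    """Build a one-line description of a conflict exception.

    Assumption: the exception carries `oid`, `class_name` and `serials`
    attributes; any that are missing are reported as "unknown".
    """
    oid = getattr(exc, "oid", None)
    class_name = getattr(exc, "class_name", None)
    serials = getattr(exc, "serials", None)
    # Heuristic: a class whose name starts with "Read" is a read conflict.
    kind = "read" if type(exc).__name__.startswith("Read") else "write"
    return "%s conflict on oid=%r class=%s serials=%r" % (
        kind,
        oid if oid is not None else "unknown",
        class_name or "unknown",
        serials if serials is not None else "unknown",
    )


def log_conflict(exc, url):
    """Log the conflict details next to the URL of the failing request."""
    logger.info("conflict at %s: %s", url, describe_conflict(exc))
```

Calling `log_conflict` from wherever the conflict is caught would put the object id, class and serials next to the URL in the log.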

Did the user actually see a ConflictError page?

Usually not.

Or was this error successfully resolved?

It may (or may not) later be resolved. This is still not clear
when the message is generated.

What object did this ConflictError occur on and/or how can I modify 
our Zope instances to find out where the conflict was occurring?

See above -- search the archive for the extension...


Now, when should the number of ConflictErrors logged in this way start 
to become worrying?

When you start to see lots of them (per time unit).


I analysed the logs from our cluster and we're getting about 450 
conflict errors in our busiest hours when the cluster of 8 ZEO clients 
is taking about 11,000 hits in that hour.

Is this 'bad'?

I would not be happy: it is about 4 %.

This gives quite some chance that your customers see failures
caused by the conflicts (when 3 repetitions are not enough).
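
As a rough worked example of that risk: 450 conflicts over 11,000 hits is a little over 4 %. The sketch below also estimates how often every attempt of a request would conflict. It assumes attempts are independent and share the same conflict rate, which is almost certainly optimistic since conflicts cluster on a few hot objects. It also treats every logged conflict as a separate request, which the log line does not guarantee.

```python
conflicts_per_hour = 450
hits_per_hour = 11000

# Fraction of requests that logged a conflict in the busiest hour.
conflict_rate = conflicts_per_hour / hits_per_hour


def failure_probability(rate, attempts):
    """Chance that every one of `attempts` tries ends in a conflict.

    Assumption: attempts are independent with the same conflict rate.
    """
    return rate ** attempts


if __name__ == "__main__":
    print("conflict rate: %.1f%%" % (conflict_rate * 100))
    for attempts in (1, 2, 3):
        failed = failure_probability(conflict_rate, attempts) * hits_per_hour
        print("%d attempt(s): about %.2f failed requests per hour"
              % (attempts, failed))
```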

If so, where should I start to make things better?

You find out which objects cause the conflicts.

You analyse what you can do to reduce concurrent writes
to these objects (split into separate persistent subobjects)
or whether you can provide conflict resolution.
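
For the conflict resolution option, here is a minimal sketch of a counter that merges concurrent increments through ZODB's `_p_resolveConflict` hook. The `Counter` class is hypothetical; a real one would subclass persistent.Persistent, which is left out so the sketch runs with the standard library only. It also assumes the three states arrive as attribute dictionaries.

```python
class Counter(object):
    """Counter whose concurrent increments can be merged.

    Assumption: in a real application this would subclass
    persistent.Persistent; the base class is omitted here.
    """

    def __init__(self):
        self.value = 0

    def increment(self, delta=1):
        self.value += delta

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   state both transactions started from
        # saved_state: state already committed by the other transaction
        # new_state:   state this transaction is trying to commit
        resolved = dict(new_state)
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved
```

Both increments survive because the merged value applies each transaction's delta to the common starting state.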

-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-21 Thread Dennis Allison

Conflicts and how they interact with the database and sessioning machinery 
is my hot button right at the moment )-:  I Hope I have not 
included too much information.

I ran a quick report and we see about 1000 conflicts per hour at 
about 12 hits per hour.  These are order of magnitude numbers and are 
highly variable.  The 1% number is way bigger than I am comfortable with 
although I have no basis to scale my expectations.  I'd be much happier were 
it a couple of orders of magnitude smaller.

Conflict errors are not always errors.  As I understand it, Zope retries
when a conflict occurs and usually is able to commit both sides of the 
conflicting transaction.  Sometimes Zope cannot commit conflicting 
transactions--and it is at that point that an error occurs.   There are 
supposed to be significant changes in the Zope 2.8.4/ZODB 3.4.2 system.
Read-read conflicts no longer generate conflict errors and the retry 
mechanism has been reworked at the ZODB level to retry once and then raise 
a POSKEY exception.

The optimistic locking used by Zope can cause problems, particularly when
the conflicting method changes external state.  We have seen instances
where an action was taken multiple times due to conflicts and their
resolution.  In one instance, we had an infinite loop in the conflict
resolution.   The interactions which can cause conflicts are not always 
obvious.  I am still learning.

We do have occasional instances where unresolved conflicts raise user 
visible diagnostics.  These are real errors.  While I have not explored 
the reasons why, it appears that at least some of these errors are not
logged in event.log but only displayed to the user.

I asked the list the other day whether anyone had prepared a set of best
practice guidelines on the techniques to use to minimize conflicts.
Dieter Maurer responded:

 
   *  Move attributes with high write frequency out into
      separate persistent objects.
 
      E.g. when you have a counter, put it into its own
      persistent object (you can use a BTrees.Length.Length object
      for a counter).
 
   *  Implement conflict resolution for your frequently
      written persistent objects.
 
      Formerly, TemporaryStorage had only very limited
      history information to support conflict resolution (which
      limited the wholesome effect of conflict resolution).
      Rumours say that this improved with Zope 2.8.
 
   *  Write only when you really change something.
 
      E.g. instead of "session[XXX] = sss" use
      "if session[XXX] != sss: session[XXX] = sss"
      (at least, if there is a high chance that session already
      contains the correct value).
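
The last guideline can be wrapped in a small helper. This is only a sketch: `set_if_changed` is a hypothetical name, and `mapping` stands in for a session data object (any dict-like object works here).

```python
def set_if_changed(mapping, key, value):
    """Assign mapping[key] only when the stored value differs.

    Returns True when a write happened, False when it was skipped.
    """
    missing = object()  # sentinel so a stored None is not mistaken for "absent"
    if mapping.get(key, missing) == value:
        return False  # nothing to change, so no write and no conflict risk
    mapping[key] = value
    return True
```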

Session variables present a particularly vexing problem since they may 
trigger writes even though they are apparently read-only.

Chris McDonough wrote in response to my posting:
 
 On Nov 20, 2005, at 12:16 PM, Dennis Allison wrote:
[...]
  Looking at the code, I don't understand why I am seeing conflicts.
  As I understand things, neither variables in the dtml-let space nor
  the REQUEST/RESPONSE space are stored in the ZODB so modifications to
  them don't look like writes to the conflict mechanism.  Am I incorrect
  in my understanding?
 
 Yes, but that's understandable.  It's not exactly obvious.
 
 The sessioning machinery is one of the few places in Zope where it's  
 necessary for the code to do what's known as a write on read in the  
 ZODB database.
 
 Even if you're just reading from a session, looking up a session,  
 or doing anything otherwise related to sessioning, it's possible for  
 your code to generate a ZODB write.
 This is why you get conflicts even if you're just reading; whenever  
 you access the sessioning machinery, you are potentially (but not  
 always) causing a ZODB write.  All writes can potentially cause a  
 conflict error.
 
 While this might sound fantastic, it's pretty much impossible to  
 avoid when using ZODB as a sessioning backend.  The sessioning  
 machinery has been tuned to generate as few conflicts as possible,  
 and you can help it by doing your own timeout, resolution, and  
 housekeeping tuning as has been suggested.  MVCC gets rid of read  
 conflicts.  But it's not possible to completely avoid write conflicts  
 under the current design.
 
 Here's why.  The sessioning machinery is composed of three major data  
 structures:
 
 - An index of timeslice to bucket.  A timeslice is an integer
   representing some range of time (the range of time is variable,
   depending on the resolution, but out of the box, it represents
   20 seconds).  This mapping is an IOBTree.
 
 - A bucket is a mapping from a browser id to session data
   object (aka transient object).  This mapping is an OOBTree.
 
 - Three increasers which mark the last timeslice in which
   something was done (called the garbage collector, called the
   finalizer, etc).
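
A toy model of the first two structures shows why a lookup can turn into a write. This is a simplification under stated assumptions, not the real sessioning code: plain dicts stand in for the IOBTree and OOBTree, timeouts and the increasers are ignored, and all names are hypothetical. Moving data into the current bucket on access is my understanding of the behaviour, not something confirmed in this thread.

```python
import time

PERIOD = 20  # seconds per timeslice; the out-of-the-box resolution


def timeslice_for(when, period=PERIOD):
    """Round a timestamp down to the start of its timeslice."""
    return int(when) - (int(when) % period)


# timeslice -> bucket, where a bucket maps browser id -> session data
index = {}


def session_for(browser_id, when=None):
    """Return the session data for browser_id, creating it if needed."""
    if when is None:
        when = time.time()
    current = timeslice_for(when)
    bucket = index.setdefault(current, {})  # may add a bucket: a write
    if browser_id in bucket:
        return bucket[browser_id]  # the only path that is a pure read
    # Look in older timeslices, newest first.
    for ts in sorted(index, reverse=True):
        if ts != current and browser_id in index[ts]:
            data = index[ts].pop(browser_id)  # write to the old bucket
            break
    else:
        data = {}  # no session yet: start an empty one
    bucket[browser_id] = data  # write to the current bucket
    return data
```

Only a second access within the same timeslice avoids a write; every other path modifies at least one bucket.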
 
 The point of sessioning is to provide a writable namespace 

Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-21 Thread Tim Peters
[Dennis Allison]
 ...
 Conflict errors are not always errors.

At the ZODB level, an unresolved conflict always raises an exception. 
Whether such an exception is considered to be an error isn't ZODB's
decision -- that's up to the app.  My understanding (which may be
wrong) is that Zope tries up to 3 times to perform and commit a given
transaction, suppressing any conflict exceptions for the duration,
before giving up.

 As I understand it, Zope retries when a conflict occurs and usually is able
 to commit both sides of the conflicting transaction.

Right (although note that there may be more than two sides).

 Sometimes Zope cannot commit conflicting transactions--and it is at that
 point that an error occurs.

Right, Zope eventually gives up on a transaction that keeps on raising
conflict exceptions.

 There are supposed to be significant changes in the Zope 2.8.4/ZODB 3.4.2
 system.

There are.  ZODB 3.3 introduced multiversion concurrency control
(MVCC), which eliminates read conflicts in normal operation.

 Read-read conflicts no longer generate conflict errors

Not really:  under MVCC, there simply aren't any read conflicts. 
There may still be write conflicts.

 and the retry mechanism has been reworked at the ZODB level to retry once
 and then raise a POSKEY exception.

Nope, no version of ZODB ever retries a transaction on its own.  If an
application (like Zope) wants to retry, it's entirely up to it to do so.
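
An application-level retry loop might look roughly like this. It is a sketch under stated assumptions, not Zope's actual publisher code: `ConflictError` here is a local stand-in class, `run_with_retries` and `request` are hypothetical names, and the limit of 3 attempts follows the understanding stated above.

```python
class ConflictError(Exception):
    """Stand-in for ZODB's conflict exception, so the sketch is self-contained."""


def run_with_retries(request, attempts=3):
    """Call request() until it succeeds or `attempts` tries have conflicted.

    `request` is a hypothetical callable that does its work and commits;
    it must be safe to run more than once.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConflictError:
            if attempt == attempts:
                raise  # give up: the caller (and possibly the user) sees it
            # Otherwise a real application would abort the transaction
            # here before looping to try again.
```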

 The optimistic locking used by Zope

ZODB's transactional approach is optimistic, precisely because it
_doesn't_ lock objects modified by a transaction.  Any number of
transactions are free to modify the same object at the same time -- no
locking mechanism attempts to stop that.  If multiple transactions do
modify the same object at the same time, and that object doesn't
implement conflict resolution, then only the first transaction to
commit its changes to that object can succeed.
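
The "first committer wins" behaviour can be shown with a toy store that checks a per-object serial at commit time. This only illustrates the optimistic approach in general; it is not how ZODB is implemented, and `Store` and `WriteConflict` are hypothetical names.

```python
class WriteConflict(Exception):
    """Raised when an object changed after the transaction read it."""


class Store(object):
    """Toy object store: each key has a serial number and a value."""

    def __init__(self):
        self.data = {}  # key -> (serial, value)

    def read(self, key):
        """Return (serial, value); unknown keys start at serial 0."""
        return self.data.get(key, (0, None))

    def commit(self, key, serial_read, new_value):
        """Store new_value if nobody committed since serial_read was seen."""
        current_serial, _ = self.read(key)
        if current_serial != serial_read:
            # Someone else committed first; no lock ever stopped either writer.
            raise WriteConflict(key)
        self.data[key] = (current_serial + 1, new_value)
```

Two writers that read the same serial can both proceed freely; the conflict only appears when the second one tries to commit.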

 can cause problems, particularly when the conflicting method changes external
 state.

Yes -- but do note it's not a transactional system then (ZODB can roll
back all changes _it_ makes, so that a failure to commit does no harm
to the database state; external resources that can't take back
provisional changes are indeed challenging to use in a transactional
system).
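
One common way to cope with such resources is to queue the external action and perform it only once the commit has succeeded. The sketch below shows that idea with a toy transaction object; it is not ZODB's or Zope's API, and `Transaction`, `defer` and `do_commit` are hypothetical names.

```python
class Transaction(object):
    """Toy transaction that runs side effects only after a successful commit."""

    def __init__(self):
        self._after_commit = []

    def defer(self, func, *args):
        """Queue an external action instead of performing it right away."""
        self._after_commit.append((func, args))

    def commit(self, do_commit):
        """Run do_commit(); if it succeeds, run the queued actions."""
        do_commit()  # may raise; queued actions are then never run
        pending, self._after_commit = self._after_commit, []
        for func, args in pending:
            func(*args)

    def abort(self):
        """Drop queued actions so that a retry starts clean."""
        self._after_commit = []
```

With this arrangement a retried request sends its mail once, after the attempt that finally commits, rather than once per attempt.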


Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-21 Thread Chris McDonough

On Nov 21, 2005, at 2:10 PM, Dennis Allison wrote:


Conflicts and how they interact with the database and sessioning
machinery is my hot button right at the moment )-:  I Hope I have not
included too much information.

I ran a quick report and we see about 1000 conflicts per hour at
about 12 hits per hour.


Is this the number of log messages that indicate a conflict error  
occurred (e.g. x conflict errors since DATE messages in the event  
log) or the number of conflict errors that are retried more than  
three times and thus make it out to the app user?  I'm guessing the  
former.



These are order of magnitude numbers and are
highly variable.  The 1% number is way bigger than I am comfortable
with although I have no basis to scale my expectations.  I'd be much
happier were it a couple of orders of magnitude smaller.


I would be too.  It's considerably difficult when ZODB is used as the  
sessioning backend.  A lot of effort has been put in to reducing the  
potential for conflicts already.  It could of course be better if  
more time was put in, but there hasn't been any reason (besides a  
sense of accomplishment and contribution to the greater good,  
anyway ;-) to put in that effort since the last time this machinery  
was overhauled.


That said, if no conflict errors actually bubble up to the user using  
the application, the penalty is just app performance and knowledge  
expense (e.g. you can't use a nontransactional mailhost, you can't  
use a nontransactional database table, etc).  You've already paid for  
the latter the hard way. ;-)  I can't judge the expense of the former  
to you but I assume that's what you're primarily worried about now.




Conflict errors are not always errors.


The real reason they're called errors is only because they're  
implemented as Python exceptions.  They are implemented as exceptions  
because it was the easiest mechanism to use (exceptions are already  
built into Python).



  As I understand it, Zope retries
when a conflict occurs and usually is able to commit both sides of the
conflicting transaction.


There can be more than two sides (actually there always are... there  
are three.. the two conflicting in-progress connection states and the  
database state).



  Sometimes Zope cannot commit conflicting
transactions--and it is at that point that an error occurs.


An exception occurs, yes.

Oops, I just realized Tim responded to the rest of these points, so I  
won't go on.



We do have occasional instances where unresolved conflicts raise user
visible diagnostics.  These are real errors.  While I have not
explored the reasons why, it appears that at least some of these
errors are not logged in event.log but only displayed to the user.


To be pedantic, if you're right about conflict error tracebacks being  
shown to end users, it's not because they are unresolved (in the  
sense that 'application-level conflict resolution' could have  
prevented them), it's because a request was issued that resulted in a  
conflict error, which was retried, and then that retried request  
raised a conflict error, and then twice more.  The only way to figure  
out what's going on here is to see the traceback.  IIRC, Zope logs  
conflict error tracebacks at the BLATHER log level (as well as a  
deluge of other ancillary info).


However, even if BLATHER logging mode is not on, if no obvious error  
is put in the event log when a conflict error is relayed to a user,  
that's definitely a bug.  I'd believe it in a second! ;-)


The Zope conflict exception catching code is written in such a  
complicated way (and without the benefit of any automated tests) that  
tracking that down could take an entire day which I don't have to  
burn ATM.  So I'm afraid the status quo will prevail until someone  
gets so indignant about it that they either pay for it to be fixed or  
fix it themselves.  Apologies for that. :-(


- C



Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-21 Thread Chris McDonough

These are order of magnitude numbers and are
highly variable.  The 1% number is way bigger than I am
comfortable with although I have no basis to scale my
expectations.  I'd be much happier were it a couple of orders
of magnitude smaller.


I would be too.  It's considerably difficult when ZODB is used as  
the sessioning backend.  A lot of effort has been put in to  
reducing the potential for conflicts already.  It could of course  
be better if more time was put in, but there hasn't been any reason  
(besides a sense of accomplishment and contribution to the greater  
good, anyway ;-) to put in that effort since the last time this  
machinery was overhauled.


I should also say that without the benefit of knowing whether you've  
taken the advice of turning the knobs available to you that help  
reduce conflicts (bumping up timeout resolution, turning off inband  
housekeeping, using a local database rather than a ClientStorage- 
backed database for session data), that we enumerated in previous  
emails, it's hard to know whether doing any more work would be  
beneficial.


- C



Re: [Zope-dev] How bad _are_ ConflictErrors

2005-11-21 Thread Dennis Allison
On Mon, 21 Nov 2005, Chris McDonough wrote:

 On Nov 21, 2005, at 2:10 PM, Dennis Allison wrote:
 
  Conflicts and how they interact with the database and sessioning  
  machinery
  is my hot button right at the moment )-:  I Hope I have not
  included too much information.
 
  I ran a quick report and we see about 1000 conflicts per hour at
  about 12 hits per hour.
 
 Is this the number of log messages that indicate a conflict error  
 occurred (e.g. x conflict errors since DATE messages in the event  
 log) or the number of conflict errors that are retried more than  
 three times and thus make it out to the app user?  I'm guessing the  
 former.

*** you are correct -- this is the easy hack on the event.log.  It's much 
harder to know how many make it out to the user.  We have an associated 
bug in the MySQL interface which generates threading errors, apparently 
triggered by a conflict error and the subsequent backout.  These occur 
with most conflicts which involve the database--almost every conflict with 
our system structure.

 
These are order of magnitude numbers and are
  highly variable.  The 1% number is way bigger than I am comfortable  
  with
  although I have no basis to scale my expectations.  I'd be much  
  happier were
  it a couple of orders of magnitude smaller.
 
 I would be too.  It's considerably difficult when ZODB is used as the  
 sessioning backend.  A lot of effort has been put in to reducing the  
 potential for conflicts already.  It could of course be better if  
 more time was put in, but there hasn't been any reason (besides a  
 sense of accomplishment and contribution to the greater good,  
 anyway ;-) to put in that effort since the last time this machinery  
 was overhauled.
 

*** I've moved from a ZODB sessioning backend to local sessioning.  There
has not been a significant change, I think because the MySQL problem
dominates at the moment.


 That said, if no conflict errors actually bubble up to the user using  
 the application, the penalty is just app performance and knowledge  
 expense (e.g. you can't use a nontransactional mailhost, you can't  
 use a nontransactional database table, etc).  You've already paid for  
 the latter the hard way. ;-)  I can't judge the expense of the former  
 to you but I assume that's what you're primarily worried about now.

*** Right now, we have major problems with our transactional database 
and locks.   Once that gets resolved, we will address how to refactor 
to minimize the cost of transactions and ensure correctness in the 
presence of conflicts.  Correctness is already pretty much guaranteed with 
our current systems structure.

 
 
  Conflict errors are not always errors.
 
 The real reason they're called errors is only because they're  
 implemented as Python exceptions.  They are implemented as exceptions  
 because it was the easiest mechanism to use (exceptions are already  
 built into Python).
 
As I understand it, Zope retries
  when a conflict occurs and usually is able to commit both sides of the
  conflicting transaction.
 
 There can be more than two sides (actually there always are... there  
 are three.. the two conflicting in-progress connection states and the  
 database state).
 
Sometimes Zope cannot commit conflicting
  transactions--and it is at that point that an error occurs.
 
 An exception occurs, yes.
 
 Oops, I just realized Tim responded to the rest of these points, so I  
 won't go on.
 
*** Yes, he did.  THANKS TIM for your comments and help.  (And you too 
Chris)


  We do have occasional instances where unresolved conflicts raise user
  visible diagnostics.  These are real errors.  While I have not  
  explored
  the reasons why, it appears that at least some of these errors are not
  logged in event.log but only displayed to the user.
 
 To be pedantic, if you're right about conflict error tracebacks being  
 shown to end users, it's not because they are unresolved (in the  
 sense that 'application-level conflict resolution' could have  
 prevented them), it's because a request was issued that resulted in a  
 conflict error, which was retried, and then that retried request  
 raised a conflict error, and then twice more.  The only way to figure  
 out what's going on here is to see the traceback.  IIRC, Zope logs  
 conflict error tracebacks at the BLATHER log level (as well as a  
 deluge of other ancillary info).
 
 However, even if BLATHER logging mode is not on, if no obvious error  
 is put in the event log when a conflict error is relayed to a user,  
 that's definitely a bug.  I'd believe it in a second! ;-)
 

*** have done that but no helpful results as of yet.


 The Zope conflict exception catching code is written in such a  
 complicated way (and without the benefit of any automated tests) that  
 tracking that down could take an entire day which I don't have to  
 burn ATM.  So I'm afraid the status quo will prevail until someone  
 gets so indignant about