On Wed, Mar 25, 2009 at 2:01 AM, Kenton Varda <ken...@google.com> wrote:
> On Tue, Mar 24, 2009 at 4:42 PM, David Anderson <d...@natulte.net> wrote:
>>
>> At Google, 99% of assertions remain enabled at all times. There are
>> some nuances (as always, mindless dogma gets you nowhere), but in
>> general, if it should crash during testing, it should also crash in
>> production. The alternative is entering the forbidden realm of
>> undefined/unpredictable behavior. We really much prefer to debug a
>> clean crash with a traceback at the error site, rather than sieving
>> through weirdness after the fact.
>
> Actually, there are a significant number of Google engineers who fiercely
> disagree with you.  In many cases it's actually better to log an error and
> then try your best to do something reasonable than to take down a process
> which is in the middle of serving dozens of other users. Yes, everything has
> to be fault-tolerant in the end due to the possibility of machine failures,
> but assertion failures are much less predictable than machine failures.
>  It's very easy to accidentally write an assertion that passes testing but
> then unexpectedly starts triggering in production when something changes.
>  If your service gets any significant amount of traffic, no reasonable
> amount of fault-tolerance can protect you against an assertion failure that
> occurs on even 0.1% of user requests.  Therefore, most critical user-facing
> servers at Google have rules against assertions that are fatal in
> production.
> Needless to say, there have been endless flamewars about this on internal
> mailing lists.  I'm not actually sure what side I'm on personally.
> In the case of the IsInitialized() check, the main reason that it is
> debug-only is for speed.  However, the considerations above are also
> important here.

This is true. Since I work mainly on infrastructure that never
directly sees a user, it is cheaper for us to crash immediately than
to maintain yet more error-reporting machinery and spend time
guessing adequate answers to insane calls. For us, crashing is very
cheap: upstream code will happily compensate, and the worst-case
scenario is that we delay some batch/soft-real-time processing by a
couple of seconds. That is obviously unacceptable for user-facing
code. I should have specified the context of my reply more clearly.
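
To make that distinction concrete, here is a rough sketch of the two
kinds of checks being discussed, an always-fatal one and a debug-only
one. These are toy macros of my own, not the actual ones we use:

  #include <cstdio>
  #include <cstdlib>

  // Fatal in every build: crash cleanly at the error site, even in
  // production.
  #define ALWAYS_CHECK(cond)                                     \
    do {                                                         \
      if (!(cond)) {                                             \
        std::fprintf(stderr, "Check failed: %s at %s:%d\n",      \
                     #cond, __FILE__, __LINE__);                 \
        std::abort();                                            \
      }                                                          \
    } while (0)

  // Compiled away under -DNDEBUG, like assert().
  #ifdef NDEBUG
  #define DEBUG_CHECK(cond) do { (void)sizeof(cond); } while (0)
  #else
  #define DEBUG_CHECK(cond) ALWAYS_CHECK(cond)
  #endif

  void HandleRequest(const char* request) {
    ALWAYS_CHECK(request != nullptr);  // still fatal in production builds
    DEBUG_CHECK(request[0] != '\0');   // disappears in an NDEBUG build
    // ... real work ...
  }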

As you say, there has been endless debate on the subject, and it's
hard to come up with a definite "yes or no" rule. However, I'll note
that what you described for user-facing code, logging an error and
trying to continue, is not the behavior you get with -DNDEBUG. By
verbosely logging/reporting the error with the appropriate tools, you
get as much information as a crash with a traceback (identical in
terms of debuggability to crashing and reporting the crash site),
except that users don't notice. -DNDEBUG, however, will just silently
skip all assertions and start stomping around on the stack, leaving
you with no idea what the hell happened by the time the corruption
finally becomes bad enough to crash the process.
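
Here's a toy illustration of that failure mode. With assertions
enabled this dies right at the bad call; compiled with -DNDEBUG the
assert() is gone and the out-of-range write silently corrupts
whatever sits next to the buffer, to be discovered (or not) much
later:

  #include <cassert>
  #include <cstddef>

  struct Buffer {
    char data[16];
    std::size_t used;
  };

  void Append(Buffer* buf, std::size_t index, char c) {
    assert(index < sizeof(buf->data));  // vanishes under -DNDEBUG
    buf->data[index] = c;               // unchecked out-of-bounds write
  }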

In other words, 99% of assertions and "non-fatal assertions" remain
enabled in production code. For us the choice is usually "Do I crash,
or do I make up an answer after verbosely reporting the failure?",
whereas the -DNDEBUG choice is "Do I report and crash, or do I
silently start unleashing the dragons?". Very different options :-)
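
In code, the two policies look roughly like this (ReportError() is a
made-up stand-in for whatever verbose logging/monitoring hook a real
service would plug in):

  #include <cstdio>
  #include <cstdlib>

  void ReportError(const char* what) {
    // Real code would capture a stack trace, bump monitoring
    // counters, and so on.
    std::fprintf(stderr, "ERROR: %s\n", what);
  }

  // "Report and crash": fine for infrastructure where upstream code
  // retries and a lost request only delays some batch processing.
  int ParseLengthOrDie(const char* input) {
    if (input == nullptr) {
      ReportError("ParseLengthOrDie: null input");
      std::abort();
    }
    return static_cast<int>(input[0]);  // placeholder for real parsing
  }

  // "Report and make up an answer": the user-facing alternative.
  int ParseLengthOrDefault(const char* input, int fallback) {
    if (input == nullptr) {
      ReportError("ParseLengthOrDefault: null input, using fallback");
      return fallback;
    }
    return static_cast<int>(input[0]);
  }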

- Dave
