Re: The extent of trust in errors and error handling

2017-02-07 Thread Steve Biedermann via Digitalmars-d

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
tl;dr - Seeking thoughts on trusting a system that allows 
"handling" errors.


One of my extra-curricular interests is the Mill CPU[1]. A 
recent discussion in that context reminded me of the 
Error-Exception distinction in languages like D.


1) There is the well-known issue of whether Error should ever 
be caught. If Error represents conditions where the application 
is not in a defined state, hence it should stop operating as 
soon as possible, should that also carry over to other 
applications, to the OS, and perhaps even to other systems in 
the whole cluster?


For example, if a function detected an inconsistency in a DB 
that is available to all applications (as is the case in the 
Unix model of user-based access protection), should all 
processes that use that DB stop operating as well?


2) What if an intermediate layer of code did in fact handle an 
Error (perhaps raised by a function pre-condition check)? 
Should the callers of that layer have a say on that? Should a 
higher level code be able to say that Error should not be 
handled at all?


For example, an application code may want to say that no 
library that it uses should handle Errors that are thrown by a 
security library.


Aside, and more related to D: I think this whole discussion is 
related to another issue that has been raised in this forum a 
number of times: Whose responsibility is it to execute function 
pre-conditions? I think it was agreed that pre-condition checks 
should be run in the context of the caller. So it is the 
application code, not the library, that should require that they 
be executed. In other words, it should be irrelevant whether the 
library was built in release mode or not; its pre-condition 
checks should be available to the caller. (I think we need to 
fix this anyway.)
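
To illustrate the current situation, here is a minimal D sketch 
(withdraw() is a hypothetical library function, not anything from 
Phobos):

    // The pre-condition below is compiled into the callee. If this
    // function lives in a library built with -release, the check is
    // gone no matter how the caller was compiled -- which is the
    // problem described above: the caller cannot ask for it back.
    int withdraw(int balance, int amount)
    in
    {
        assert(amount >= 0 && amount <= balance, "invalid withdrawal");
    }
    do  // `body` in 2017-era compilers
    {
        return balance - amount;
    }

    void main()
    {
        auto remaining = withdraw(100, 30); // fine: remaining == 70
        // withdraw(100, 200) would trip the pre-condition with an
        // AssertError -- but only if the callee was built with checks.
    }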


And there is the issue of the programmer making the right 
decision: One person's Exception may be another person's Error.


It's fascinating that there are so many fundamental questions 
involving CPUs, runtimes, loaders, and OSes, and that some of these 
issues are not even semantically describable. For example, I 
think there is no way of requiring that e.g. a square root 
function have no side effects at all: the compiler can check a 
piece of code, but the library that is actually linked 
with the application can do anything else that it wants.


Thoughts? Are we doomed? Surprisingly, it seems not, as we use 
computers everywhere and they seem to work. :o)


Ali

[1] http://millcomputing.com/


Whether you can recover from an error depends on the capabilities of 
the language and the guarantees it makes about errors.


If the language has no pointers and guarantees that memory cannot 
be unintentionally overwritten in any other way, then you can 
recover from an error, because you have the guarantee that no 
memory corruption has happened.


If it is exactly specified what happens when an error occurs, you can 
decide whether it's safe to continue. But for that you need to know 
exactly what the runtime does when this error is raised. If you 
aren't 100% sure what your state is, you shouldn't continue. 
(This matters more in life-critical software than in command-line 
tools, but still...)


Or you can have a software stack like Erlang, where you can just 
restart the failing process. In Erlang it doesn't matter whether 
it's an exception or an error: if a process fails, restart it and 
move on. This works because processes are isolated, so an error 
can't corrupt other processes.
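
To make the Erlang-style option concrete in D terms, here is a 
minimal supervisor sketch. It gets its isolation from OS processes 
rather than Erlang's lightweight ones, and "./worker" is a 
hypothetical program name:

    import std.process : spawnProcess, wait;
    import std.stdio : writeln;

    void main()
    {
        while (true)
        {
            auto pid = spawnProcess(["./worker"]); // isolated address space
            const status = wait(pid);
            if (status == 0)
                break;              // clean exit: nothing to recover
            // Any crash -- Error or Exception alike -- lands here. The
            // supervisor's own memory was never at risk, so restarting
            // is sound; that is exactly the Erlang argument.
            writeln("worker exited with status ", status, "; restarting");
        }
    }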


So there are many approaches to this problem, and all of them are a 
bit different. The final answer can only be: it depends on the 
language and the guarantees it makes. (And on how much you trust the 
compiler to do the right thing 
[https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf] :D)


Re: The extent of trust in errors and error handling

2017-02-06 Thread Chris Wright via Digitalmars-d
On Mon, 06 Feb 2017 18:12:38 +, Caspar Kielwein wrote:
> I absolutely agree with Walter and Ali that there are applications
> where, on Error, anything but termination of the process is unacceptable.

Sure, and it looks like you spend a ton of effort to make things work 
properly and to make things debuggable because your application has these 
requirements.

The position that D's runtime can make this decision for me is grating. 
Without the same kind of tooling that you're talking about available and 
shipped with dmd, it's absurd.

> I have definitely seen asserts violated because of buffer overflows in
> completely unrelated modules. Not sharing state unnecessarily, while
> certainly being good engineering practice, is not enough.

Violated asserts catch this kind of problem after the fact. @safe prevents 
you from writing code with the problem in the first place.
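
A tiny sketch of that distinction (my own illustration, not code 
from the thread; the bounds-check behavior in the comments is that 
of current dmd defaults):

    // In @safe code the compiler rejects the operations that overflow
    // buffers, instead of detecting the damage after the fact.
    @safe int sum(int[] arr)
    {
        int total = 0;
        foreach (i; 0 .. arr.length)
            total += arr[i];    // bounds-checked even under -release
                                // (only -boundscheck=off removes it)
        // int* p = arr.ptr;           // error: not allowed in @safe code
        // total += *(p + arr.length); // pointer arithmetic: rejected too
        return total;
    }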


Re: The extent of trust in errors and error handling

2017-02-06 Thread Walter Bright via Digitalmars-d

On 2/6/2017 9:10 AM, Chris Wright wrote:

Assuming that crashing results
in less loss of money or lives than marching on.


Any application that must continue or lives are lost is a BADLY designed system 
and should not be tolerated.


http://www.drdobbs.com/architecture-and-design/assertions-in-production-code/228700788


Re: The extent of trust in errors and error handling

2017-02-06 Thread Ali Çehreli via Digitalmars-d

On 02/06/2017 09:25 AM, Chris Wright wrote:

> https://github.com/munificent/vigil is the programming language for you.

Brilliant! :)

Ali



Re: The extent of trust in errors and error handling

2017-02-06 Thread Caspar Kielwein via Digitalmars-d

On Monday, 6 February 2017 at 17:40:50 UTC, Chris Wright wrote:

It works for every other programming language I've encountered.

This issue is language agnostic. It works in D as well, but at the 
same level of correctness and unknowns.


I haven't heard anyone complaining about this elsewhere. Have 
you?


What I've heard instead is that it's a bug if state 
unintentionally leaks between calls and it's undesirable to 
have implicitly shared state. Not sharing state unnecessarily 
means you don't have to put forth a ton of effort trying to 
detect corrupted shared state in order to throw an Error to 
signal that your library is unsafe to use.


I absolutely agree with Walter and Ali that there are 
applications where, on Error, anything but termination of the 
process is unacceptable. This really is independent of the 
language used.


My work is in sensors for automation of heavy mining equipment 
and the software I write is used by the automation systems of our 
customers.


When our system detects an internal error, I cannot vouch for 
any of its outputs. Erroneous outputs can easily cost millions of 
dollars in machine damage, or in the worst case even human lives. 
(Usually there are redundant systems to mitigate that risk.)
Termination of our system is automatically detected by the 
automation systems within the specified latencies and is generally 
considered annoying but acceptable. Nonsense outputs caused by 
errors in our system are never acceptable!


We try to find the cause of errors by logging the raw data from 
our sensors and feeding them to a clone of the system which has 
more debugging and logging enabled. Yes, we usually don't even get 
a stack trace from the original crash.


I have definitely seen asserts violated because of buffer 
overflows in completely unrelated modules. Not sharing state 
unnecessarily, while certainly being good engineering practice, is 
not enough.


Re: The extent of trust in errors and error handling

2017-02-06 Thread Chris Wright via Digitalmars-d
On Sun, 05 Feb 2017 22:23:19 -0800, Ali Çehreli wrote:

> On 02/05/2017 10:08 PM, Chris Wright wrote:
>  > How do you recommend it leave behind enough data for me to
>  > investigate the next day when I see there was a problem?
> 
> The current approach is to rely on the backtrace produced when aborting.

Which I can't log, according to you, because I don't know for certain that 
the logger is not corrupted. Which is provided by the runtime, which I 
can't trust not to be in a corrupted state. Which forces me to have at 
least two different logging systems.

At past jobs, I've used an SMTP logging appender with log4net. Wrangling 
that with a stacktrace reported only via stderr would be fun.

>  > Catching an error, logging it, and trying to move on is the obvious
> thing.
> 
> That part I can't agree with. It is not necessarily true that moving on
> will work the way we wanted. The invoice prepared for the next customer
> may have incorrect amount in it.

I've done billing. We march on, process as many invoices as possible, and 
detect problems. If there are any problems, we report them to a human for 
review instead of just submitting to the payment processor.

Besides which, you are trusting every line of code you depend on to 
appropriately distinguish between something that could impact shared state 
and something that couldn't, and to check continuously for whether shared 
state is corrupted. I'm merely trusting it not to share more state than it 
needs to.

>  > It works for every other programming language I've encountered.
> 
> This issue is language agnostic. It works in D as well but at the same
> level of correctness and unknowns.

I haven't heard anyone complaining about this elsewhere. Have you?

What I've heard instead is that it's a bug if state unintentionally leaks 
between calls and it's undesirable to have implicitly shared state. Not 
sharing state unnecessarily means you don't have to put forth a ton of 
effort trying to detect corrupted shared state in order to throw an Error 
to signal that your library is unsafe to use.

> I heard about the Exception-Error
> distinction first in Java and I think there are other languages that
> recommend not catching Errors.

I've only been using Java professionally for seven years, so maybe that's 
before my time. The common practice today is to have `catch(Exception)` at 
a central location and to catch other exceptions as needed to make the 
compiler shut up. (Which we all hate but *has* caused me to be more 
careful about a number of things, so there's that.)


Re: The extent of trust in errors and error handling

2017-02-06 Thread Chris Wright via Digitalmars-d
On Mon, 06 Feb 2017 09:09:31 +, Dominikus Dittes Scherkl wrote:

> On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:
>> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>>> What I and many others who say Errors should not be caught are saying
>>> is, once the program is in an unexpected state, attempting to do
>>> anything further is wishful thinking.
>>
>> I've been thinking about this a bit more, and I'm curious: how do you
>> recommend that an application behave when an Error is thrown?
> It has lost its face and shall commit suicide.
> That's the Japanese way, and it has its merits.
> Continuing to work and pretending nothing has happened (the European
> way) makes it just untrustworthy from the beginning.

https://github.com/munificent/vigil is the programming language for you.


Re: The extent of trust in errors and error handling

2017-02-06 Thread Chris Wright via Digitalmars-d
On Sun, 05 Feb 2017 23:48:07 -0800, Walter Bright wrote:
> This discussion has come up repeatedly on this forum. Many people
> strongly disagree with me, and believe that they can recover from Errors
> and continue executing the program.
> 
> That's fine if the program's output is nothing one cares about, such as
> a game or a music player. If the program's failure could result in the
> loss of money, property, health or lives, it is unacceptable.

Assuming there is no intervening process whereby a human will investigate 
errors by hand after the program completes. Assuming that crashing results 
in less loss of money or lives than marching on.

In Google Compute Engine billing, it was *always* worse for us if our 
billing jobs failed than if they completed with reported errors. If the 
job failed, it was difficult to investigate. If it completed with errors, 
we could investigate in a straightforward way, and the errors being 
reported meant the data was held aside and not automatically sent to the 
payment processor.


Re: The extent of trust in errors and error handling

2017-02-06 Thread Jacob Carlborg via Digitalmars-d

On 2017-02-06 08:48, Walter Bright wrote:


For example, if I feed a D source file to a C compiler and the C compiler
crashes, the C compiler has a bug in it, which is an Error. If the C
compiler instead writes a message "Error: D source code found instead of
C source code, please upgrade to a D compiler" then that is an Exception.


Does DMC do that? :)

--
/Jacob Carlborg


Re: The extent of trust in errors and error handling

2017-02-06 Thread Dominikus Dittes Scherkl via Digitalmars-d

On Monday, 6 February 2017 at 06:08:22 UTC, Chris Wright wrote:

On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
What I and many others who say Errors should not be caught are 
saying is, once the program is in an unexpected state, 
attempting to do anything further is wishful thinking.


I've been thinking about this a bit more, and I'm curious: how 
do you recommend that an application behave when an Error is 
thrown?

It has lost its face and shall commit suicide.
That's the Japanese way, and it has its merits.
Continuing to work and pretending nothing has happened (the European 
way) makes it just untrustworthy from the beginning.
Maybe this is better for humans (they are untrustworthy anyway 
until some validation has been run on them), but for programs I 
prefer the Japanese way.


Re: The extent of trust in errors and error handling

2017-02-05 Thread Walter Bright via Digitalmars-d

On 2/1/2017 11:25 AM, Ali Çehreli wrote:

1) There is the well-known issue of whether Error should ever be caught. If
Error represents conditions where the application is not in a defined state,
hence it should stop operating as soon as possible, should that also carry over
to other applications, to the OS, and perhaps even to other systems in the whole
cluster?


If it is possible for an application to leave other applications or the OS in a 
corrupted state, then yes, it should stop the OS as soon as possible. MS-DOS fell 
into this category: it was normal for a crashing program to scramble MS-DOS 
along with it. Attempting to continue running MS-DOS risked scrambling your hard 
disk as well (it happened many times to me). I eventually learned to reboot every 
time an app failed unexpectedly. As soon as I could, I moved all development to 
protected-mode operating systems, and would port to DOS only as the last step.




For example, if a function detected an inconsistency in a DB that is available
to all applications (as is the case in the Unix model of user-based access
protection), should all processes that use that DB stop operating as well?


A DB inconsistency is not a bug in the application, it is a problem with the 
input to the application. Therefore, it is not an Error, it is an Exception.


Simply put, an Error is a bug in the application. An Exception is a bug in the 
input to the application. The former is not recoverable, the latter is.




2) What if an intermediate layer of code did in fact handle an Error (perhaps
raised by a function pre-condition check)? Should the callers of that layer have
a say on that? Should a higher level code be able to say that Error should not
be handled at all?


If the layer has access to the memory space of the caller, an Error in the layer 
is an Error in the caller as well.




For example, an application code may want to say that no library that it uses
should handle Errors that are thrown by a security library.


Depends on what you mean by "handling" an Error. If you mean continue running 
the application, you're running a corrupted program. If you mean logging the 
Error and then terminating the application, that would be reasonable.
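
A minimal sketch of that second interpretation (run() stands in for 
a hypothetical application entry point):

    import std.stdio : stderr;
    import core.stdc.stdlib : abort;

    void run() { /* the application */ }

    void main()
    {
        try
            run();
        catch (Error e)
        {
            // Last-resort logging: the Error means the program state
            // is suspect, so record what we can and terminate rather
            // than continue.
            stderr.writeln("fatal: ", e);
            abort();
        }
    }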




This discussion has come up repeatedly on this forum. Many people strongly 
disagree with me, and believe that they can recover from Errors and continue 
executing the program.


That's fine if the program's output is nothing one cares about, such as a game 
or a music player. If the program's failure could result in the loss of money, 
property, health or lives, it is unacceptable.


Much other confusion comes from not carefully distinguishing Errors from 
Exceptions.

Corollary: bad input that causes a program to crash is an Error because it is a 
programming bug to fail to vet the input for correctness. For example, if I feed 
a D source file to a C compiler and the C compiler crashes, the C compiler has a 
bug in it, which is an Error. If the C compiler instead writes a message 
"Error: D source code found instead of C source code, please upgrade to a D 
compiler" then that is an Exception.


Re: The extent of trust in errors and error handling

2017-02-05 Thread Ali Çehreli via Digitalmars-d

On 02/05/2017 10:08 PM, Chris Wright wrote:
> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>> What I and many others who say Errors should not be caught are saying
>> is, once the program is in an unexpected state, attempting to do
>> anything further is wishful thinking.
>
> I've been thinking about this a bit more, and I'm curious: how do you
> recommend that an application behave when an Error is thrown?

I don't have the answers. That's why I opened this thread. However, I 
think I know what the common approaches are.


The current recommendation is that it aborts immediately before 
producing (more) incorrect results.


> How do you
> recommend it leave behind enough data for me to investigate the next day
> when I see there was a problem?

The current approach is to rely on the backtrace produced when aborting.

> How do you recommend I orchestrate things
> to minimize disruption to user activities?

That's a hard question. If the program is interacting with the user, 
it certainly seems appropriate to communicate with them, but perhaps 
a drastic abort is just as good.


> Catching an error, logging it, and trying to move on is the obvious thing.


That part I can't agree with. It is not necessarily true that moving on 
will work the way we wanted. The invoice prepared for the next customer 
may have incorrect amount in it.


> It works for every other programming language I've encountered.

This issue is language agnostic. It works in D as well but at the same 
level of correctness and unknowns. I heard about the Exception-Error 
distinction first in Java and I think there are other languages that 
recommend not catching Errors.


> If you're telling me it's not good enough for D, you must have something
> better in mind. What is it?

This is an interesting issue to think about. As Profile Anaysis and you 
say, this is a practical matter. We have to accept the imperfections and 
move on.


> Or, alternatively, you know something about D that means that, when
> something goes wrong, it effectively kills the entire application -- in a
> way that doesn't happen when an Error isn't thrown, in a way that can't
> happen in other languages.

I don't think it's possible with conventional CPUs and OSes.

Ali



Re: The extent of trust in errors and error handling

2017-02-05 Thread Ali Çehreli via Digitalmars-d

On 02/05/2017 08:49 AM, Chris Wright wrote:
> On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
>> Doesn't change what I'm saying. :) For example, RangeError may be thrown
>> due to a rogue function writing over memory that it did not intend to.
>> An index 42 may have become 42000, and then the RangeError may have been
>> thrown. Fine. What if nearby data that logf depends on has also been
>> overwritten? logf will fail as well.
>
> I can't count on an error being thrown, so I may as well not run my
> program in the first place.

Interesting. That's an angle I hadn't considered.

> That's the only defense. It's only wishful
> thinking that my program's data hasn't already been corrupted by the GC
> and the runtime but in a way that doesn't cause an Error to be thrown.

Yeah, all bets are off when memory is shared by different actors, as 
is the case for conventional CPUs.


Thanks to everyone who contributed to this thread. I learned more. :)

Ali



Re: The extent of trust in errors and error handling

2017-02-05 Thread Chris Wright via Digitalmars-d
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
> What I and many others who say Errors should not be caught are saying
> is, once the program is in an unexpected state, attempting to do
> anything further is wishful thinking.

I've been thinking about this a bit more, and I'm curious: how do you 
recommend that an application behave when an Error is thrown? How do you 
recommend it leave behind enough data for me to investigate the next day 
when I see there was a problem? How do you recommend I orchestrate things 
to minimize disruption to user activities?

Catching an error, logging it, and trying to move on is the obvious thing. 
It works for every other programming language I've encountered.

If you're telling me it's not good enough for D, you must have something 
better in mind. What is it?

Or, alternatively, you know something about D that means that, when 
something goes wrong, it effectively kills the entire application -- in a 
way that doesn't happen when an Error isn't thrown, in a way that can't 
happen in other languages.


Re: The extent of trust in errors and error handling

2017-02-05 Thread Ali Çehreli via Digitalmars-d

On 02/05/2017 07:17 AM, Cym13 wrote:

On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:

[...]


A bit OT, but I'm pretty sure you would be very interested in Kevlin
Henney's talk "The Error of Our Ways" from the GOTO 2016 conference,
which discusses the fact that most catastrophic consequences of
software come from very simple errors: https://www.youtube.com/watch?v=IiGXq3yY70o


Thank you for that. I've always admired Kevlin Henney's writings and 
talks. He used to come to Silicon Valley at least once a year for SW 
conferences (the conferences are no more) and we would adjust our meetup 
schedules to have him as a speaker once a year.


Ali



Re: The extent of trust in errors and error handling

2017-02-05 Thread Chris Wright via Digitalmars-d
On Sat, 04 Feb 2017 23:48:48 -0800, Ali Çehreli wrote:
> Doesn't change what I'm saying. :) For example, RangeError may be thrown
> due to a rogue function writing over memory that it did not intend to.
> An index 42 may have become 42000, and then the RangeError may have been
> thrown. Fine. What if nearby data that logf depends on has also been
> overwritten? logf will fail as well.

I can't count on an error being thrown, so I may as well not run my 
program in the first place. That's the only defense. It's only wishful 
thinking that my program's data hasn't already been corrupted by the GC 
and the runtime but in a way that doesn't cause an Error to be thrown.


Re: The extent of trust in errors and error handling

2017-02-05 Thread Cym13 via Digitalmars-d

On Saturday, 4 February 2017 at 07:24:12 UTC, Ali Çehreli wrote:

[...]


A bit OT, but I'm pretty sure you would be very interested in 
Kevlin Henney's talk "The Error of Our Ways" from the GOTO 2016 
conference, which discusses the fact that most catastrophic 
consequences of software come from very simple errors: 
https://www.youtube.com/watch?v=IiGXq3yY70o


Re: The extent of trust in errors and error handling

2017-02-05 Thread Profile Anaysis via Digitalmars-d

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
tl;dr - Seeking thoughts on trusting a system that allows 
"handling" errors.


One of my extra-curricular interests is the Mill CPU[1]. A 
recent discussion in that context reminded me of the 
Error-Exception distinction in languages like D.


1) There is the well-known issue of whether Error should ever 
be caught. If Error represents conditions where the application 
is not in a defined state, hence it should stop operating as 
soon as possible, should that also carry over to other 
applications, to the OS, and perhaps even to other systems in 
the whole cluster?




No, because your logic would then extend to all of the human 
race, to animals, etc. It is not practical and not necessary.


1. The ball must keep rolling. All of this stuff we do is fantasy 
anyway, so if an error occurs in that Lemmings game, it is just a 
game. It might take down every computer in the universe (if we 
went with the logic above) but it can't affect humans, because 
they are distinct from computers (it might kill a few humans, but 
that has always been acceptable to humans).


That is, it is not practical to take everything down, because an 
error is not that serious and ultimately has limited effect.


That is, in the practical world, we are OK with some errors. This 
allows us not to worry too much. The more we had to worry about 
such errors, the more things would have to be shut down, exactly 
because of the logic you have given. So the question is not 
"should we do x or not" but how much of x is acceptable.


(The human race has decided that quite a few errors are OK. We 
can even have errors, such as a medical device malfunctioning 
because of an error like an invalid array access killing people, 
and it's OK: it's just money, and the lawyers will be happy.)


2. Not all errors will systematically propagate into all other 
systems, e.g., two computers not connected in any way. If one 
has an error, the other won't be affected, so there is no reason 
to take that computer down too.


So, what matters, like anything else, is that we try to do the 
best we can. We don't have to pick an arbitrary point of when to 
stop, because we actually don't know. What we do is use reason and 
experience to decide what the most likely solution is and see how 
much risk it has. If it has too much, we back off; if little 
enough, we proceed.


There is an optimal point, more or less, because risk requires 
energy to manage (even for no risk).


Basically, if you assume, as you seem to be doing, that a 
singular error creates an unstable state in the whole system at 
every point, then you are screwed from the get-go if you will not 
accept any unstable state at any cost. The only solution then is 
to not have any errors at any point. (Which requires perfection, 
something humans gave up on trying to achieve a long time ago.)



3. Things are not so cut and dried. Intelligence can be used to 
understand the problem. Not all errors are that simple. Some 
errors are catastrophic and need everything shut down, and some 
don't. Knowing those error types is important. Hence, the more 
descriptive something is the better, as it allows one to create 
separation. Also, designing things to be robust is another way to 
mitigate the problems.


Programming is not much different from banking. You have a 
certain amount of risk in a certain portfolio (program), you hedge 
your bets (create a good robust design), and you hope for the 
best. It's up to the individual to decide how much hedging is 
required, as it will take time and money to do it.


Example: Windows. Obviously Windows was a design that didn't care 
too much about robustness; just enough to get the job done was 
their motto. If someone dies because of some BSOD, it's not that 
big a deal... it will be hard to trace the cause, and if it can 
be done they have enough money to afford it. (Similar to the Ford 
Pinto fiasco: 
https://en.wikibooks.org/wiki/Professionalism/The_Ford_Pinto_Gas_Tank_Controversy)


Re: The extent of trust in errors and error handling

2017-02-04 Thread Ali Çehreli via Digitalmars-d

On 02/04/2017 08:17 AM, Chris Wright wrote:
> On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:

> Again, this is for a restricted type of application that I happen to write
> rather often. And it's restricted to a subset of the application that
> shares very little state with the rest.

I agree that there are different kinds of applications that require 
different levels of correctness.


>> What operations can you safely assume that you can still perform? Can
>> you log? Are you sure? Even if you caught RangeError, are you sure that
>> arr.ptr is still sane? etc.
>
> You seem to be assuming that I'll write:
>
>   try {
> foo = foo[1..$];
>   } catch (RangeError e) {
> log(foo);
>   }
>
> I'm actually talking about:
>
>   try {
> results = process(documentName, document);
>   } catch (Throwable t) {
> logf("error while processing %s: %s", documentName, t);
>   }

Doesn't change what I'm saying. :) For example, RangeError may be thrown 
due to a rogue function writing over memory that it did not intend to. 
An index 42 may have become 42000, and then the RangeError may have been 
thrown. Fine. What if nearby data that logf depends on has also been 
overwritten? logf will fail as well.


What I and many others who say Errors should not be caught are saying 
is, once the program is in an unexpected state, attempting to do 
anything further is wishful thinking.


Again, in practice, it is likely that the program will log correctly, but 
there is no guarantee that it will do so; it's merely "likely", and 
"likely" is far from "correct".


> where somewhere deep in `process` I get a RangeError.
>
>> Even if you caught RangeError, are you sure that
>> arr.ptr is still sane?
>
> Well, yes. Bounds checking happens before the slice gets assigned for
> obvious reasons. But I'm not going to touch the slice that produced the
> problem, so it's irrelevant anyway.

Agreed, but the slice is just one part of the application's memory. We're 
not sure what happened to the rest of it.


Ali



Re: The extent of trust in errors and error handling

2017-02-04 Thread Chris Wright via Digitalmars-d
On Fri, 03 Feb 2017 23:24:12 -0800, Ali Çehreli wrote:
> In practice, both null pointer and range error can probably be dealt
> with and the program can move forward.
> 
> However, in theory you cannot be sure why that pointer is null or why
> that index is out of range. It's possible that something horrible
> happened many clock cycles ago and you're seeing the side effects of
> that thing now.

Again, this is for a restricted type of application that I happen to write 
rather often. And it's restricted to a subset of the application that 
shares very little state with the rest.

> What operations can you safely assume that you can still perform? Can
> you log? Are you sure? Even if you caught RangeError, are you sure that
> arr.ptr is still sane? etc.

You seem to be assuming that I'll write:

  try {
foo = foo[1..$];
  } catch (RangeError e) {
log(foo);
  }

I'm actually talking about:

  try {
results = process(documentName, document);
  } catch (Throwable t) {
logf("error while processing %s: %s", documentName, t);
  }

where somewhere deep in `process` I get a RangeError.

> Even if you caught RangeError, are you sure that
> arr.ptr is still sane?

Well, yes. Bounds checking happens before the slice gets assigned for 
obvious reasons. But I'm not going to touch the slice that produced the 
problem, so it's irrelevant anyway.


Re: The extent of trust in errors and error handling

2017-02-03 Thread Ali Çehreli via Digitalmars-d

On 02/01/2017 06:29 PM, Chris Wright wrote:
> On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
>> 1) There is the well-known issue of whether Error should ever be caught.
>> If Error represents conditions where the application is not in a defined
>> state, hence it should stop operating as soon as possible, should that
>> also carry over to other applications, to the OS, and perhaps even to
>> other systems in the whole cluster?
>
> My programs tend to apply operations to a queue of data. It might be a
> queue over time, like incoming requests, or it might be a queue based on
> something else, like URLs that I extract from HTML documents.
>
> Anything that does not impact my ability to manipulate the queue can be
> safely caught and recovered from.
>
> Stack overflow? Be my guest.
>
> Null pointer? It's a bug, but it's probably specific to a small subset of
> queue items -- log it, put it in the dead letter queue, move on.
>
> RangeError? Again, a bug, but I can successfully process everything else.

In practice, both null pointer and range error can probably be dealt 
with and the program can move forward.


However, in theory you cannot be sure why that pointer is null or why 
that index is out of range. It's possible that something horrible 
happened many clock cycles ago and you're seeing the side effects of 
that thing now.


What operations can you safely assume that you can still perform? Can 
you log? Are you sure? Even if you caught RangeError, are you sure that 
arr.ptr is still sane? etc.


In theory, at least the way I understand it, a program lives on a very 
narrow path. Once it steps outside that well-known path, all bets are 
off. Can a caught Error bring it back onto the path, or are we on an 
alternate path now?


>> 2) What if an intermediate layer of code did in fact handle an Error
>> (perhaps raised by a function pre-condition check)? Should the callers
>> of that layer have a say on that? Should a higher level code be able to
>> say that Error should not be handled at all?
>>
>> For example, an application code may want to say that no library that it
>> uses should handle Errors that are thrown by a security library.
>
> There's a bit of a wrinkle there. "Handling" an error might include
> catching it, adding some extra data, and then rethrowing.

Interestingly, attempting to add extra data can very well produce the 
opposite effect: Stack trace information that would potentially be 
available can indeed be corrupted while adding that extra data.


The interesting part is trust. Once there is an Error, what can you trust?

>> I think there is no way of
>> requiring that e.g. a square root function not have side effects at all:
>> The compiler can allow a piece of code but then the library that was
>> actually linked with the application can do anything else that it wants.
>
> You can write a compiler with its own object format and linker, which lets
> you verify these promises at link time.

Good idea. :) As Joakim reminded us, the designers of Midori did that and more.

> As an aside on this topic, I might recommend looking at Vigil, the
> eternally morally vigilant programming language:
> https://github.com/munificent/vigil
>
> It has a rather effective way of dealing with errors that aren't
> explicitly handled.
>

Thank you, I will look at it next.

Ali



Re: The extent of trust in errors and error handling

2017-02-03 Thread Ali Çehreli via Digitalmars-d

On 02/01/2017 01:27 PM, Joakim wrote:
> On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
>> tl;dr - Seeking thoughts on trusting a system that allows "handling"
>> errors.
>>
>> [...]
>
> Have you seen this long post from last year, where Joe Duffy laid out
> what they did with Midori?
>
> http://joeduffyblog.com/2016/02/07/the-error-model/
>
> Some relevant stuff in there.

Thank you. Yes, very much related and very interesting!

Joe Duffy says Midori is a system that "drew significant inspiration 
from KeyKOS and its successors EROS and Coyotos." I'm happy to see 
KeyKOS mentioned there, as Norm Hardy, the main architect of KeyKOS, 
is someone who is involved in the Mill CPU and whom I have the 
privilege of knowing personally and seeing weekly. :)


Ali



Re: The extent of trust in errors and error handling

2017-02-02 Thread Joakim via Digitalmars-d
On Thursday, 2 February 2017 at 09:14:43 UTC, Paolo Invernizzi 
wrote:

On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

Regarding that, I have thought: wouldn't it be better if 
bounds checking, rather than debug vs. release, determined 
whether in contracts are called? If the contract had 
asserts, they would still be compiled out in release mode like 
all asserts are. But if it had enforce():s, their existence 
would obey the same logic as array bounds checks.


This would let users implement custom bounds-checked types. 
Fibers, for example, could be made @trusted, with no loss in 
performance for @system code in release mode.


The right move is to ship a compiled debug version of the 
library, if closed source, along with the release one.
I still don't understand why that's not the default for 
Phobos and the runtime too.


/Paolo


It is, for both official dmd downloads and ldc:

https://www.archlinux.org/packages/community/x86_64/liblphobos/

Some packages may leave it out, not sure why.


Re: The extent of trust in errors and error handling

2017-02-02 Thread Paolo Invernizzi via Digitalmars-d

On Wednesday, 1 February 2017 at 21:55:40 UTC, Dukc wrote:

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli

Regarding that, I have thought: wouldn't it be better if bounds 
checking, rather than debug vs. release, determined whether in 
contracts are called? If the contract had asserts, they would 
still be compiled out in release mode like all asserts are. But 
if it had enforce():s, their existence would obey the same logic 
as array bounds checks.


This would let users implement custom bounds-checked types. 
Fibers, for example, could be made @trusted, with no loss in 
performance for @system code in release mode.


The right move is to ship a compiled debug version of the 
library, if closed source, along with the release one.
I still don't understand why that's not the default for 
Phobos and the runtime too.


/Paolo


Re: The extent of trust in errors and error handling

2017-02-01 Thread Chris Wright via Digitalmars-d
On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
> 1) There is the well-known issue of whether Error should ever be caught.
> If Error represents conditions where the application is not in a defined
> state, hence it should stop operating as soon as possible, should that
> also carry over to other applications, to the OS, and perhaps even to
> other systems in the whole cluster?

My programs tend to apply operations to a queue of data. It might be a 
queue over time, like incoming requests, or it might be a queue based on 
something else, like URLs that I extract from HTML documents.

Anything that does not impact my ability to manipulate the queue can be 
safely caught and recovered from.

Stack overflow? Be my guest.

Null pointer? It's a bug, but it's probably specific to a small subset of 
queue items -- log it, put it in the dead letter queue, move on.

RangeError? Again, a bug, but I can successfully process everything else.

Out of memory? This is getting a bit dangerous -- if I dequeue another 
item after OOM, I might be able to process it, and it might work (for 
instance, maybe you tried to download a 40GB HTML, but the next document 
is reasonably small). But it's not necessarily that easy to fix, and it 
might compromise my ability to manipulate the queue.

Assertions? That obviously isn't a good situation, but it's likely to 
apply only to a subset of the data.

This requires me to have two flavors of error handling: one regarding 
queue operations and one regarding the function I'm applying to the queue.
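
A minimal sketch of that two-tier structure (the types and names are 
mine, not Chris's; it catches Throwable around the per-item work, as 
in his earlier example -- which is exactly the practice under debate 
in this thread):

  import std.stdio : writeln;

  void drainQueue(string[] queue, void delegate(string) process)
  {
      string[] deadLetters;       // items whose processing failed

      foreach (item; queue)
      {
          try
              process(item);      // per-item flavor of handling
          catch (Throwable t)     // contested: this includes Errors
          {
              writeln("failed on ", item, ": ", t.msg);
              deadLetters ~= item; // park it for human review
          }
      }
      // Failures in the queue machinery itself (this loop, the
      // dead-letter array) have no handler here and would escape.
      writeln(deadLetters.length, " item(s) in the dead-letter queue");
  }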

> For example, if a function detected an inconsistency in a DB that is
> available to all applications (as is the case in the Unix model of
> user-based access protection), should all processes that use that DB
> stop operating as well?

As stated, that implies each application tags itself with whether it 
accesses that database. Then, when the database is known to be 
inconsistent, we immediately shut down every application that's tagged as 
uing that database -- and presumably prevent other applications with the 
tag from starting.

It seems much more friendly not to punish applications when they're not 
trying to use the affected resource. Maybe init read a few configuration 
flags from the database on startup and it doesn't have to touch it ever 
again. Maybe a human will resolve the problem before this application 
makes its once-per-day query.

> 2) What if an intermediate layer of code did in fact handle an Error
> (perhaps raised by a function pre-condition check)? Should the callers
> of that layer have a say on that? Should a higher level code be able to
> say that Error should not be handled at all?
> 
> For example, an application code may want to say that no library that it
> uses should handle Errors that are thrown by a security library.

There's a bit of a wrinkle there. "Handling" an error might include 
catching it, adding some extra data, and then rethrowing.

> I think there is no way of
> requiring that e.g. a square root function not have side effects at all:
> The compiler can allow a piece of code but then the library that was
> actually linked with the application can do anything else that it wants.

You can write a compiler with its own object format and linker, which lets 
you verify these promises at link time.

As an aside on this topic, I might recommend looking at Vigil, the 
eternally morally vigilant programming language:
https://github.com/munificent/vigil

It has a rather effective way of dealing with errors that aren't 
explicitly handled.


Re: The extent of trust in errors and error handling

2017-02-01 Thread Dukc via Digitalmars-d

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
Aside, and more related to D: I think this whole discussion is 
related to another issue that has been raised in this forum a 
number of times: Whose responsibility is it to execute function 
pre-conditions?


Regarding that, I have thought: wouldn't it be better if bounds 
checking, rather than debug vs. release, determined whether in 
contracts are called? If the contract had asserts, they would 
still be compiled out in release mode like all asserts are. But 
if it had enforce():s, their existence would obey the same logic 
as array bounds checks.


This would let users implement custom bounds-checked types. 
Fibers, for example, could be made @trusted, with no loss in 
performance for @system code in release mode.
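
A sketch of what that might enable (the semantics are hypothetical; 
today the in contract below is controlled by -release, not by 
-boundscheck):

  import std.exception : enforce;

  struct CheckedBuffer
  {
      private int[] data;

      int opIndex(size_t i)
      in
      {
          // Under the proposal, this enforce would be kept or
          // dropped by the -boundscheck switch, like built-in array
          // bounds checks, instead of by -release as today.
          enforce(i < data.length, "CheckedBuffer index out of range");
      }
      do  // `body` in 2017-era compilers
      {
          return data.ptr[i]; // deliberately unchecked; the
                              // contract guards it
      }
  }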


Re: The extent of trust in errors and error handling

2017-02-01 Thread Joakim via Digitalmars-d

On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
tl;dr - Seeking thoughts on trusting a system that allows 
"handling" errors.


[...]


Have you seen this long post from last year, where Joe Duffy laid 
out what they did with Midori?


http://joeduffyblog.com/2016/02/07/the-error-model/

Some relevant stuff in there.