On 2014-03-31 16:38, Dirk Hohndel wrote:
On Mon, 2014-03-31 at 10:36 +0200, Jef Driesen wrote:
To be honest, I don't feel comfortable removing the error checking for
the syscalls, and not just because omitting error checking is usually
the wrong thing to do.
But instead of making the error fatal, how about you just log it and
try
to keep going, anyway?
Think of it this way: "what's the worst that could happen?"
a) the error is real, communication will fail (same result as today)
b) the error is bogus, communication will succeed (winning!)
I get your point, but please read on...
First of all, this only seems to be a problem for pty's and not the
real
thing. The use of pty's is basically a hack to make the simulator
work.
For that purpose I'm happy with the --enable-pty option. Only a
libdivecomputer developer will ever need it.
That assumes there are no issues with the libraries used or the kernel
used and the specific serial device.
True, but I think that's a reasonable assumption. If we can no longer
rely on the kernel or device drivers to get things right, then I think
we're in far more trouble. Of course there can and will be bugs (as we
just discovered the hard way), but those are supposed to get fixed,
right?
Linus' mail to the fedora/redhat developers made that pretty clear I
think.
But what worries me a lot more is that trying to communicate with a
real
device using the wrong settings may actually harm the device. This has
already happened in the past with the Mares Icon HD and Matrix. Both
use
the same protocol, but trying to talk to a Matrix with the Icon HD
baudrate will cause the device to lock up completely. Luckily there
was
no permanent damage and the device could be revived by removing the
batteries. But that's pretty scary and something I want to avoid at
all
cost. This Mares problem is now gone, but that doesn't mean it can't
happen for others.
But that wasn't a problem of RTS/DTR or other serial parameters. That
was a USB communication issue. And IIRC this wouldn't have been caught
by an error return, either.
Yes and no.
The scenario where this particular problem happened, was that a user
accidentally selected the Icon HD protocol variant to connect to a
Matrix dive computer. That happens to be the default choice when using
the libdivecomputer universal application with just the -b iconhd
option. To select the Matrix variant you need the extra -m 0x0F option
(or use the -n "Mares Matrix" option).
The key difference between the two protocol variants was the different
baudrate. You are right that trying to set the wrong baudrate did not
cause the tcsetattr call to fail. That's because the usb-serial
driver/chip happens to support that baudrate too. But, assume for a
moment we ignore tcsetattr errors, and try to proceed. How do we know
the baudrate was changed correctly? We simply don't know. It could have
failed before it even tried to change the baudrate. The only case where
we do know, is when it returns without errors. The result is that trying
to proceeding anyway after an error, may put the device at risk!
Note that this Mares issue is gone now, because we are no longer using
different baudrates, but that's irrelevant for this discussion.
So I'm not convinced ignoring the return value of a syscall, and
hoping
for the best, is the right solution here. For pty's I'm fine with
that,
but not for real serial ports.
It's your library... I think it makes much more sense to try to keep
going and fail if the responses of actual communication make no sense,
not if something odd happens during the setup of the communication.
I'm aware that in practice this risk will probably be very small. But if
I have to choose between keeping the strict error checking, and removing
it as a workaround for a bug in one particular linux distribution (and
which only affects developers but not end users!), then that's an easy
choice for me.
For as long I have been working on the libdivecomputer project, I've
always been very cautious and never caused any damage to any dive
computer, and I would like to continue that trend. I can assure you that
the incident with the Mares Matrix was pretty scary, and I never want to
go through that again. If that causes some minor inconvenience here and
there, then that's a price I'm willing to pay. I hope you'll understand
that.
Note that I absolutely don't mind enabling the pty "hack" on the
tcsetattr call too. I think that should already be sufficient to work
around this glibc bug in fedora/redhat.
Jef
_______________________________________________
subsurface mailing list
[email protected]
http://lists.hohndel.org/cgi-bin/mailman/listinfo/subsurface