Re: SSL_peek vs. SSL_pending...

Steffen DETTMER Mon, 03 Sep 2007 13:06:39 -0700

* David Schwartz wrote on Thu, Aug 30, 2007 at 13:44 -0700:
> > If the first byte (or any part of the buffer) could be
> > written instantly or (e.g. if no select returned ready before
> > :)) after some amount of time waited, write should return to
> > give the calling application the control.
> 
> I can think of no situation where you'd want to wait forever
> for the first byte to be sent but only for a certain amount of
> time for the second byte to be sent. That's one of the
> strangest suggestions I've ever heard.


I cannot imagine any situation where one would want to wait forever,
right, but for some small command line tools (interruptible by ^C or so)
sometimes it makes sense to program in such a way. Some small
helper tool may wait for a request to arrive (forever or ^C,
whichever happens first :)), but `during' communication, i.e.
after the first write worked, some different error handling after
some reasonable timeout is needed (however, this may not match
exactly here, because there it is on top of some protocol on top
of a serial link).

Maybe infinite timeouts or blocking I/O make sense only in
more or less interactive things.
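
A minimal sketch of the "wait forever for the request, but only a
bounded time during communication" idea, assuming plain POSIX select()
and read(). The helper name read_with_timeout and its convention of a
negative timeout meaning "no limit" are hypothetical, chosen only for
illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): read with an optional timeout.
 * timeout_ms < 0 waits without limit, otherwise at most timeout_ms.
 * Returns the number of bytes read (0 = end of file), or -1 with errno set;
 * errno is ETIMEDOUT if nothing became readable in time. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    assert(buf != NULL);
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set rfds;
    struct timeval tv;
    int rc;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        /* A NULL timeout pointer makes select() wait indefinitely. */
        rc = select(fd + 1, &rfds, NULL, NULL, timeout_ms < 0 ? NULL : &tv);
    } while (rc < 0 && errno == EINTR); /* simplification: restarts the full timeout */

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    /* On a blocking fd this read() may still block if the data vanished
     * after select() returned; that is the point debated in this thread. */
    return read(fd, buf, len);
}
```

A tool could call read_with_timeout(fd, buf, len, -1) while idling for
the first request and switch to, say, 5000 ms once an exchange has
started.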

> > I looked up write(2) on opengroup.org and found a page that
> > surprised me :)
> >
> > The information on opengroup.org tell `The write() function
> > shall attempt to write nbyte bytes from the buffer...'. My
> > man page tell `write  writes  up to count bytes to the
> > file...'. My man page claims to be conforming to `SVr4, SVID,
> > POSIX, X/OPEN, 4.3BSD'. The opengroup.org page distinguishes
> > write semantics based on what the fd is kind of (file, FIFO,
> > STREAM, ...) which IMHO cannot be correct because it destroys
> > the abstraction.
> 
> You don't really have a choice. If they published only the
> semantics for 'write' that applied to every possible thing you
> could ever want to write to, they wouldn't be enough to allow
> you to write sane programs to deal with sockets, files, or
> anything for that matter.

mmm... this would be a pity... I expect that write does
the same `something' reasonable on a socket as on a serial line.
I mean, I don't want a WaitForMultipleObjects in case the object
accidentally is something selectable :) but of course there are
always specifics.
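
Whatever the descriptor refers to, the `writes up to count bytes'
wording means a caller has to cope with short writes. A minimal sketch
of the usual loop, assuming a blocking descriptor; write_all is a
hypothetical name, not a standard function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have been
 * accepted or an error occurs. Returns 0 on success, -1 on error (errno set).
 * Meant for blocking descriptors; on a non-blocking one it gives up with
 * EAGAIN as soon as the descriptor cannot take more data. */
int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                 /* short write: advance past the accepted part */
        len -= (size_t)n;
    }
    return 0;
}
```

The same loop works for a file, a pipe, a socket or a serial line; only
how often write() returns short differs between them.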

> For example, suppose they only documented the semantics for
> 'select' that applied to everything you could ever 'select' on.
> That would mean they couldn't tell you that a listening TCP
> socket that had a new connection would be marked ready for
> reading, because that applies only to sockets. So then how
> would you know how to use 'select' for that?

yeah, but using select before accept has a little taste of a
`workaround' in absence of another call, hasn't it? In this case
probably someone uses select just because its description is
closest to what is desired; according to my man page technically
select should not be influenced by accept situations, but that
would be a pity. So yes, how would I know how to use 'select' for that?
Now that you raised it, nice question.
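
For reference, a sketch of the common select-then-accept pattern on a
TCP listening socket. The listening socket is put into non-blocking
mode first, because a pending connection can disappear (for example if
the client resets it) between select() and accept(). Both function
names are hypothetical, and listen_loopback exists only to make the
sketch self-contained:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo helper: listen on an ephemeral loopback TCP port.
 * Returns the listening descriptor and stores the bound address in *addr,
 * or returns -1 on error. */
int listen_loopback(struct sockaddr_in *addr)
{
    assert(addr != NULL);

    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;         /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) != 0 ||
        listen(fd, 1) != 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: wait up to timeout_ms for a pending connection and
 * accept it. Returns the connected descriptor, or -1 with errno set
 * (ETIMEDOUT if nothing arrived in time; EAGAIN or EWOULDBLOCK if the
 * connection vanished between select() and accept()). */
int accept_with_timeout(int listen_fd, long timeout_ms)
{
    assert(timeout_ms >= 0);
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);

    /* Non-blocking mode, so that accept() cannot hang after a stale hit. */
    int flags = fcntl(listen_fd, F_GETFL, 0);
    if (flags < 0 || fcntl(listen_fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };

    int rc = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return accept(listen_fd, NULL, NULL);
}
```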

> /dev/urandom isn't a file, it's a device.

For me, it is a file. Wikipedia mentions /dev/null as a file
(clarifying that names are something associated with the file
 itself).

> > When the file is /dev/urandom (a random number generator
> > device at least on linux), it shall NOT return the data
> > previously written. For devices (and sockets :)), I think
> > this is obvious.  anyway.
> 
> Again, /dev/urandom is not a file. If it helps, where you see
> the word "file" replace the definition of "file" from the
> standard. (Or the words "regular file".)

Would be horrible I think; such a limitation would reduce the
flexibility a lot. I think device files and many other
files (including sockets) are a great idea. It's something
you can read from and/or write to, why should it matter if it is
on a local hard disk?
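
The `regular file' versus `device' distinction is visible to a program
through fstat(). A small sketch, assuming a POSIX system; the enum and
both function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of what a descriptor refers to. */
enum fd_kind {
    KIND_ERROR = -1,
    KIND_REGULAR,
    KIND_CHAR_DEVICE,
    KIND_BLOCK_DEVICE,
    KIND_FIFO,
    KIND_SOCKET,
    KIND_DIRECTORY,
    KIND_OTHER
};

/* Classify an open descriptor by the file type bits that fstat() reports. */
enum fd_kind fd_kind_of(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return KIND_ERROR;
    if (S_ISREG(st.st_mode))  return KIND_REGULAR;
    if (S_ISCHR(st.st_mode))  return KIND_CHAR_DEVICE;
    if (S_ISBLK(st.st_mode))  return KIND_BLOCK_DEVICE;
    if (S_ISFIFO(st.st_mode)) return KIND_FIFO;
    if (S_ISSOCK(st.st_mode)) return KIND_SOCKET;
    if (S_ISDIR(st.st_mode))  return KIND_DIRECTORY;
    return KIND_OTHER;
}

/* Open path read-only, classify it, and close it again. */
enum fd_kind path_kind_of(const char *path)
{
    assert(path != NULL);

    /* O_NONBLOCK so that opening a FIFO without a writer does not hang. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return KIND_ERROR;

    enum fd_kind kind = fd_kind_of(fd);
    close(fd);
    return kind;
}
```

On Linux, path_kind_of("/dev/urandom") reports a character device even
though the name lives in the file system like any other file name.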

> > > > correct implementation needs to guarantee that. If queue
> > > > discarding is possible, a flag must be stored (or so) to make
> > > > read return EAGAIN or whatever (probably causing most code to
> > > > break anyway, because no one expects EAGAIN after select > 0 :-)).
> > >
> > > That's simply impossible to do. The problem is that there is no
> > > unambiguous way to figure out whether an operation is the one
> > > that's not supposed to block. Consider:
> > >
> > > 1) A thread calls 'select'.
> > >
> > > 2) That thread later calls 'read'.
> > >
> > > If the 'select' changes the semantics of the 'read', then if the thread
> > > didn't know that some other code called 'select' earlier, the later
> > > read-calling code can break.
> 
> > If some other thread called read (or another function), of course
> > before the next read a select must be called again. Only for the
> > next read guarantees may be made, not for the 103th read called
> > after the second reboot :)
> 
> No, you missed my entire point. Please read it again. I was
> talking about *THAT* read, not another read after that.

sorry, seems I'm unable to get it (I read it several times :)). I
think the select could (if needed) store some flag (associated
with some fd) to remember that it reported that read is guaranteed
not to block. Maybe some list including all fds for which
select returned this. Any OS function (or, if possible, any OS
function that may influence this fd) resets the flag (no
guarantee anymore). But if read is called and would block because
of some changed situation, it could decide to return right before
resetting the flag, maybe setting errno to EAGAIN. So I think the
guarantee itself could be given (not claiming that this would be
a good idea).

> > I think, the API must behave in the same way independently
> > whether used by multiple threads or a single one, if
> > possible.  Of course, care must be taken when e.g. two
> > threads call select on the same file descriptor and so on.
> > Many ways to get it very complex, hum...
> 
> Exactly, and because of that, it's impossible for 'select' to
> change the semantics of a following 'read' or 'write'. There is
> no way to know whether an application considers a particular
> write to be a "following" operation, and it may be relying on
> the current semantics.

Ahh, yes, interesting point! The concurrency may look different,
yeah. But it could be clarified, for instance, if
(semaphore-controlled / mutex protected) no read is done at the
(potentially) same time as select and vice versa. Don't know if
this would help much because it would mean that select and
reading cannot be done concurrently so someone may wonder why to
use threads. But could make sense if defined per fd.

> > > There are known implementations that perform some data
> > > integrity checks at 'recv' time, so a 'select' hit that results
> > > in the data being dropped later can lead to 'recvmsg' blocking.
> 
> > I think, select simply cannot work on the low layer that may have
> > data that may not be valid because of pending integrity checks.
> > select must work on the same buffer as read (which gets
> > integrity checked data only).
> 
> In other words, 'select' must predict the future. Sorry, that's
> not possible. There is no way for 'select' to know what
> integrity checks will be performed at read time.

Why predict future? If data was put to the read buffer (whether
verified or not), select and read won't block. If data is in the
buffer and by contract can be only removed by read (or close
maybe, doesn't matter), read won't block.
Wouldn't this work? I mean, at least theoretically?

> Consider the following:
> 
> 1) An application disables UDP checksums.
> 
> 2) An application calls 'select' and gets a 'read' hit on a
> packet with a bad checksum.

So this would mean the bad checksum would not be
detected/evaluated and the data would be stored to the buffer,
right?

> 3) An application performs a socket option call asking for
> checksum checking to be enabled.

ok, so from now on newly arriving data would not be stored in the
input buffer unless the checksum was verified (not applied retroactively
of course - wouldn't be possible, because the checksums were not
even stored).

> 4) An application calls 'recvmsg'.
> 
> Should it get the packet with the bad checksum? In other words,
> are you really sure you want 'select' to *change* the semantics
> of the socket?

It gets the data arrived in the packet where the checksum was not
evaluated at all, because this was configured, yes, that would be
what I expect. select should not influence the mechanism at all
(checksum verification).

I'm sure you gave this example because my intuitive
interpretation is wrong, right? So where is the mistake?

> > I think, read and accept in conjunction with select simply have
> > different semantics. Is that right?
> 
> They have different semantics for those devices that specify
> different semantics. For 'select' overall, they have precisely
> the same semantics. One checks for device readability one
> checks for device writability and what that means depends on
> the device.

yes, of course, the device may not even know about accept at all.
But I mean, for read non-blocking is guaranteed according to some
understandings but for accept no one claims this (but expects it
because the working examples show it :)).

> > setsockopt gets a filedescriptor (socket) as parameter, so it
> > counts as other call. Maybe there are problematic calls, right,
> > or sockets/files shared through fork(), where processes may
> > influence each other, so that it might look for one process that
> > behavior would be strange, but maybe in fact it is as expected
> > just confusing because of the other processes actions.
> 
> > I think best is to avoid such constructions, seems to be very
> > difficult to handle...
> 
> I bring them up to address the fundamental point -- you don't
> want 'select' to change the semantics of future socket
> operations. As a result, you can't ask for 'select' to make
> future guarantees. You really can't have one without the other.

Ahh, this one I understand! My `flag setting' example surely
would change the semantics of future socket operations. But in a
way the application cannot distinguish (because it has no way to
decide whether this was the changed or the original behavior).

So you say that it is not only that select does not give this
non-block guarantee, but also that it is not desired to get such
a guarantee because of the price of changed semantics of future
socket operations, right?

What would be bad about changing the semantics of future socket
operations, such as guaranteeing that read won't block?

> > But select on STDIN usually works (on linux) - is this linux
> > specific? I would consider it almost useless, if select couldn't
> > be called on STDIN because STDIN could be a `fast file system'!
> > Does this also mean that select wouldn't work as I expect it when
> > reading a e.g. ext2 filesystem from a slow media, let's say NFS
> > loop back, USB Stick or floppy disk? I assumed it would work, but
> > now as you pointed to it I find no statement about in the man
> > pages... :(
> 
> In general, 'select' is useless on ordinary files. 

why that? hard disks are slow, just a few hundred MB per second,
shared among the processes.

Ohh, actually here I assume that select does not only guarantee
that a read will not block, but also that a read will return
quickly. Don't know what the exact definition of `blocking' is.
As I understand it, a call taking 30 sec or more (undetermined/unknown
in advance) would be blocking.

> How would it work with an NFS file? 

for read, it would return ready for read as soon as some data
arrived in some local buffer I assume? Of course this requires
the usage of such intermediate buffers (which is not required by
the read API which is unlimited, so a performant implementation
would do things differently).

> If you 'select' on a file for readability, what are you waiting
> for? What's going to change?  

the block device's data being read into a kernel buffer? That could
take quite some time; for instance, on IDE CD-ROMs a read error may be
reported after 30 or even 120 seconds. Would be a mess if a GUI
would block that long. I mean, it would be like explorer.exe when
inserting a CD media: stalling. So in this case, the GUI could
select with some short timeout to be able to update the GUI in
between - or so.

> Since no operation ever blocks on a regular file, what would
> "readiness" mean in that context?

mmm... in the same way as for sockets of course: your data is
ready to be retrieved right now, instantly so to say.
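
For comparison, POSIX states that descriptors for regular files always
select true for reading and writing, so select() says nothing about how
long the disk will take. A sketch that makes this visible by polling
with a zero timeout; the function names and the scratch path passed in
by the caller are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Poll fd with a zero timeout.
 * Returns 1 if select() reports it readable right now, 0 if not, -1 on error. */
int readable_now(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set rfds;
    struct timeval zero = { .tv_sec = 0, .tv_usec = 0 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int rc = select(fd + 1, &rfds, NULL, NULL, &zero);
    if (rc < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical demo: create an empty regular file at path, ask select()
 * about it, and remove it again. Returns what readable_now() reported,
 * or -1 if the file could not be created. */
int empty_file_selects_readable(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = readable_now(fd);
    close(fd);
    unlink(path);
    return result;
}
```

An empty pipe is reported as not readable, while even an empty regular
file is reported as readable: a read() on it returns 0 (end of file)
at once rather than waiting for data.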

> > So I would conclude that it is a status-reporting function but
> > also could guarantee. What do I miss?
> 
> That it cannot guarantee. If something after the 'select'
> returns success causes the condition to change, the guarantee
> can only be sustained by an unambiguous way to identify the
> "subsequent" operation, and as I've already explained, that's
> impossible.

mmm... a file (descriptor) is process-local. Let's require
single-threading. Now we could say: a subsequent operation is the
one that starts (is called) after the first operation has
finished (returned). Of course this would mean that calling
getpid() would expire the guarantee of select (so in practice
this simple approach would make no sense).
Wouldn't this work?

> > At least the discussion IMHO shows that specs are not clear
> > and using APIs correctly is a challenge, because of those
> > doubts the best practice is to explicitly use non-blocking
> > fds and that the best documentation is no replacement for
> > deep testing :-)
> 
> I agree with that. Don't assume you have a guarantee that
> isn't explicit in the standard and that you don't even need.
> Especially when precisely that has caused code to break in the
> past.
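
A sketch of that practice: make the descriptor non-blocking, treat
select() only as a hint, and go back to waiting whenever read() reports
EAGAIN. The names set_nonblocking and read_when_ready are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read from a descriptor that is already non-blocking.
 * select() is only used to sleep; a readiness report that turns out to be
 * stale shows up as EAGAIN and simply leads to another wait.
 * Returns the number of bytes read (0 = end of file) or -1 on error. */
ssize_t read_when_ready(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Nothing to read yet: wait until the kernel reports readiness. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0 && errno != EINTR)
            return -1;
    }
}
```

With this structure the program never depends on select() having
promised anything about the following read().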

yeah, I can imagine... well, now I'll try to find out why close
sometimes blocks on my serial line (maybe kind of related, at
least it is causing some kind of defect in some system...).

I still failed to understand all the aspects (unfortunately
including the core aspect), I'll read it all again tomorrow, but
thanks a lot for your patient explanations!

oki,

Steffen
 
About Ingenico Throughout the world businesses rely on Ingenico for secure and 
expedient electronic transaction acceptance. Ingenico products leverage proven 
technology, established standards and unparalleled ergonomics to provide 
optimal reliability, versatility and usability. This comprehensive range of 
products is complemented by a global array of services and partnerships, 
enabling businesses in a number of vertical sectors to accept transactions 
anywhere their business takes them.
www.ingenico.com This message may contain confidential and/or privileged 
information. If you are not the addressee or authorized to receive this for the 
addressee, you must not use, copy, disclose or take any action based on this 
message or any information herein. If you have received this message in error, 
please advise the sender immediately by reply e-mail and delete this message. 
Thank you for your cooperation.
 
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
