On Fri, 2012-09-07 at 08:15 -0600, Alex Rousskov wrote: > On 09/07/2012 02:32 AM, Alexander Komyagin wrote: > > OK. I agree. It sounds rather reasonable to avoid excess code complexity > > and CPU consuming in order to gain performance for the common case. > > I am very glad that we are in agreement here. Will you work on a patch > to fix ConnOpener?
Sure. I will make it on the next week. > > > > However, as I stated earlier, the comm.cc problem (actually semantics > > problem) persists. I think it should be documented that second and > > subsequent calls to comm_connect_addr() do not guarantee connection > > establishment unless there was a correct select() notification. > > Agreed: We should document what comm_connect_addr() does. However, does > it provide any guarantees even _after_ select() notification? According > to Stevens, the current code may still report that there is no problem > when in fact there is one (and we will detect it later during I/O), right? After select() notification there are two scenarios possible: 1) connection was successfully established (we got EPOLLOUT there). Then getsockopt() would correctly report success; 2) there was a problem (we got EPOLLHUP or EPOLLERR). In this case getsockopt() shall report an error (i.e. appropriate errno). So I think we can rely on comm_connect_addr(), but only _after_ the notification. > > > Thank you, > > Alex. > > > > On Thu, 2012-09-06 at 11:21 -0600, Alex Rousskov wrote: > >> On 09/06/2012 02:35 AM, Alexander Komyagin wrote: > >>> On Wed, 2012-09-05 at 09:59 -0600, Alex Rousskov wrote: > >>>> On 09/05/2012 09:27 AM, Alexander Komyagin wrote: > >>>> > >>>>> So you think that it's ok for comm_coonect_addr() to return COMM_OK if > >>>>> it was called before the appropriate select() notification. Am I right? > >>>> > >>>> Hard to say for sure since comm_coonect_addr() lacks an API description, > >>>> and there are at least three similar but different ways the function is > >>>> being used by Squid. > >>>> > >>>> One natural way to define this API is to say that it should return > >>>> COMM_OK if and only if the socket is connected (making any select > >>>> notifications irrelevant). However, this definition may be too > >>>> CPU-expensive and/or too unportable to support. And this level of > >>>> certainty may not actually be needed for current Squid needs! > >>> > >>> Maybe you're right, Alex. However, I would prefer this function to have > >>> the very strict semantics: it should return COMM_OK if and only if the > >>> socket is connected (just like you said). Because this way any > >>> upper-level code can rely on it. > >> > >> That would be my preference as well, but I would like to see the costs > >> of doing that first. If strict semantics is not currently needed and is > >> more CPU/portability-expensive than the current code, then we should not > >> perfect the code (beyond documentation). But again and again, I > >> recommend resolving the higher-level (ConnOpener) issue before > >> discussing any of this low-level stuff. > >> > >> > >>>> I would not be surprised if there is some gray area where we cannot > >>>> really tell for sure (without too much additional overhead or > >>>> portability risk) whether the async socket is connected. The Stevens > >>>> book seem to imply that much. Inside that gray area, the function > >>>> should probably return COMM_OK so that the rest of the code works: If we > >>>> guessed wrong, we will get failures during I/O, but the code has to deal > >>>> with those anyway. > >>> > >>> I guess those I/O failures would cost us CPU cycles > >> > >> I/O failures are rare exceptions. We need to optimize the common case or > >> at least not make it worse. The common case does not deal with timeouts > >> and errors. > >> > >> > >>> and make connection problems very-very-very hard to debug. > >> > >> Would not the error be the same, regardless of whether it is discovered > >> during "connect" time or during "write" time? > >> > >> > >>> Also all connection timeouts > >>> become useless if we can't tell for sure whether the socket is really > >>> connected. > >> > >> Why would they become useless? If Squid fails to connect (from Squid > >> point of view!) in X seconds, it should treat this as a timeout and > >> proceed accordingly. There will be vary rare events where Squid point of > >> view would differ from OS point of view, but I do not see (a) why we > >> should care about those very rare events and (b) how we can avoid them > >> completely without implementing Squid inside the kernel. > >> > >> > >> > >>>> > >>>> In other words, if you want to work on this, consider defining the API > >>>> based on current Squid needs and then make sure we support those minimum > >>>> requirements, keeping overheads and portability in mind. However, I > >>>> would _start_ by fixing the calling code first (as it may affect the > >>>> minimum requirements) -- your ConnOpener patch was a step in that > >>>> direction. > >>> > >>> Amos wrote that calling code is actually OK. > >> > >> My interpretation is that Amos had explained what the code is doing. I > >> did not hear Amos being convinced that the code does what it should. And > >> I think that the code does not do what it should. > >> > >> > >> Cheers, > >> > >> Alex. > >> > > > -- Best wishes, Alexander Komyagin
