Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Hiya Jeremy, > So your problem is the debug statement being triggered repeatedly ? Yup. > Adding a sleep is (IMHO) the wrong thing to do. It has the advantage of pretty much guaranteeing the problem will "go away"; it has the disadvantage of blocking the thread/process. However it is what the Samba4 client code does (so a similar change to the Samba3 would be consistent; of course, so would a different change to both codebases ...). > Once the accept() has failed the 'POLLIN' event should not be triggered repeatedly on the polled socket. > Your truss trace doesn't show enough. Does a subsequent pollsys() keep returning fd=35 ev=POLLIN|POLLHUP rev=POLLIN after the: > > accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED Now that's a very interesting question ... OK, a quick dig around later and we get (abridged): pollsys(0x080849F0, 8, 0xFEFFDF58, 0x) = 1 fd=36 ev=POLLIN|POLLHUP rev=0 fd=35 ev=POLLIN|POLLHUP rev=0 fd=34 ev=POLLIN|POLLHUP rev=0 fd=31 ev=POLLIN|POLLHUP rev=0 fd=33 ev=POLLIN|POLLHUP rev=0 fd=32 ev=POLLIN|POLLHUP rev=POLLIN fd=6 ev=POLLIN|POLLHUP rev=0 fd=30 ev=POLLIN|POLLHUP rev=0 timeout: 32.54700 sec accept(32, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) Err#130 ECONNABORTED ... write(8, " a c c e p t : S o".., 43) = 43 pollsys(0x080849F0, 8, 0xFEFFDF58, 0x) = 1 fd=36 ev=POLLIN|POLLHUP rev=0 fd=35 ev=POLLIN|POLLHUP rev=0 fd=34 ev=POLLIN|POLLHUP rev=0 fd=31 ev=POLLIN|POLLHUP rev=POLLIN fd=33 ev=POLLIN|POLLHUP rev=0 fd=6 ev=POLLIN|POLLHUP rev=0 fd=30 ev=POLLIN|POLLHUP rev=0 fd=32 ev=POLLIN|POLLHUP rev=0 timeout: 32.54600 sec accept(31, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) = 38 AF_INET name = X.X.X.X port = 55935 forkx(0)= 10502 ... pollsys(0x080849F0, 8, 0xFEFFDF58, 0x) = 1 fd=36 ev=POLLIN|POLLHUP rev=0 fd=35 ev=POLLIN|POLLHUP rev=0 fd=34 ev=POLLIN|POLLHUP rev=0 fd=33 ev=POLLIN|POLLHUP rev=0 fd=32 ev=POLLIN|POLLHUP rev=POLLIN fd=31 ev=POLLIN|POLLHUP rev=0 fd=6 ev=POLLIN|POLLHUP rev=0 fd=30 ev=POLLIN|POLLHUP rev=0 timeout: 31.03400 sec accept(32, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) Err#130 ECONNABORTED ... write(8, " a c c e p t : S o".., 43) = 43 Received signal #18, SIGCLD, in pollsys() [caught] siginfo: SIGCLD CLD_EXITED pid=10504 status=0x pollsys(0x080849F0, 8, 0xFEFFDF58, 0x) Err#4 EINTR fd=36 ev=POLLIN|POLLHUP rev=0 fd=35 ev=POLLIN|POLLHUP rev=0 fd=34 ev=POLLIN|POLLHUP rev=0 fd=33 ev=POLLIN|POLLHUP rev=0 fd=31 ev=POLLIN|POLLHUP rev=0 fd=6 ev=POLLIN|POLLHUP rev=0 fd=30 ev=POLLIN|POLLHUP rev=0 fd=32 ev=POLLIN|POLLHUP rev=0 timeout: 31.03200 sec So that would be a no - next "poll()" and there's no revent flagged on that same socket. Which would confirm your thought that "sleep()" is perhaps not the way to go. However I don't know the Samba code (at all!) nearly well enough to comment - that "sleep()" may be serving some other vital purpose under different circumstances? Either way, it would appear that my second suggestion would still be valid - only log this (and possibly a couple of other error conditions) when more debugging is enabled? Another passing thought ... That "truss" only captured 2 ECONNABORTED incidents - typical that nothing much happens when you're specifically looking at it. However, is it likely to be a coincidence that both were on the same socket? FD#32 happens to be bound to port 445 on one specific interface of the machine; tomorrow I might try a more extended test and poke lots of traffic at that interface (and/or might stick the socket descriptor number into the debug message) - if anything interesting presents itself (E.g., it's always the same port, or interface, ... where the problem occurs) I'll post an update saying so. Probably doesn't affect the solution, but possibly technically interesting anyway ... Many thanks, and regards, Tris. -Original Message- From: Jeremy Allison [mailto:j...@samba.org] Sent: 29 August 2013 20:52 To: Tris Mabbs Cc: 'Andrew Bartlett'; samba@lists.samba.org; samba-techni...@samba.org Subject: Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort". On Thu, Aug 29, 2013 at 10:10:38AM +0100, Tris Mabbs wrote: > Hiya Andrew, > > Many thanks for the typically helpful and comprehensive reply :-) > > > I think that's probably the right track :-) > > > > The code here is triggered when poll() in
Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
On Thu, Aug 29, 2013 at 10:10:38AM +0100, Tris Mabbs wrote: > Hiya Andrew, > > Many thanks for the typically helpful and comprehensive reply :-) > > > I think that's probably the right track :-) > > > > The code here is triggered when poll() indicates that the socket is > readable. > > This socket should only be readable when a new connection is being made, > and accept() should succeed. > > ... > > So, my only conclusion is that your box momentarily does not have the > resources to accept the connection, > > and because there isn't the sleep() in the source3 code, it prints this in > a loop until the resources become available. > > Absolutely, and on any normal Unix implementation I'd agree entirely. That > sort of "poll()"/"accept()"/... code is perfectly normal and exactly what > you'd expect - I've written plenty of very similar code myself over the > years ... > However this is "Solaris" :-( > > Caught in the act: > > ... > 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 > 16327: fd=39 ev=POLLIN|POLLHUP rev=0 > 16327: fd=38 ev=POLLIN|POLLHUP rev=0 > 16327: fd=34 ev=POLLIN|POLLHUP rev=0 > 16327: fd=36 ev=POLLIN|POLLHUP rev=0 > 16327: fd=37 ev=POLLIN|POLLHUP rev=POLLIN > 16327: fd=35 ev=POLLIN|POLLHUP rev=0 > 16327: fd=33 ev=POLLIN|POLLHUP rev=0 > 16327: fd=6 ev=POLLIN|POLLHUP rev=0 > 16327: timeout: 59.99900 sec > 16327: accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41 > 16327: AF_INET name = X.X.X.X port = 28986 > 16327: forkx(0)= 26942 > 16327: lwp_sigmask(SIG_SETMASK, 0x00011080, 0x, 0x, > 0x) = 0xFFBFFEFF [0x] > 16327: close(41) = 0 > 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 > 16327: fd=39 ev=POLLIN|POLLHUP rev=0 > 16327: fd=38 ev=POLLIN|POLLHUP rev=0 > 16327: fd=34 ev=POLLIN|POLLHUP rev=0 > 16327: fd=36 ev=POLLIN|POLLHUP rev=0 > 16327: fd=35 ev=POLLIN|POLLHUP rev=POLLIN > 16327: fd=33 ev=POLLIN|POLLHUP rev=0 > 16327: fd=6 ev=POLLIN|POLLHUP rev=0 > 16327: fd=37 ev=POLLIN|POLLHUP rev=0 > 16327: timeout: 44.69600 sec > 16327: accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 > ECONNABORTED > ... > > So there's nothing odd about the "poll()". Typically Solaris will flag > POLLERR in "revents" if it's out of resources, and POLLHUP if the remote end > closed the connection before it was fully established (remote NAKed, or > ignored, the connection SYN; terminally low on resources at t'other end of > the socket; ...). Neither is happening here which would suggest things are > proceeding as normal for the connection establishment. > > The server darn' well shouldn't be out of any resources either. In terms of > physical resources, at the point that occurred the CPUs were at 99.9% idle, > there was 15Gb of free RAM (so not out of kernel memory then ...) and only a > total of about 400 sockets (TCP, Unix, ...) in use across the entire system, > as reported by "netstat -na | wc -l" - well below peak levels seen on this > system. > > So it's going to be that hypothetical Solaris specific > SO_DONT_RANDOMLY_ABORT_CONNECTIONS socket() option, isn't it :-) > > So could I request please, that in the source3 code, either: > a. The same "sleep()" is added as in the source4 code; -and/or- > b. If errno == ECONNABORTED then only log the error if the debug > level is (substantially?) higher than zero. So your problem is the debug statement being triggered repeatedly ? Adding a sleep is (IMHO) the wrong thing to do. Once the accept() has failed the 'POLLIN' event should not be triggered repeatedly on the polled socket. Your truss trace doesn't show enough. Does a subsequent pollsys() keep returning fd=35 ev=POLLIN|POLLHUP rev=POLLIN after the: accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 > ECONNABORTED ? Jeremy. -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Hiya Andrew, Many thanks for the typically helpful and comprehensive reply :-) > I think that's probably the right track :-) > > The code here is triggered when poll() indicates that the socket is readable. > This socket should only be readable when a new connection is being made, and accept() should succeed. > ... > So, my only conclusion is that your box momentarily does not have the resources to accept the connection, > and because there isn't the sleep() in the source3 code, it prints this in a loop until the resources become available. Absolutely, and on any normal Unix implementation I'd agree entirely. That sort of "poll()"/"accept()"/... code is perfectly normal and exactly what you'd expect - I've written plenty of very similar code myself over the years ... However this is "Solaris" :-( Caught in the act: ... 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: timeout: 59.99900 sec 16327: accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41 16327: AF_INET name = X.X.X.X port = 28986 16327: forkx(0)= 26942 16327: lwp_sigmask(SIG_SETMASK, 0x00011080, 0x, 0x, 0x) = 0xFFBFFEFF [0x] 16327: close(41) = 0 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: timeout: 44.69600 sec 16327: accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED ... So there's nothing odd about the "poll()". Typically Solaris will flag POLLERR in "revents" if it's out of resources, and POLLHUP if the remote end closed the connection before it was fully established (remote NAKed, or ignored, the connection SYN; terminally low on resources at t'other end of the socket; ...). Neither is happening here which would suggest things are proceeding as normal for the connection establishment. The server darn' well shouldn't be out of any resources either. In terms of physical resources, at the point that occurred the CPUs were at 99.9% idle, there was 15Gb of free RAM (so not out of kernel memory then ...) and only a total of about 400 sockets (TCP, Unix, ...) in use across the entire system, as reported by "netstat -na | wc -l" - well below peak levels seen on this system. So it's going to be that hypothetical Solaris specific SO_DONT_RANDOMLY_ABORT_CONNECTIONS socket() option, isn't it :-) So could I request please, that in the source3 code, either: a. The same "sleep()" is added as in the source4 code; -and/or- b. If errno == ECONNABORTED then only log the error if the debug level is (substantially?) higher than zero. I think it's probably safe to assume that ECONNABORTED is generally ignoreable; for whatever reason, Solaris seems to return this at the drop of a metaphorical hat (and ignoring it on other OS' isn't going to be a problem either). Maybe the same with EAGAIN (and possibly EWOULDBLOCK), as other "Ignore this unless the user REALLY wants a lot of debug output" type "errors"? This would also seem to be common practice - a quick Google for "accept() ignore ECONNABORTED" comes back with a lot of results, mainly showing other open source code having been modified specifically to ignore ECONNABORTED. Cheers! Tris. -----Original Message----- From: Andrew Bartlett [mailto:abart...@samba.org] Sent: 29 August 2013 00:41 To: Tris Mabbs Cc: samba@lists.samba.org; samba-techni...@samba.org Subject: Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort". On Sun, 2013-08-25 at 18:50 +0100, Tris Mabbs wrote: > Probably should have posted this to "samba-technical" > in the first place, so re-posting in case anyone has any useful ideas . > > > > From: Tris Mabbs > > Sent: 12 August 2013 23:08 > To: 'samba@lists.samba.org' > Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using > client) behaviour #
Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
On Sun, 2013-08-25 at 18:50 +0100, Tris Mabbs wrote: > Probably should have posted this to "samba-technical" in the > first place, so re-posting in case anyone has any useful ideas . > > > > From: Tris Mabbs > > Sent: 12 August 2013 23:08 > To: 'samba@lists.samba.org' > Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) > behaviour #2 - "accept: Software caused connection abort". > > > > Good day oh technical ones . > > > > I was running Samba 4 (client only, not using it as a DC so > effectively running Samba 3 code from the Samba 4 tree) and, other than a > little "Gotcha!" regarding decoding Kerberos PACs, it was all working > perfectly. > > Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" > (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" > and is now "Solaris 11.1") so I recompiled it all up and installed afresh > (so no ".tdb"s from the previous installation or anything). > > > > But here's a funny thing (#2). The log file gets absolutely > ridiculous numbers of messages thus: > > > > Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 > 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) > > Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: > Software caused connection abort > > Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 > 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) > > Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: > Software caused connection abort > > Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 > 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) > > Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: > Software caused connection abort > > > > And so on. These will come in spurts; there won't be any > such messages for several minutes then a whole load will come along all at > once. Rather like busses . > > I will catch "smbd" in the act at some point though, and > when I do I'll follow-up with a system call trace to show exactly what is > happening when this message gets triggered. It will, of course, be > something bizarrely Solaris specific (you didn't set the > "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk > .). I think that's probably the right track :-) The code here is triggered when poll() indicates that the socket is reaable. This socket should only be readable when a new connection is being made, and accept() should succeed. In the source4/smbd/process_single.c code equivalent to this, there is this helpful hint: /* accept an incoming connection. */ status = socket_accept(listen_socket, &connected_socket); if (!NT_STATUS_IS_OK(status)) { DEBUG(0,("single_accept_connection: accept: %s\n", nt_errstr(status))); /* this looks strange, but is correct. We can only be here if woken up from select, due to an incoming connection. We need to throttle things until the system clears enough resources to handle this new socket. If we don't then we will spin filling the log and causing more problems. We don't panic as this is probably a temporary resource constraint */ sleep(1); return; } So, my only conclusion is that your box momentarily does not have the resources to accept the connection, and because there isn't the sleep() in the source3 code, it prints this in a loop until the resources become available. Andrew Bartlett -- Andrew Bartlett http://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org Samba Developer, Catalyst IT http://catalyst.net.nz -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Probably should have posted this to "samba-technical" in the first place, so re-posting in case anyone has any useful ideas . From: Tris Mabbs Sent: 12 August 2013 23:08 To: 'samba@lists.samba.org' Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort". Good day oh technical ones . I was running Samba 4 (client only, not using it as a DC so effectively running Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" and is now "Solaris 11.1") so I recompiled it all up and installed afresh (so no ".tdb"s from the previous installation or anything). But here's a funny thing (#2). The log file gets absolutely ridiculous numbers of messages thus: Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort And so on. These will come in spurts; there won't be any such messages for several minutes then a whole load will come along all at once. Rather like busses . It doesn't seem to be affecting the operation so far as any client is concerned. Or rather it evidently will be having some effect, but it's not a noticeable one. However it is really irritating having the system log getting filled up with all these messages! Murphy's law, of course, states that trying to catch one of these messages being created, so I can include a suitable system call trace in this message, will be impossible - there will be no such messages logged until the instant I click "Send" (at which point probably about half-a-dozen will be logged all at once). That does indeed seem to be the case - I've now been trying to persuade one of these, normally very regularly occurring, messages to be logged for about 20 minutes and still, stubbornly, nothing continues to happen. I will catch "smbd" in the act at some point though, and when I do I'll follow-up with a system call trace to show exactly what is happening when this message gets triggered. It will, of course, be something bizarrely Solaris specific (you didn't set the "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk .). Cheers, Tris. -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Caught it in the act! As predicted, almost immediately after sending my original message (!) . However unfortunately, it's not particularly useful . ... 16327: write(7, "\0", 1)= 1 16327: setcontext(0xFEFFD690) 16327: waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0 16327: siginfo: SIGCLD CLD_EXITED pid=26937 status=0x 16327: waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0 16327: siginfo: SIG#0 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=POLLIN 16327: timeout: 59.99900 sec 16327: read(6, "\0", 16)= 1 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) (sleeping...) 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: timeout: 59.99900 sec 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=35 ev=POLLIN|POLLHUP rev=0 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: timeout: 59.99900 sec 16327: accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41 16327: AF_INET name = X.X.X.X port = 28986 16327: forkx(0)= 26942 16327: lwp_sigmask(SIG_SETMASK, 0x00011080, 0x, 0x, 0x) = 0xFFBFFEFF [0x] 16327: close(41) = 0 16327: pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x) = 1 16327: fd=39 ev=POLLIN|POLLHUP rev=0 16327: fd=38 ev=POLLIN|POLLHUP rev=0 16327: fd=34 ev=POLLIN|POLLHUP rev=0 16327: fd=36 ev=POLLIN|POLLHUP rev=0 16327: fd=35 ev=POLLIN|POLLHUP rev=POLLIN 16327: fd=33 ev=POLLIN|POLLHUP rev=0 16327: fd=6 ev=POLLIN|POLLHUP rev=0 16327: fd=37 ev=POLLIN|POLLHUP rev=0 16327: timeout: 44.69600 sec 16327: accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED 16327: fstat(40, 0xFEFFD1A0)= 0 16327: d=0x0888 i=22079 m=002 l=0 u=0 g=0 rdev=0x02480248 16327: at = Aug 12 17:30:31 BST 2013 [ 1376325031.0 ] 16327: mt = Aug 12 17:30:31 BST 2013 [ 1376325031.0 ] 16327: ct = Aug 12 17:30:31 BST 2013 [ 1376325031.0 ] 16327: bsz=8192 blks=0 fs=BADVFS 16327: time() = 1376345330 16327: getpid()= 16327 [1] 16327: putmsg(40, 0xFEFFDB80, 0xFEFFDB90, 0) = 0 16327: ctl: maxlen=24 len=24 buf=0xFEFFD230: "98 0 tFE\0\010\0".. 16327: 98 0 tFE\0\010\0 XD2FFFE \CF iFE HFA\b\b1B\0\0\0 16327: dat: maxlen=1280 len=151 buf=0xFEFFD250: " A u g 1 2 2".. 16327:A u g 1 2 2 3 : 0 8 : 5 0 s m b d [ 1 6 3 2 7 ] : [ I D 16327: 7 0 2 9 1 1 F A C I L I T Y _ A N D _ P R I O R I T Y ] [ 16327:2 0 1 3 / 0 8 / 1 2 2 3 : 0 8 : 5 0 . 0 4 3 5 8 2 , 0 ] 16327:. . / s o u r c e 3 / s m b d / s e r v e r . c : 5 5 6 ( s m b 16327:d _ a c c e p t _ c o n n e c t i o n )\n\n\0 ... However hopefully that might be of use to someone - unfortunately it does just look like a Solaris specific nasty. There should be no reason for the abort; everything is directly connected on uncongested gigabit Ethernet, so it's unlikely to be TCP timeout related and I can't see a client aborting the connection before even it's established (that might happen if the clients, or possibly the server were desperately short of IP resources but that isn't going to be the case either). It doesn't appear to be related to the number of connections either, so I can't immediately think of any sensible reason for the connection to abort. What might also be interesting to investigate (though completely unrelated) is why there's an "fstat(40, 0xFEFFD1A0)" when writing the message - descriptor 40 is the socket for the "syslog()" conne
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".
Good day oh technical ones . I was running Samba 4 (client only, not using it as a DC so effectively running Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris" and is now "Solaris 11.1") so I recompiled it all up and installed afresh (so no ".tdb"s from the previous installation or anything). But here's a funny thing (#2). The log file gets absolutely ridiculous numbers of messages thus: Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:01.731562, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556423, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12 22:45:03.556688, 0] ../source3/smbd/server.c:556(smbd_accept_connection) Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] accept: Software caused connection abort And so on. These will come in spurts; there won't be any such messages for several minutes then a whole load will come along all at once. Rather like busses . It doesn't seem to be affecting the operation so far as any client is concerned. Or rather it evidently will be having some effect, but it's not a noticeable one. However it is really irritating having the system log getting filled up with all these messages! Murphy's law, of course, states that trying to catch one of these messages being created, so I can include a suitable system call trace in this message, will be impossible - there will be no such messages logged until the instant I click "Send" (at which point probably about half-a-dozen will be logged all at once). That does indeed seem to be the case - I've now been trying to persuade one of these, normally very regularly occurring, messages to be logged for about 20 minutes and still, stubbornly, nothing continues to happen. I will catch "smbd" in the act at some point though, and when I do I'll follow-up with a system call trace to show exactly what is happening when this message gets triggered. It will, of course, be something bizarrely Solaris specific (you didn't set the "SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you? Tsk tsk tsk .). Cheers, Tris. -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba