Re: [HACKERS] Socket problem using beta2 on Windows-XP
Thomas Hallgren wrote:
> With great help from Magnus, who advised me to use lspfix from cexx.org to list my LSPs, I found that I had gapsp.dll, the Neoteris DNS Provider, installed. Uninstalling the Neoteris software made this problem go away.

I guess the question is: why is DNS provider software blocking socket creation? Is there a way we could work around that?

--
Alvaro Herrera, Architect, http://www.EnterpriseDB.com
Destiny shuffles the cards and we play them (A. Schopenhauer)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> With great help from Magnus, who advised me to use lspfix from cexx.org to list my LSPs, I found that I had gapsp.dll, the Neoteris DNS Provider, installed. Uninstalling the Neoteris software made this problem go away.
>
> I guess the question is: why is DNS provider software blocking socket creation? Is there a way we could work around that?

It's just another version of the broken LSP that we've been having problems with before. But before, it's only been AV and firewall stuff. I guess they somehow put an LSP in there to intercept DNS packets or something. Completely broken design IMHO, but that's a different thing ;-) And they apparently don't support socket inheritance.

The only way we can work around them breaking the concept of socket inheritance is to stop using it. That would mean going multithreaded instead of multiprocess, which isn't very likely...

To reiterate the basic point: The broken LSP breaks a fundamental promise in the sockets API that we absolutely require. The bug is completely within the LSP.

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander writes:
> To reiterate the basic point: The broken LSP breaks a fundamental promise in the sockets API that we absolutely require. The bug is completely within the LSP.

ISTM that maybe what we have here is a documentation shortcoming. I'm thinking that our Windows FAQ ought to suggest troubleshooting socket-related problems by removing LSPs one at a time.

regards, tom lane
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> To reiterate the basic point: The broken LSP breaks a fundamental promise in the sockets API that we absolutely require. The bug is completely within the LSP.
>
> ISTM that maybe what we have here is a documentation shortcoming. I'm thinking that our Windows FAQ ought to suggest troubleshooting socket-related problems by removing LSPs one at a time.

We used to have this, but we removed it when we added the code that fixed the problem in 95% of the cases. It's probably a good idea to bring it back :-(

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Martijn van Oosterhout wrote:
> On Thu, Sep 29, 2005 at 08:50:30AM +0200, Thomas Hallgren wrote:
>> Hi, I've installed PostgreSQL 8.1-beta2 as a service on my Windows-XP box. It runs fine but I get repeated messages like this in the log:
>>
>> 2005-09-29 00:41:09 FATAL: could not duplicate socket 1880 for use in backend: error code 10038
>
> That's from postmaster.c:write_inheritable_socket(). Error 10038 is WSAENOTSOCK. Very odd, time to get out the debugger? Get a backtrace at least.

I finally managed to debug the postmaster and I'm now pretty sure the message is not from the postmaster itself. I put a breakpoint where the message is printed (postmaster.c:3762) and in errstart() where elevel = ERROR (elog.c:152), but I never get there although the message is printed. I know that my debugger works because if I put a break on elog.c:194 it stops for other messages.

Regards,
Thomas Hallgren
Re: [HACKERS] Socket problem using beta2 on Windows-XP
I added some traces to the code. I know that the following happens when I start a postmaster.

StartupDatabase will call internal_fork_exec, which calls write_inheritable_socket 4 times and succeeds.

During the first iteration of ServerLoop:
StartBackgroundWriter will call internal_fork_exec and succeed.
pgstat_forkexec will call internal_fork_exec and succeed.

In the second iteration of ServerLoop, pgstat_forkexec will again call internal_fork_exec. This time it fails. According to the log it fails on line:

write_inheritable_socket(&param->pgStatSock, pgStatSock, childPid);

i.e. on its second call to write_inheritable_socket. The failure is in a postgres.exe process, not postmaster.exe (and that's why I can't debug properly on Windoz).

Hope this helps.

Regards,
Thomas Hallgren

Magnus Hagander wrote:
>>> If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.
>>
>> That would make some sense, because the stat processes need to set up new sockets (for the pipe between them). The autovacuum theory didn't hold any water in my eyes because autovacuum doesn't create any new sockets. However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ... Does Windows have any equivalent of strace whereby we could watch what's happening during stats process launch?
>
> First of all, I won't be able to dig into this any more until next week - sorry about that. But others are always free to :-)
>
> There is no strace equivalent built in, but you can get an add-on from http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace_readme.cfm. Don't put it on a production box permanently, though; it tends to cause BSODs in some cases.
>
> //Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
On Sun, Oct 02, 2005 at 12:20:05PM +0200, Thomas Hallgren wrote:
> I added some traces to the code. I know that the following happens when I start a postmaster.
> <snip>
> In the second iteration of ServerLoop, pgstat_forkexec will again call internal_fork_exec. This time it fails. According to the log it fails on line:
>
> write_inheritable_socket(&param->pgStatSock, pgStatSock, childPid);

Well, pgStatSock is the only SOCK_DGRAM socket; all the others are SOCK_STREAM. Maybe that's the difference? It's also connected to itself, although for DGRAM sockets that's not that special. The documentation isn't totally clear about this.

Also, the error thrown should terminate the process, yet it obviously doesn't. Very odd. Any Windows programmers with ideas?

--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
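For readers unfamiliar with the "connected to itself" remark about the SOCK_DGRAM socket, here is a minimal Python sketch of the idea. It is only an illustration of a datagram socket whose peer address is its own bound address; it is not the PostgreSQL code, and the function name `self_connected_udp_roundtrip` is made up for this example.

```python
import socket


def self_connected_udp_roundtrip(payload: bytes = b"ping") -> bytes:
    """Send a datagram from a UDP socket to itself and read it back.

    Hypothetical helper for illustration only; it mirrors the idea of a
    SOCK_DGRAM socket that is connect()ed to its own address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(2.0)              # never hang if the datagram is lost
        sock.bind(("127.0.0.1", 0))       # let the OS pick a free loopback port
        sock.connect(sock.getsockname())  # peer address == own address
        sock.send(payload)
        return sock.recv(1024)
    finally:
        sock.close()
```

Calling `self_connected_udp_roundtrip(b"hello")` should hand back the same bytes, which shows that a self-connected datagram socket is an ordinary, working socket as far as the sockets API is concerned.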
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> IIRC, the win32 installer will enable autovacuum by default. And yes, autovacuum was my first thought as well after Thomas's last mail - that would be a good explanation for why it happens when the postmaster is idle.
>
> I used the win32 installer defaults so autovacuum is probably a safe assumption.

Right. Please try turning it off and see if the problem goes away.

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander wrote:
> Right. Please try turning it off and see if the problem goes away.

It does (go away).

- thomas
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander wrote:
> Right. Please try turning it off and see if the problem goes away.

No, wait! It does *not* go away. Do I need to do anything more than setting this in my postgresql.conf file:

autovacuum = false    # enable autovacuum subprocess?

and restart the service?

The two zombie entries occur directly when I start the service, then there are two new entries popping up every minute.

- thomas
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> Right. Please try turning it off and see if the problem goes away.
>
> No, wait! It does *not* go away. Do I need to do anything more than setting this in my postgresql.conf file:
>
> autovacuum = false    # enable autovacuum subprocess?
>
> and restart the service? The two zombie entries occur directly when I start the service, then there are two new entries popping up every minute.

Yes, that should be enough. Hmm. Weird! If you can get a backtrace from the point where the error msg shows up, that certainly would help - this means it's not coming from where we thought it was coming from :-(

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
On Fri, Sep 30, 2005 at 08:29:07AM +0200, Thomas Hallgren wrote:
> Magnus Hagander wrote:
>> Right. Please try turning it off and see if the problem goes away.
>
> No, wait! It does *not* go away. Do I need to do anything more than setting this in my postgresql.conf file:
>
> autovacuum = false    # enable autovacuum subprocess?
>
> and restart the service? The two zombie entries occur directly when I start the service, then there are two new entries popping up every minute.

If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.

The weird thing is that the postmaster _should_ call wait() for them if it detects that they died (when receiving a SIGCHLD signal AFAIR). If it doesn't, maybe it indicates there's a problem with the signal handling on Win32.

--
Alvaro Herrera, Valdivia, Chile ICBM: S 39º 49' 17.7, W 73º 14' 26.8
We are who we choose to be, sang the goldfinch when the sun is high (Sandman)
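The restart-and-reap behaviour described above can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the postmaster is C code driven by signals, whereas this sketch uses `subprocess` and a simple loop, and the name `supervise` and its parameters are hypothetical.

```python
import subprocess
import sys


def supervise(child_code: str, max_starts: int = 3) -> list:
    """Start a child, wait for it to die, record its exit code, restart it.

    Hypothetical illustration of a parent that restarts a helper process
    whenever it is found not to be running. Calling wait() is what reaps
    the dead child so that no zombie process entry is left behind.
    """
    exit_codes = []
    for _ in range(max_starts):
        # The child is a fresh Python interpreter running `child_code`.
        child = subprocess.Popen([sys.executable, "-c", child_code])
        exit_codes.append(child.wait())  # reap the child and keep its status
    return exit_codes
```

For example, `supervise("raise SystemExit(1)", max_starts=2)` starts two children that both exit with status 1 and reaps each of them. A parent that skipped the `wait()` step would leave the dead children unreaped.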
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Alvaro Herrera writes:
> If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.

That would make some sense, because the stat processes need to set up new sockets (for the pipe between them). The autovacuum theory didn't hold any water in my eyes because autovacuum doesn't create any new sockets.

However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ...

Does Windows have any equivalent of strace whereby we could watch what's happening during stats process launch?

regards, tom lane
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.
>
> That would make some sense, because the stat processes need to set up new sockets (for the pipe between them). The autovacuum theory didn't hold any water in my eyes because autovacuum doesn't create any new sockets. However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ... Does Windows have any equivalent of strace whereby we could watch what's happening during stats process launch?

First of all, I won't be able to dig into this any more until next week - sorry about that. But others are always free to :-)

There is no strace equivalent built in, but you can get an add-on from http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace_readme.cfm. Don't put it on a production box permanently, though; it tends to cause BSODs in some cases.

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Tom Lane wrote:
> However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ...

To clarify, I'm talking about the tcpview window and its connections, and thus about zombie connections. They both belong to the same pid and seem to point to each other. The actual process no longer exists (it can't be viewed anywhere).

Regards,
Thomas Hallgren
[HACKERS] Socket problem using beta2 on Windows-XP
Hi,

I've installed PostgreSQL 8.1-beta2 as a service on my Windows-XP box. It runs fine but I get repeated messages like this in the log:

2005-09-29 00:41:09 FATAL: could not duplicate socket 1880 for use in backend: error code 10038

and for each message printed, a new postgres process is created. To make things worse, those processes do not die when I stop the service.

I use sysinternals tcpview to monitor my sockets. I know that no other process is using 1880. Each started postgres process will occupy two seemingly random ports that apparently form a loop somehow. This is a typical entry:

non-existent:3136 TCP 127.0.0.1:1554 127.0.0.1:1555 ESTABLISHED
non-existent:3136 TCP 127.0.0.1:1555 127.0.0.1:1554 ESTABLISHED

The weird thing is that there is no process with pid 3136 (hence the name non-existent). There is a postgres process with another pid in my process listing. If I kill that, the non-existent entries go away.

Looks like pid 3136 is talking to itself. A pipe() followed by failure to start the new process perhaps?

Regards,
Thomas Hallgren
Re: [HACKERS] Socket problem using beta2 on Windows-XP
> Hi, I've installed PostgreSQL 8.1-beta2 as a service on my Windows-XP box. It runs fine but I get repeated messages like this in the log:
>
> 2005-09-29 00:41:09 FATAL: could not duplicate socket 1880 for use in backend: error code 10038
>
> and for each message printed, a new postgres process is created. To make things worse, those processes do not die when I stop the service.
>
> I use sysinternals tcpview to monitor my sockets. I know that no other process is using 1880. Each started postgres process will occupy two seemingly random ports that apparently form a loop somehow. This is a typical entry:
>
> non-existent:3136 TCP 127.0.0.1:1554 127.0.0.1:1555 ESTABLISHED
> non-existent:3136 TCP 127.0.0.1:1555 127.0.0.1:1554 ESTABLISHED
>
> The weird thing is that there is no process with pid 3136 (hence the name non-existent). There is a postgres process with another pid in my process listing. If I kill that, the non-existent entries go away.
>
> Looks like pid 3136 is talking to itself. A pipe() followed by failure to start the new process perhaps?

Do you by any chance run any antivirus or firewall software? If so, can you try removing it (note! actual uninstall, not just disabling it!)

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Nope, no anti-virus and no firewall (other than the box that fronts my home network to the outside world).

- thomas

Magnus Hagander wrote:
>> Hi, I've installed PostgreSQL 8.1-beta2 as a service on my Windows-XP box. It runs fine but I get repeated messages like this in the log:
>>
>> 2005-09-29 00:41:09 FATAL: could not duplicate socket 1880 for use in backend: error code 10038
>>
>> and for each message printed, a new postgres process is created. To make things worse, those processes do not die when I stop the service.
>>
>> I use sysinternals tcpview to monitor my sockets. I know that no other process is using 1880. Each started postgres process will occupy two seemingly random ports that apparently form a loop somehow. This is a typical entry:
>>
>> non-existent:3136 TCP 127.0.0.1:1554 127.0.0.1:1555 ESTABLISHED
>> non-existent:3136 TCP 127.0.0.1:1555 127.0.0.1:1554 ESTABLISHED
>>
>> The weird thing is that there is no process with pid 3136 (hence the name non-existent). There is a postgres process with another pid in my process listing. If I kill that, the non-existent entries go away.
>>
>> Looks like pid 3136 is talking to itself. A pipe() followed by failure to start the new process perhaps?
>
> Do you by any chance run any antivirus or firewall software? If so, can you try removing it (note! actual uninstall, not just disabling it!)
>
> //Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
On Thu, Sep 29, 2005 at 08:50:30AM +0200, Thomas Hallgren wrote:
> Hi, I've installed PostgreSQL 8.1-beta2 as a service on my Windows-XP box. It runs fine but I get repeated messages like this in the log:
>
> 2005-09-29 00:41:09 FATAL: could not duplicate socket 1880 for use in backend: error code 10038

That's from postmaster.c:write_inheritable_socket(). Error 10038 is WSAENOTSOCK. Very odd, time to get out the debugger? Get a backtrace at least.

Hope this helps,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
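To make the "error code 10038" lookup repeatable, here is a small Python sketch that pulls the numeric code out of a log line like the one quoted above and maps it to its Winsock name. The table is a hand-picked subset of well-known Winsock codes, not a complete list, and the helper name `winsock_error_name` is invented for this example.

```python
import re

# A few well-known Winsock error codes (a subset, not the full list).
WINSOCK_ERRORS = {
    10013: "WSAEACCES",
    10022: "WSAEINVAL",
    10024: "WSAEMFILE",
    10038: "WSAENOTSOCK",        # operation attempted on something that is not a socket
    10048: "WSAEADDRINUSE",
    10054: "WSAECONNRESET",
    10055: "WSAENOBUFS",
    10061: "WSAECONNREFUSED",
    10093: "WSANOTINITIALISED",
}


def winsock_error_name(log_line: str) -> str:
    """Return the symbolic name for the 'error code N' found in a log line.

    Hypothetical helper: returns 'unknown' if no code is present or the
    code is not in the small table above.
    """
    match = re.search(r"error code (\d+)", log_line)
    if match is None:
        return "unknown"
    return WINSOCK_ERRORS.get(int(match.group(1)), "unknown")
```

Fed the FATAL line from the original report, the helper returns `WSAENOTSOCK`, matching the reading given in the message above.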
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Hmm. Bummer.

Anyway. The netstat output indicates that the pipe() call works. The order is pretty much:

parent: create socket pair, connected to each other.
parent: duplicate socket [this is what fails]
parent: close own copy of socket
child: recreate socket from structure [this is never called, thus the new socket is never attached to a process]

Now *why* it's doing this, I have no idea.

Questions:
1) Does it actually work? ;-) And just logs the error anyway?
2) Does this happen on *every* connection?
3) Can you reproduce this on a different machine, or just one?

//Magnus
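The first step in the sequence above (a socket pair connected to each other over loopback) is consistent with the two mirrored tcpview entries in the original report. A minimal Python sketch of such a pair follows. It assumes plain TCP over 127.0.0.1 and is not the PostgreSQL implementation; the name `loopback_socket_pair` is made up. The failing "duplicate socket" step is deliberately left out because it is Windows-only: it corresponds to the Winsock call WSADuplicateSocket, which Python exposes as `socket.share()` and `socket.fromshare()` on Windows.

```python
import socket


def loopback_socket_pair():
    """Create two TCP sockets connected to each other over 127.0.0.1.

    Hypothetical illustration: a temporary listener accepts exactly one
    connection and is then closed, leaving a connected pair behind.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))   # OS assigns a free port
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect(listener.getsockname())
            server, _addr = listener.accept()
        except OSError:
            client.close()
            raise
    finally:
        listener.close()                  # only the connected pair remains
    return client, server
```

Each end's local address is the other end's peer address, which is the same mirrored pattern as `127.0.0.1:1554 <-> 127.0.0.1:1555` in the tcpview listing. If the owning process never attaches to the pair, the two entries would be expected to linger in that form.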
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Hi,

I'm sorry, time was short today. To answer your questions:

1. I can run psql and other client programs. Everything works fine. But while doing it, I get a lot of zombies in tcpview and eventually, I think, I run out of ports. Psql just hangs when I try to connect. When that happens, I have two choices: a) stop the service and then kill off all processes by hand (there's now a *lot* of them), or b) reboot.

2. It happens while the postmaster is idle. If I leave it idle for a while and then come back, I'll have a whole bunch of new processes in my task manager and zombies in tcpview.

3. I don't have another machine handy for this right now.

It sounds like you know where it happens. Martijn requested a stacktrace. Do you still need that? If you do, I'll try to get some time over this weekend.

Regards,
Thomas Hallgren

Magnus Hagander wrote:
> Hmm. Bummer.
>
> Anyway. The netstat output indicates that the pipe() call works. The order is pretty much:
>
> parent: create socket pair, connected to each other.
> parent: duplicate socket [this is what fails]
> parent: close own copy of socket
> child: recreate socket from structure [this is never called, thus the new socket is never attached to a process]
>
> Now *why* it's doing this, I have no idea.
>
> Questions:
> 1) Does it actually work? ;-) And just logs the error anyway?
> 2) Does this happen on *every* connection?
> 3) Can you reproduce this on a different machine, or just one?
>
> //Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
On Thu, Sep 29, 2005 at 11:43:37PM +0200, Thomas Hallgren wrote:
> 2. It happens while the postmaster is idle. If I leave it idle for a while and then come back, I'll have a whole bunch of new processes in my task manager and zombies in tcpview.

Hmm ... how many processes? Did you enable autovacuum perchance? If so, does the number of processes correspond approximately to the autovacuum_naptime?

--
Alvaro Herrera http://www.advogato.org/person/alvherre
The thorn pricks from the moment it is born (African proverb)
Re: [HACKERS] Socket problem using beta2 on Windows-XP
>> 2. It happens while the postmaster is idle. If I leave it idle for a while and then come back, I'll have a whole bunch of new processes in my task manager and zombies in tcpview.
>
> Hmm ... how many processes? Did you enable autovacuum perchance? If so, does the number of processes correspond approximately to the autovacuum_naptime?

IIRC, the win32 installer will enable autovacuum by default. And yes, autovacuum was my first thought as well after Thomas's last mail - that would be a good explanation for why it happens when the postmaster is idle.

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander wrote:
> IIRC, the win32 installer will enable autovacuum by default. And yes, autovacuum was my first thought as well after Thomas's last mail - that would be a good explanation for why it happens when the postmaster is idle.

I used the win32 installer defaults so autovacuum is probably a safe assumption.

- thomas