Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-16 Thread John DeSoi
On Aug 15, 2013, at 1:07 PM, Andrew Berman rexx...@gmail.com wrote: I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything in the logs which point to an issue, but the Postgres process shows a waiting status on the slave:

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-16 Thread Andrew Berman
Awesome, I'll give that a shot John. On Fri, Aug 16, 2013 at 8:39 AM, John DeSoi de...@pgedit.com wrote: On Aug 15, 2013, at 1:07 PM, Andrew Berman rexx...@gmail.com wrote: I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-16 Thread Jeff Janes
On Thu, Aug 15, 2013 at 1:28 PM, Andrew Berman rexx...@gmail.com wrote: Hi Jeff, Here is the full process list at the time it stopped working (I have changed the actual username, db and IP for security). Would the idle in transaction process be the culprit? Most likely, yes. You should be

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-16 Thread Jeff Janes
On Fri, Aug 16, 2013 at 9:45 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Thu, Aug 15, 2013 at 1:28 PM, Andrew Berman rexx...@gmail.com wrote: Hi Jeff, Here is the full process list at the time it stopped working (I have changed the actual username, db and IP for security). Would the idle

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-16 Thread Andrew Berman
Ok, next time it happens I'll try to do more sleuthing to figure out if that's the issue. For now, I'm going to try adding --timeout=30 to the rsync command and see if that fixes things. Thanks again for your help! Andrew On Fri, Aug 16, 2013 at 10:12 AM, Jeff Janes jeff.ja...@gmail.com

[GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Andrew Berman
Hello, I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything in the logs which point to an issue, but the Postgres process shows a waiting status on the slave: postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres:

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Lonni J Friedman
I've never seen this happen. Looks like you might be using 9.1? Are you up to date on all the 9.1.x releases? Do you have just 1 slave syncing from the master? Which OS are you using? Did you verify that there aren't any network problems between the slave and master? Or hardware problems (like the

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Andrew Berman
Hi Lonni, Yes, I am using PG 9.1.9. Yes, 1 slave syncing from the master. CentOS 6.4. I don't see any network or hardware issues (e.g. NIC) but will look more into this. They are communicating on a private network and switch. I forgot to mention that after I restart the slave, everything syncs

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Lonni J Friedman
Are you certain that there are no relevant errors in the database logs (on both master and slave)? Also, are you sure that you didn't misconfigure logging such that errors wouldn't appear? On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman rexx...@gmail.com wrote: Hi Lonni, Yes, I am using PG

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Andrew Berman
The only thing I see that is a possibility for the issue is in the slave log: LOG: unexpected EOF on client connection LOG: could not receive data from client: Connection reset by peer I don't know if that's related or not as it could just be somebody running a query. The log file does seem

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Lonni J Friedman
I'd suggest enhancing your logging to include time/datestamps for every entry, and also the client hostname. That will help to rule in/out those 'unexpected EOF' errors. On Thu, Aug 15, 2013 at 12:22 PM, Andrew Berman rexx...@gmail.com wrote: The only thing I see that is a possibility for the
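For readers following the logging suggestion above, a minimal postgresql.conf sketch is shown here. It is an illustration only, not a setting quoted from the thread: the exact prefix layout is an assumption, and only the escapes %t (timestamp), %p (process ID) and %r (remote host and port) are standard log_line_prefix escapes.

```
# postgresql.conf fragment (illustrative sketch, not taken from the thread)
log_line_prefix = '%t [%p] %r '   # timestamp, backend PID, remote host and port
log_hostname = on                 # resolve client IPs to host names (adds DNS lookup overhead)
```

With a prefix like this, each 'unexpected EOF' entry carries a time and a client address, so it can be matched against the moment replication stalled.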

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Andrew Berman
Yep, that's the first thing I'm going to do. On Thu, Aug 15, 2013 at 12:34 PM, Lonni J Friedman netll...@gmail.com wrote: I'd suggest enhancing your logging to include time/datestamps for every entry, and also the client hostname. That will help to rule in/out those 'unexpected EOF' errors.

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Jeff Janes
On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman rexx...@gmail.com wrote: Hello, I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything in the logs which point to an issue, but the Postgres process shows a waiting status on the slave:

Re: [GENERAL] Streaming Replication Randomly Locking Up

2013-08-15 Thread Andrew Berman
Hi Jeff, Here is the full process list at the time it stopped working (I have changed the actual username, db and IP for security). Would the idle in transaction process be the culprit? postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering
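The check discussed in this thread (looking through the process list for a startup process that is waiting and for backends that are idle in transaction) can be sketched in Python as below. This is a rough illustration under stated assumptions: the input is assumed to be `ps aux` output, the function name find_suspect_backends is hypothetical, and the sample lines (user, database, IP, WAL file name) are made up rather than taken from the truncated listing above.

```python
def find_suspect_backends(ps_lines):
    """Return (pid, title) pairs for postgres processes that look stuck.

    Assumes `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    suspects = []
    for line in ps_lines:
        # Split into at most 11 fields so the process title keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, title = fields[1], fields[10].strip()
        if not title.startswith("postgres:"):
            continue
        # Flag open-but-idle transactions and anything reporting "waiting".
        if "idle in transaction" in title or title.endswith("waiting"):
            suspects.append((int(pid), title))
    return suspects


# Hypothetical sample lines; all identifiers here are invented for illustration.
SAMPLE = [
    "postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 "
    "postgres: startup process recovering 000000010000000A00000012 waiting",
    "postgres 6001 0.0 1.2 3430000 150000 ? Ss 10:12 0:00 "
    "postgres: appuser appdb 10.0.0.5(51234) idle in transaction",
    "postgres 6002 0.0 1.0 3430000 120000 ? Ss 10:15 0:00 "
    "postgres: appuser appdb 10.0.0.5(51240) idle",
]

if __name__ == "__main__":
    for pid, title in find_suspect_backends(SAMPLE):
        print(pid, title)
```

Running something like this on the slave the next time replay stalls would show whether an idle-in-transaction session is present at the same time the startup process reports waiting, which is the correlation suggested in the reply above.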