On 11-11-11 03:46 PM, Mike Wilson wrote: > General lag on the slave node (as recorded in sl_status) is less then 30 > seconds. This is a heavily transacted system running on very nice hardware > so perhaps any problems are being masked by that. > > I've read up on the issue and we don't appear to be experiencing any of the > bugs related to this issue that I can find in the news groups. No long > running transactions, no old nodes in the sl_ tables. In general, the system > appears to be healthy (idle proc time ~95%), good buffer cache hit ratios, > etc. > > Thanks for the replies though. I'll look into implementing 2.1 although we > just did the upgrade to 2.0.7 and I'm not sure management will go for another > down during the holiday season. Just doing my due diligence as our load will > rise steadily through the holiday season to very large load on these servers > and I wanted to make sure the servers looked solid before we through 30 X the > current load at them.
Now that a few days have passed, is your sl_log_1 count still growing or has it dropped? If your sl_log tables keep growing and aren't being truncated by the cleanup thread then you have a problem that will eventually get worse. If the 119,239 rows in sl_log_1 was a temporary thing due to your application doing lots of updates then that might be normal for your system. > > Mike Wilson > Predicate Logic > Cell: (310) 600-8777 > SkypeID: lycovian > > > > > On Nov 11, 2011, at 11:09 AM, Steve Singer wrote: > >> On 11-11-11 02:04 PM, Mike Wilson wrote: >>> >>> Mike Wilson >>> Predicate Logic >>> Cell: (310) 600-8777 >>> SkypeID: lycovian >>> >>> >>> From my postgresql.log: >>> 2011-11-11 11:03:15.237 PST db1.lax.jib(55096):LOG: duration: 133.011 ms >>> statement: fetch 500 from LOG; >>> 2011-11-11 11:03:17.241 PST db1.lax.jib(55096):LOG: duration: 134.842 ms >>> statement: fetch 500 from LOG; >>> 2011-11-11 11:03:19.239 PST db1.lax.jib(55096):LOG: duration: 133.919 ms >>> statement: fetch 500 from LOG; >>> 2011-11-11 11:03:21.240 PST db1.lax.jib(55096):LOG: duration: 133.194 ms >>> statement: fetch 500 from LOG; >>> 2011-11-11 11:03:23.241 PST db1.lax.jib(55096):LOG: duration: 134.288 ms >>> statement: fetch 500 from LOG; >>> 2011-11-11 11:03:25.241 PST db1.lax.jib(55096):LOG: duration: 133.226 ms >>> statement: fetch 500 from LOG; >>> >>> I'm only logging statements that take longer than 100ms to run. >>> >>> Here is my output from sl_log1/2: >>> select (select count(*) from sl_log_1) sl_log_1, (select count(*) from >>> sl_log_2) sl_log_2; >>> sl_log_1 | sl_log_2 >>> ----------+---------- >>> 119239 | 43685 >> >> The fetch is taking a long time because sl_log_1 is big. (The reason it >> takes so long is actually a bug that was fixed in 2.1) sl_log_1 being that >> big probably means that log switching isn't happening. >> >> Do you have any nodes that are behind? (query sl_status on all your nodes) >> Do you have any old nodes that are still listed in sl_node that you aren't >> using anymore? >> Do (did) you have a long running transaction in your system that is >> preventing the log switch from taking place? >> >> >> >> >> >>> >>> >>> On Nov 11, 2011, at 5:07 AM, Steve Singer wrote: >>> >>>> On 11-11-09 01:19 PM, Mike Wilson wrote: >>>>> Seeing "fetch 500 from LOG" almost continuously in my PG logs for a new >>>>> Slony 2.0.7 install. The previous version (2.0.3?) didn't show these >>>>> messages in the PG log. Researching the issue, historically, this >>>>> message was usually accompanied by a performance issue. This isn't the >>>>> case with my databases though, they appear to be running just as well as >>>>> ever and the lag between replicated nodes appears to be about the same as >>>>> the previous version. >>>>> >>>>> I guess my question is what does this message mean in this version of >>>>> Slony? Is it an indication of sub-optimal slon parameters? >>>>> slon -g 20 $SLON_CLUSTER "host=$HOSTNAME port=$PORT dbname=$DB user=$USER" >>>>> >>>>> And how can I get rid of it if it's not an issue? >>>>> >>>>> Mike >>>> >>>> What is causing the 'fetch 500' statements to show up in the server log? >>>> Are you only logging SQL that takes longer than x milliseconds? If so how >>>> long are your fetch 500 statements taking? How many rows are in your >>>> sl_log_1 and sl_log_2? >>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Slony1-general mailing list >>>>> [email protected] >>>>> http://lists.slony.info/mailman/listinfo/slony1-general >>>> >>> >> > _______________________________________________ Slony1-general mailing list [email protected] http://lists.slony.info/mailman/listinfo/slony1-general
