On 11-11-11 03:46 PM, Mike Wilson wrote:
> General lag on the slave node (as recorded in sl_status) is less than 30 
> seconds.  This is a heavily transacted system running on very nice hardware 
> so perhaps any problems are being masked by that.
>
> I've read up on the issue and we don't appear to be experiencing any of the 
> bugs related to this issue that I can find in the news groups.  No long 
> running transactions, no old nodes in the sl_ tables.  In general, the system 
> appears to be healthy (idle proc time ~95%), good buffer cache hit ratios, 
> etc.
>
> Thanks for the replies though.  I'll look into implementing 2.1, although we 
> just did the upgrade to 2.0.7 and I'm not sure management will go for another 
> downtime during the holiday season.  Just doing my due diligence, as our load 
> will rise steadily through the holiday season to a very large load on these 
> servers, and I wanted to make sure the servers looked solid before we throw 
> 30x the current load at them.

Now that a few days have passed, is your sl_log_1 count still growing or 
has it dropped?  If your sl_log tables keep growing and aren't being 
truncated by the cleanup thread, then you have a problem that will 
eventually get worse.  If the 119,239 rows in sl_log_1 were a temporary 
thing due to your application doing lots of updates, then that might be 
normal for your system.
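
One rough way to answer the "still growing or has it dropped" question is to 
rerun the count query quoted below at intervals and compare the results.  The 
sketch below is only an illustration of that comparison: the function name 
classify_log_trend and its three labels are hypothetical, not part of Slony, 
and the sample counts other than 119239 are made-up values.  It assumes you 
collect the counts yourself, oldest first.

```python
from typing import Sequence

# The count query quoted in this thread; run it periodically and record
# the sl_log_1 (or sl_log_2) value each time.
COUNT_QUERY = (
    "select (select count(*) from sl_log_1) sl_log_1, "
    "(select count(*) from sl_log_2) sl_log_2;"
)


def classify_log_trend(counts: Sequence[int]) -> str:
    """Classify row counts for one sl_log table, ordered oldest first.

    Hypothetical heuristic (an assumption, not Slony behaviour):
      "dropped" - at least one sample is smaller than the one before it,
                  which is what you would expect after a truncate
      "growing" - no sample ever shrank and the last exceeds the first
      "flat"    - the count never changed
    """
    if len(counts) < 2:
        raise ValueError("need at least two samples to see a trend")
    pairs = zip(counts, counts[1:])
    if any(later < earlier for earlier, later in pairs):
        return "dropped"
    if counts[-1] > counts[0]:
        return "growing"
    return "flat"


if __name__ == "__main__":
    # 119239 is the count reported in this thread; the rest are invented.
    print(classify_log_trend([119239, 150000, 180000]))
    print(classify_log_trend([119239, 200000, 0, 500]))
```

A "growing" result over several days would match the problem case described 
above; a "dropped" result suggests the table is being truncated at some point.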

>
> Mike Wilson
> Predicate Logic
> Cell: (310) 600-8777
> SkypeID: lycovian
>
>
>
>
> On Nov 11, 2011, at 11:09 AM, Steve Singer wrote:
>
>> On 11-11-11 02:04 PM, Mike Wilson wrote:
>>>
>>>
>>>
>>> From my postgresql.log:
>>> 2011-11-11 11:03:15.237 PST db1.lax.jib(55096):LOG:  duration: 133.011 ms  statement: fetch 500 from LOG;
>>> 2011-11-11 11:03:17.241 PST db1.lax.jib(55096):LOG:  duration: 134.842 ms  statement: fetch 500 from LOG;
>>> 2011-11-11 11:03:19.239 PST db1.lax.jib(55096):LOG:  duration: 133.919 ms  statement: fetch 500 from LOG;
>>> 2011-11-11 11:03:21.240 PST db1.lax.jib(55096):LOG:  duration: 133.194 ms  statement: fetch 500 from LOG;
>>> 2011-11-11 11:03:23.241 PST db1.lax.jib(55096):LOG:  duration: 134.288 ms  statement: fetch 500 from LOG;
>>> 2011-11-11 11:03:25.241 PST db1.lax.jib(55096):LOG:  duration: 133.226 ms  statement: fetch 500 from LOG;
>>>
>>> I'm only logging statements that take longer than 100ms to run.
>>>
>>> Here is my output from sl_log_1/2:
>>> select (select count(*) from sl_log_1) sl_log_1,
>>>        (select count(*) from sl_log_2) sl_log_2;
>>>  sl_log_1 | sl_log_2
>>> ----------+----------
>>>    119239 |    43685
>>
>> The fetch is taking a long time because sl_log_1 is big.  (The reason it 
>> takes so long is actually a bug that was fixed in 2.1)  sl_log_1 being that 
>> big probably means that log switching isn't happening.
>>
>> Do you have any nodes that are behind?  (query sl_status on all your nodes)
>> Do you have any old nodes that are still listed in sl_node that you aren't 
>> using anymore?
>> Do (did) you have a long running transaction in your system that is 
>> preventing the log switch from taking place?
>>
>>
>>
>>
>>
>>>
>>>
>>> On Nov 11, 2011, at 5:07 AM, Steve Singer wrote:
>>>
>>>> On 11-11-09 01:19 PM, Mike Wilson wrote:
>>>>> Seeing "fetch 500 from LOG" almost continuously in my PG logs for a new 
>>>>> Slony 2.0.7 install.  The previous version (2.0.3?) didn't show these 
>>>>> messages in the PG log.  Researching the issue, historically, this 
>>>>> message was usually accompanied by a performance issue.  This isn't the 
>>>>> case with my databases though, they appear to be running just as well as 
>>>>> ever and the lag between replicated nodes appears to be about the same as 
>>>>> the previous version.
>>>>>
>>>>> I guess my question is what does this message mean in this version of 
>>>>> Slony?  Is it an indication of sub-optimal slon parameters?
>>>>> slon -g 20 $SLON_CLUSTER "host=$HOSTNAME port=$PORT dbname=$DB user=$USER"
>>>>>
>>>>> And how can I get rid of it if it's not an issue?
>>>>>
>>>>> Mike
>>>>
>>>> What is causing the 'fetch 500' statements to show up in the server log? 
>>>> Are you only logging SQL that takes longer than x milliseconds? If so how 
>>>> long are your fetch 500 statements taking?  How many rows are in your 
>>>> sl_log_1 and sl_log_2?
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Slony1-general mailing list
>>>>> [email protected]
>>>>> http://lists.slony.info/mailman/listinfo/slony1-general
>>>>
>>>
>>
>

