Here is one possibility:

A worker will stop writing to its log if the supervisor kills it.  As far as I 
know, this happens when one of two things occurs:
* nimbus orders the worker shut down, after not hearing from an executor or 
after a manual topology kill is run
* the supervisor doesn't receive a heartbeat message from the worker process.  
This appears to happen when the worker is resource constrained, namely on CPU.

In the second situation, very quickly after the worker is shut down, nimbus 
tells a supervisor (possibly the same one as before) to start up a new worker 
running the same tasks.  So if you are watching the worker process, you could 
walk away for a cup of coffee and come back to find the worker has switched 
from writing to worker_6702 to worker_6703.

How to tell this has happened:
* the worker pid will have changed
* the worker's communication port / node may have changed
* the nimbus logs, if nimbus initiated the action and the logging level is set 
to info
* the supervisor logs, if the logging level is set to info
* count and record the number of worker processes on your nodes
* the Storm metric for new worker events, which has not proved accurate in my 
experience; I do not know why
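The first few checks above can be scripted.  Here is a minimal sketch (plain 
Python; the log directory and the `worker*` file-name pattern are assumptions, 
adjust them to your Storm version's layout) that reports which worker log is 
currently active, so a change from one poll to the next suggests the 
supervisor restarted the worker on a different port:

```python
import glob
import os
import time


def active_worker_log(log_dir):
    """Return the most recently modified worker log in log_dir, or None.

    Assumes worker logs are named like worker_6702 / worker-6702.log;
    adjust the glob pattern to match your Storm version's naming.
    """
    candidates = glob.glob(os.path.join(log_dir, "worker*"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def watch_for_restart(log_dir, interval=10, checks=6):
    """Poll log_dir and print a message if the active worker log changes."""
    previous = active_worker_log(log_dir)
    for _ in range(checks):
        time.sleep(interval)
        current = active_worker_log(log_dir)
        if current != previous:
            print("active worker log changed: %s -> %s" % (previous, current))
            previous = current
```

You could run `watch_for_restart("/usr/local/storm/log")` in a side terminal 
while testing; pairing this with recording the worker pids would cover the 
first two bullets as well.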

From: clay teahouse <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 5, 2015 at 10:31
To: "[email protected]" <[email protected]>, Bobby Evans 
<[email protected]>
Subject: Re: storm logback freezing

Hello,
I am sorry for not being specific.  The worker log is freezing.  There is 
plenty of space on disk.  I have made changes to the logback config file, as 
follows.
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="A1" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%-4r [%t] %-5p %c - %m%n</pattern>
    </encoder>
  </appender>

  <appender name="A2" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/usr/local/storm/log/storm.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <fileNamePattern>/usr/local/storm/log/storm.log.%i.zip</fileNamePattern>
      <minIndex>1</minIndex>
      <maxIndex>5</maxIndex>
    </rollingPolicy>
    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <maxFileSize>10MB</maxFileSize>
    </triggeringPolicy>
    <append>true</append>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <logger name="org.apache.zookeeper" level="WARN"/>
  <root level="INFO">
    <appender-ref ref="A2"/>
  </root>
</configuration>

On Tue, May 5, 2015 at 8:34 AM, Bobby Evans <[email protected]> wrote:
The only time I have seen logback "freeze" was when the disk it was writing to 
filled up.  And in that case some log messages started to block, and it was 
very painful.

- Bobby



On Tuesday, May 5, 2015 8:18 AM, Jeff Maass <[email protected]> wrote:


There are at least three different logs everyone should have:
* nimbus
* worker_{port}
* supervisor

Which of these is "freezing"?

What version of Storm?

When a worker starts, does writing to its log work?

Have you made any changes to your logback configuration?



From: clay teahouse <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, May 4, 2015 at 20:21
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: storm logback freezing

Hi all,
Has anyone experienced a case where storm logback freezes? The topology seems 
to be functioning without issue (I can see the results in the destination 
consumers), but the storm log shows no progress. This usually happens a couple 
of hours after the topology starts, not right away. I'd appreciate any 
feedback on what may be causing this issue.

thank you

Clay



