Hello,

I’m having trouble getting Storm to log to stdout.

Ideally, I would like Storm to send its logs to stdout, so they can be collected by the service manager. I’m using Docker, but I know systemd and launchd also expect the daemons they manage to send their logs to stdout so they can collect them. For now, it seems that this use case isn’t supported.

Looking at the documentation, all I find is <https://storm.apache.org/releases/2.5.0/Logs.html>, which merely says I can set the storm.log.dir property to choose an output directory. This isn’t very convenient. Aside from the fact that collecting files is harder than collecting stdout, it creates issues with file rotation. Also, I need to know the actual file names to monitor.

Digging a bit more, I found out about the log4j2/worker.xml and log4j2/cluster.xml files. So I created a dummy cluster where I customized those files (see attachments) and deployed a simple “Hello World!” topology (which merely logs "Hello World!" each second).

These are the log messages I get:
2023-08-30 14:10:41.400+0000 <cluster logger="org.apache.storm.utils.Utils">Worker Process 3d528be8-55f9-4b51-aca0-c829f493259b:2023-08-30 14:10:41.400+0000 <worker logger="STDERR">2023-08-30 14:10:41.385+0000 <worker logger="com.sciforma.test.LoggingSpout">Hello World!</worker></worker></cluster>

What happens is that my topology uses Log4j2 for logging. This means whatever logs it creates are handled according to the worker.xml file. So this is the output on the worker JVM’s stdout:
2023-08-30 14:10:41.385+0000 <worker logger="com.sciforma.test.LoggingSpout">Hello World!</worker>

Then, this output is collected by some LogWriter process, which just logs it back to Log4j. Inside that other JVM, Log4j formats the message again according to worker.xml and outputs:
2023-08-30 14:10:41.400+0000 <worker logger="STDERR">2023-08-30 14:10:41.385+0000 <worker logger="com.sciforma.test.LoggingSpout">Hello World!</worker></worker>

(By the way, this LogWriter uses a logger called STDERR to log both stdout and stderr, which I find weird.)

Finally, the LogWriter output is collected by a third JVM, the supervisor, which logs it back using Log4j, as configured by the cluster.xml file, and we end up with an original message that has been wrapped three times.

I need it wrapped exactly once, and I need the metadata of the log event (logger, MDC, timestamp…) to be that of my topology, not that of the LogWriter or the supervisor. Any idea how I can achieve that?

I considered using a "%m%n" pattern layout for the LogWriter. This would allow it to be transparent, but I see the log4j.configurationFile property is set to the same value in the LogWriter and the worker JVMs. Is it possible to configure different ones?
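
To illustrate, here is a minimal sketch of the pass-through configuration I have in mind for the LogWriter, assuming it could be pointed at its own file. The file name logwriter.xml is hypothetical; as far as I can tell Storm does not read such a file today.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical logwriter.xml: a sketch only, not a file Storm is known to read. -->
<Configuration>
  <Appenders>
    <Console name="stdout" target="SYSTEM_OUT">
      <!-- %m%n prints the message as received, adding no timestamp or logger name -->
      <PatternLayout pattern="%m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="ALL">
      <AppenderRef ref="stdout"/>
    </Root>
  </Loggers>
</Configuration>
```

With such a layout, the line already formatted by the worker JVM would pass through the LogWriter unchanged instead of being wrapped a second time.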

Then, there is the supervisor JVM. I can set its pattern to %m%n too, but the “Worker Process ${workerId}:” prefix is hardcoded (see the BasicContainer.launch() method). Also, using such a pattern would make logs from the supervisor itself almost impossible to use.

In the end, what I need is for the Docker container to output the logs to its own stdout, so the Docker daemon can collect them. I’m now considering a very different approach where Log4j is configured to log to some fifo file, and the Docker image has a PID 1 process that:
    1. creates the fifo,
    2. launches storm supervisor in the background,
    3. reads the fifo and forwards it to stdout.

This last option seems tedious, but I believe it should work. Any better suggestions?
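
As a rough, untested sketch of the Log4j side of the fifo approach: the path /var/run/storm/logs.fifo is a hypothetical example, and I am assuming the fifo has already been created by the PID 1 process before any Storm JVM starts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: worker.xml writing to a fifo instead of the console. -->
<Configuration>
  <Appenders>
    <!-- fileName is a hypothetical path; the fifo must exist before this JVM starts -->
    <File name="fifo" fileName="/var/run/storm/logs.fifo" append="true">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSSZ} &lt;worker logger=&quot;%c&quot;>%msg&lt;/worker>%n"/>
    </File>
  </Appenders>
  <Loggers>
    <Root level="ALL">
      <AppenderRef ref="fifo"/>
    </Root>
  </Loggers>
</Configuration>
```

Since opening a fifo for writing blocks until a reader has opened it, the PID 1 process would presumably need to start reading before, or very shortly after, it launches the supervisor.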

Regards,

--
Étienne Miret



 
log4j2/cluster.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration>
  <Appenders>
    <Console name="stdout">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSSZ} &lt;cluster logger=&quot;%c&quot;>%msg&lt;/cluster>%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info"> <!-- We log everything -->
      <AppenderRef ref="stdout"/>
    </Root>
  </Loggers>
</Configuration>

log4j2/worker.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <Console name="stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSSZ} &lt;worker logger=&quot;%c&quot;>%msg&lt;/worker>%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="ALL">
      <AppenderRef ref="stdout"/>
    </Root>
  </Loggers>
</Configuration>
