Hi Étienne,

> Aside from the fact that collecting files is
> harder than stdout, it creates issues regarding file rotation. Also, I
> need to know the actual file names to monitor.


File rotation is entirely orchestrated by the Log4j framework. You just
need to define a RollingFile appender with the same fileName (and
filePattern) as the log file defined in worker.childopts of storm.yaml.
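For example, a RollingFile appender along these lines; the layout pattern, size trigger, and retention count below are illustrative placeholders, not Storm's shipped defaults:

```xml
<RollingFile name="A1"
             fileName="${sys:storm.log.dir}/worker.log"
             filePattern="${sys:storm.log.dir}/worker.log.%i.gz">
  <!-- Layout is a placeholder; reuse the pattern already in worker.xml. -->
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p %c - %m%n"/>
  <Policies>
    <!-- Roll once the file reaches 100 MB. -->
    <SizeBasedTriggeringPolicy size="100 MB"/>
  </Policies>
  <!-- Keep at most 9 rotated, gzipped files. -->
  <DefaultRolloverStrategy max="9"/>
</RollingFile>
```

As long as fileName matches the path the worker actually writes to, Log4j performs the rotation itself and no external logrotate is needed.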

> Finally, the LogWriter output is collected by a third JVM, the
> supervisor, which logs it back using Log4j, as configured by the
> cluster.xml file, and we end with an original message that was wrapped
> three times.

Can you try defining RollingFile appenders in both cluster.xml and
worker.xml? Then collect the logs with whatever process is responsible for
that.
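On the collection side, a follower that reopens the file by name survives rotation; `tail -F` does exactly that (the path below is a placeholder):

```shell
# -F follows the file by *name* and retries, so it reattaches to the
# fresh file after Log4j rolls the old one away. Path is a placeholder.
tail -n +1 -F /logs/worker.log
```

A sidecar running this command forwards every rotated generation of the log to its own stdout, where Docker can pick it up.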

On Wed, 30 Aug 2023 at 17:59, Étienne Miret <etienne.mi...@sciforma.com>
wrote:

>
> Hello,
>
> I’m having trouble getting Storm to log to stdout.
>
> Ideally, I would like Storm to send its logs to stdout, so they can be
> collected by the service manager. I’m using Docker, but I know systemd
> and launchd also expect daemons they manage to send their logs to stdout
> so they can collect them. For now, it seems that such a use case isn’t
> supported.
>
> Looking at the documentation, all I find is
> <https://storm.apache.org/releases/2.5.0/Logs.html>, which merely says I
> can set the storm.log.dir property to choose an output directory. This
> isn’t very convenient. Aside from the fact that collecting files is
> harder than stdout, it creates issues regarding file rotation. Also, I
> need to know the actual file names to monitor.
>
> Digging a bit more, I found the log4j2/worker.xml and
> log4j2/cluster.xml files. So I created a dummy cluster where I
> customized those files (see attachments) and deployed a simple “Hello
> World!” topology (which merely logs "Hello World!" each second).
>
> This is the log message I get:
> > 2023-08-30 14:10:41.400+0000 <cluster
> logger="org.apache.storm.utils.Utils">Worker Process
> 3d528be8-55f9-4b51-aca0-c829f493259b:2023-08-30 14:10:41.400+0000 <worker
> logger="STDERR">2023-08-30 14:10:41.385+0000 <worker
> logger="com.sciforma.test.LoggingSpout">Hello
> World!</worker></worker></cluster>
>
> What happened is that my topology uses Log4j2 for logging. This means
> whatever logs it creates are handled according to the worker.xml file. So
> this is the output on the worker JVM stdout:
> > 2023-08-30 14:10:41.385+0000 <worker
> logger="com.sciforma.test.LoggingSpout">Hello World!</worker>
>
> Then, this output is collected by some LogWriter process, which just logs
> it back through Log4j. Inside that other JVM, Log4j formats the message
> again according to worker.xml and outputs:
> > 2023-08-30 14:10:41.400+0000 <worker logger="STDERR">2023-08-30
> 14:10:41.385+0000 <worker logger="com.sciforma.test.LoggingSpout">Hello
> World!</worker></worker>
>
> (By the way, this LogWriter uses a logger called STDERR to log both
> stdout and stderr, which I find weird.)
>
> Finally, the LogWriter output is collected by a third JVM, the
> supervisor, which logs it back using Log4j, as configured by the
> cluster.xml file, and we end with an original message that was wrapped
> three times.
>
> I need it wrapped exactly once, and I need the metadata of the log event
> (logger, MDC, timestamp…) to be those from my topology, not those from
> the LogWriter nor the supervisor. Any idea how I can achieve that?
>
> I considered using a "%m%n" pattern layout for the LogWriter. This would
> allow it to be transparent, but I see the log4j.configurationFile
> property is set to the same value in the LogWriter and the worker JVMs.
> Is it possible to configure different ones?
>
> Then, there is the supervisor JVM. I can set its pattern to %m%n too,
> but the “Worker Process ${workerId}:” prefix is hardcoded (see
> the BasicContainer.launch() method). Also, using such a pattern would
> make logs from the supervisor itself almost impossible to make use of.
>
> In the end, the need is that the Docker container outputs the logs to
> its own stdout, so the Docker daemon can collect them. I’m now
> considering a very different approach where Log4j is configured to log
> to some fifo file, and the docker image has a PID 1 process that:
>      1. creates the fifo,
>      2. launches storm supervisor in the background,
>      3. reads the fifo and forwards it to stdout.
>
> This last option seems tedious, but I believe it should work. Any better
> suggestion?
>
> Regards,
>
> --
> Étienne Miret
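For what it's worth, the fifo-based entrypoint sketched at the end of the quoted message could look roughly like this (the fifo path is a placeholder, and Log4j would have to be configured to append to it):

```shell
#!/bin/sh
# Hypothetical PID 1 entrypoint sketch; paths are placeholders.
set -e
FIFO=/var/run/storm-logs.fifo
rm -f "$FIFO"
mkfifo "$FIFO"
# Log4j (worker.xml / cluster.xml) must be pointed at $FIFO.
storm supervisor &
# Forward everything written to the fifo to the container's stdout.
# cat sees EOF whenever the last writer closes, so loop forever.
while true; do cat "$FIFO"; done
```

Note that opening a fifo for reading blocks until a writer appears, so the `cat` loop naturally waits for the first log line.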
