Hi Étienne,
> Aside from the fact that collecting files is harder than stdout, it
> creates issues regarding file rotation. Also, I need to know the
> actual file names to monitor.

File rotation is entirely orchestrated by the Log4j framework. You just
need to define a RollingFile appender with the same fileName (and
filePattern) as the log file defined in worker.childopts of storm.yaml.

> Finally, the LogWriter output is collected by a third JVM, the
> supervisor, which logs it back using Log4j, as configured by the
> cluster.xml file, and we end up with an original message that was
> wrapped three times.

Can you try to define RollingFile appenders in both cluster.xml and
worker.xml, along the lines of the sketch below? Then collect the logs
with whatever process is responsible for that.
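For illustration, here is roughly what I have in mind for worker.xml.
Treat it as a sketch only: ${sys:storm.log.dir} and ${sys:logfile.name}
are placeholders for whatever system properties your worker JVM is
actually started with, and the 100 MB / 9 archives rollover policy is an
arbitrary example.

<Configuration monitorInterval="60">
  <Appenders>
    <!-- Writes and rotates the same file the worker is configured to
         log to, so no external rotation machinery is needed. -->
    <RollingFile name="A1"
        fileName="${sys:storm.log.dir}/${sys:logfile.name}"
        filePattern="${sys:storm.log.dir}/${sys:logfile.name}.%i.gz">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSSZ} %c{1.} [%p] %m%n"/>
      <Policies>
        <!-- Roll when the file reaches 100 MB. -->
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
      <!-- Keep at most 9 compressed archives. -->
      <DefaultRolloverStrategy max="9"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="A1"/>
    </Root>
  </Loggers>
</Configuration>

The same kind of appender, with the appropriate file names, would go
into cluster.xml.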
On Wed, 30 Aug 2023 at 17:59, Étienne Miret <etienne.mi...@sciforma.com> wrote:
>
> Hello,
>
> I’m having trouble getting Storm to log to stdout.
>
> Ideally, I would like Storm to send its logs to stdout, so they can be
> collected by the service manager. I’m using Docker, but I know systemd
> and launchd also expect the daemons they manage to send their logs to
> stdout so they can collect them. For now, it seems that such a use
> case isn’t supported.
>
> Looking at the documentation, all I find is
> <https://storm.apache.org/releases/2.5.0/Logs.html>, which merely says
> I can set the storm.log.dir property to choose an output directory.
> This isn’t very convenient. Aside from the fact that collecting files
> is harder than stdout, it creates issues regarding file rotation.
> Also, I need to know the actual file names to monitor.
>
> Digging a bit more, I found the log4j2/worker.xml and
> log4j2/cluster.xml files. So I created a dummy cluster where I
> customized those files (see attachments) and deployed a simple “Hello
> World!” topology (which merely logs "Hello World!" each second).
>
> This is the log message I get:
>
> 2023-08-30 14:10:41.400+0000 <cluster
> logger="org.apache.storm.utils.Utils">Worker Process
> 3d528be8-55f9-4b51-aca0-c829f493259b:2023-08-30 14:10:41.400+0000 <worker
> logger="STDERR">2023-08-30 14:10:41.385+0000 <worker
> logger="com.sciforma.test.LoggingSpout">Hello
> World!</worker></worker></cluster>
>
> What happened is that my topology uses Log4j2 for logging. This means
> whatever logs it creates are handled according to the worker.xml file.
> So this is the output on the worker JVM’s stdout:
>
> 2023-08-30 14:10:41.385+0000 <worker
> logger="com.sciforma.test.LoggingSpout">Hello World!</worker>
>
> Then, this output is collected by some LogWriter process, which just
> logs it back through Log4j. Inside that other JVM, Log4j formats the
> message again according to worker.xml and outputs:
>
> 2023-08-30 14:10:41.400+0000 <worker logger="STDERR">2023-08-30
> 14:10:41.385+0000 <worker logger="com.sciforma.test.LoggingSpout">Hello
> World!</worker></worker>
>
> (By the way, this LogWriter uses a logger called STDERR to log both
> stdout and stderr, which I find weird.)
>
> Finally, the LogWriter output is collected by a third JVM, the
> supervisor, which logs it back using Log4j, as configured by the
> cluster.xml file, and we end up with an original message that was
> wrapped three times.
>
> I need it wrapped exactly once, and I need the metadata of the log
> event (logger, MDC, timestamp…) to be those from my topology, not
> those from the LogWriter nor the supervisor. Any idea how I can
> achieve that?
>
> I considered using a "%m%n" pattern layout for the LogWriter. This
> would allow it to be transparent, but I see the
> log4j.configurationFile property is set to the same value in the
> LogWriter and the worker JVMs. Is it possible to configure different
> ones?
>
> Then, there is the supervisor JVM. I can set its pattern to %m%n too,
> but the “Worker Process ${workerId}:” prefix is hardcoded (see the
> BasicContainer.launch() method). Also, using such a pattern would make
> logs from the supervisor itself almost impossible to exploit.
>
> In the end, the need is for the Docker container to output the logs on
> its own stdout, so the Docker daemon can collect them. I’m now
> considering a very different approach where Log4j is configured to log
> to some fifo file, and the Docker image has a PID 1 process that:
> 1. creates the fifo,
> 2. launches storm supervisor in the background,
> 3. reads the fifo and forwards it to stdout.
>
> This last option seems tedious, but I believe it should work. Any
> better suggestion?
>
> Regards,
>
> --
> Étienne Miret
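P.S. About the "%m%n" idea at the end of your message: the layout
itself is trivial, something like the Console appender below. Whether
the LogWriter JVM can be pointed at a different configuration file than
the worker is a separate question I can’t answer offhand.

<!-- Pass-through appender: emits the message verbatim, so a line that
     is already formatted does not get wrapped a second time. -->
<Console name="pass-through" target="SYSTEM_OUT">
  <PatternLayout pattern="%m%n"/>
</Console>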