Hi all, I have a Flink cluster (1.4.0) built from Flink's docker image, with 1 job manager and 2 task managers. I'm trying to use logback instead of log4j, and as far as the cluster configuration goes, everything seems alright.
Following https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/best_practices.html, I have removed log4j-1.2.xx.jar and slf4j-log4j12-xxx.jar from lib/ and added logback-classic.jar, logback-core.jar and log4j-over-slf4j.jar. This is a snippet from the logs:

2018-06-26 11:02:47.760 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login
2018-06-26 11:02:47.761 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - hadoop login commit
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - using local user:UnixPrincipal: flink
2018-06-26 11:02:47.764 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - Using user: "UnixPrincipal: flink" with name flink
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - User entry: "flink"
2018-06-26 11:02:47.765 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - Assuming keytab is managed externally since logged in from subject.
2018-06-26 11:02:47.766 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - UGI loginUser:flink (auth:SIMPLE)
2018-06-26 11:02:47.766 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - Current Hadoop/Kerberos user: flink
2018-06-26 11:02:47.766 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.151-b12
2018-06-26 11:02:47.766 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - Maximum heap size: 981 MiBytes
2018-06-26 11:02:47.766 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - JAVA_HOME: /docker-java-home/jre
2018-06-26 11:02:47.771 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - Hadoop version: 2.8.0
2018-06-26 11:02:47.771 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2018-06-26 11:02:47.771 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms1024m
2018-06-26 11:02:47.771 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx1024m
2018-06-26 11:02:47.771 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - Program Arguments:
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - --configDir
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - /opt/flink/conf
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - --executionMode
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - cluster
2018-06-26 11:02:47.772 [main] INFO org.apache.flink.runtime.jobmanager.JobManager - Classpath: /opt/flink/lib/flink-python_2.11-1.4.0.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.4.0.jar:/opt/flink/lib/log4j-over-slf4j.jar:/opt/flink/lib/logback-classic.jar:/opt/flink/lib/logback-core.jar:/opt/flink/lib/flink-dist_2.11-1.4.0.jar:::

Above you can see that the logback jars are properly on the classpath and that logback-console.xml is being used as the configuration file. This is the content of logback-console.xml:

<configuration>
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="DEBUG">
        <appender-ref ref="console"/>
    </root>
</configuration>

which explains the DEBUG entries seen above; setting the root level to DEBUG is working fine. Now comes the weirdest part.
This is part of the pipeline:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Pipeline {

    private static final Logger LOGGER = LoggerFactory.getLogger(Pipeline.class);

    public static void main(String[] args) throws Exception {
        LOGGER.info("**************************************************** INFO");
        LOGGER.warn("**************************************************** WARN");
        LOGGER.error("**************************************************** ERROR");
        LOGGER.debug("**************************************************** DEBUG");
        LOGGER.trace("**************************************************** TRACE");
        // ...
    }
}

As you can see, I'm importing the slf4j Logger and LoggerFactory, instantiating the logger, and then simply logging at all five levels. I can't see ANY of these logs in the job manager / task manager logs. They work fine locally though:

13:16:11.030 [main] INFO Pipeline - **************************************************** INFO
13:16:11.035 [main] WARN Pipeline - **************************************************** WARN
13:16:16.239 [main] ERROR Pipeline - **************************************************** ERROR
13:16:16.239 [main] DEBUG Pipeline - **************************************************** DEBUG

It's not the log level, as I can clearly see DEBUG and INFO logs from Flink itself, and logback-console.xml is alright. The logs from my job are simply not showing up on the cluster for some reason.
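One way to narrow this down is to ask SLF4J at runtime which backend it actually bound to inside the cluster's classloader. The sketch below (my own diagnostic, not part of the pipeline; the class name LoggingDiagnostic is made up) uses reflection only so the snippet compiles standalone; inside the job you can call org.slf4j.LoggerFactory.getILoggerFactory() directly.

```java
// Sketch: report which SLF4J binding is active at runtime. Reflection is used
// only so this file compiles without slf4j-api on the build classpath; in a
// real job, call org.slf4j.LoggerFactory.getILoggerFactory() directly.
public class LoggingDiagnostic {

    public static String bindingClassName() {
        try {
            Class<?> factoryClass = Class.forName("org.slf4j.LoggerFactory");
            Object factory = factoryClass.getMethod("getILoggerFactory").invoke(null);
            // With logback bound, this is ch.qos.logback.classic.LoggerContext.
            return factory.getClass().getName();
        } catch (ReflectiveOperationException e) {
            return "no SLF4J binding found: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println("SLF4J binding: " + bindingClassName());
    }
}
```

If this reports something other than logback's LoggerContext when run on the cluster, the job is picking up a different binding than the one in lib/.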
Then I tried this:

ch.qos.logback.classic.Logger testLog = (ch.qos.logback.classic.Logger) LOGGER;
testLog.info("**************************************************** INFO");
testLog.warn("**************************************************** WARN");
testLog.error("**************************************************** ERROR");
testLog.debug("**************************************************** DEBUG");
testLog.trace("**************************************************** TRACE");

if (LOGGER == null) {
    throw new Exception("************* logger is NULL");
} else {
    throw new Exception("************* logger is not NULL " + testLog.getEffectiveLevel() + " " + testLog.hashCode());
}

and got this:

2018-06-26 11:24:27.268 [jobmanager-future-thread-5] DEBUG org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler - Error while handling request.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not run the jar.
	at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not run the jar.
	... 9 common frames omitted
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:542)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
	at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
	at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)
	at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:76)
	at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)
	... 8 common frames omitted
Caused by: java.lang.Exception: ************* logger is not NULL DEBUG 1656864061
	at io.intellisense.MongoPipeline.main(MongoPipeline.java:79)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
	... 13 common frames omitted

This means the logger was successfully instantiated and picked up the DEBUG level from conf/logback-console.xml. And again, I was only able to see this error because it came from org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler, not from my pipeline. I'm running out of options, does anyone have any idea?
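One more thing that seems worth checking: whether the job's fat jar bundles its own SLF4J/logback/log4j classes, since a binding shaded into the uploaded jar could shadow the one in lib/. A sketch (pipeline.jar is a placeholder for the actual job artifact):

```shell
# Sketch: list any logging-related entries bundled inside the job jar.
# "pipeline.jar" is a placeholder for the fat jar uploaded to the cluster.
check_logging_deps() {
  unzip -l "$1" 2>/dev/null | grep -iE 'slf4j|logback|log4j' \
    || echo "no slf4j/logback/log4j entries found in $1"
}

check_logging_deps pipeline.jar
```

If this lists slf4j or log4j classes inside the jar, they should be marked as provided-scope dependencies so the cluster's lib/ jars are the only binding on the classpath.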