Ok, I think I identified the issues:

1. I accidentally bundled another version of slf4j in my job jar, which caused an incompatibility with the slf4j jar bundled in flink/lib. Apparently slf4j in this case falls back to a binding that ignores the config? Once I removed slf4j from my job jar, the logger properties were properly consumed.

2. The line log4j.appender.file.file=${log.file} in the default properties didn't work (resulting in log4j null errors); it started working after I set it manually to /opt/flink/logs/output.log.
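For reference, a minimal sketch of the fix for issue 2. The ${log.file} placeholder is substituted from the -Dlog.file JVM property, which (as far as I can tell) the start-foreground/console scripts do not set, so the FileAppender ends up with a null path; hard-coding the path sidesteps that. The output path below is just this thread's example; adjust it to your image:

```properties
# log4j.properties fragment: replace ${log.file} (normally injected via
# -Dlog.file by the daemon scripts, but absent under start-foreground)
# with an explicit path.
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.file=/opt/flink/logs/output.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```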
Thanks for your guidance!
Li

On Thu, Dec 12, 2019 at 12:09 PM Li Peng <li.p...@doordash.com> wrote:

> Hey ouywl, interesting, I figured something like that would happen. I actually replaced all the log4j-x files with the same config I originally posted, including log4j-console, but that didn't change the behavior either.
>
> Hey Yang, yes, I verified the properties files are as I configured, and that the logs don't match up with them. Here are the JVM arguments, if that's what you were looking for:
>
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.232-b09
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 989 MiBytes
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/local/openjdk-8
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.3
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink/conf
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=myjob-manager
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dparallelism.default=2
> [main] INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dblob.server.port=6124
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dqueryable-state.server.ports=6125
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.heap.size=3000m
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dtaskmanager.heap.size=3000m
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dmetrics.reporter.stsd.class=org.apache.flink.metrics.statsd.StatsDReporter
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dmetrics.reporter.stsd.host=myhost.com
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dmetrics.reporter.stsd.port=8125
> [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dmetrics.system-resource=true
>
> Thanks,
> Li
>
> On Thu, Dec 12, 2019 at 4:40 AM ouywl <ou...@139.com> wrote:
>
>> @Li Peng
>> I found your problem. Your start command uses the argument “start-foreground”, which runs “exec "${FLINK_BIN_DIR}"/flink-console.sh ${ENTRY_POINT_NAME} "${ARGS[@]}"”. In flink-console.sh, the code is “log_setting=( "-Dlog4j.configuration=file:${FLINK_CONF_DIR}/log4j-console.properties" "-Dlogback.configurationFile=file:${FLINK_CONF_DIR}/logback-console.xml")”, so log4j.properties does not take effect; it needs log4j-console.properties and logback-console.xml.
>>
>> ouywl
>> ou...@139.com
>>
>> On 12/12/2019 15:35, ouywl <ou...@139.com> wrote:
>>
>> Hi Yang,
>> Could you give more detail? The log4j.properties content, and the k8s yaml. Are you using the dockerfile in flink-container?
>> When I tested it using the default per-job yaml in flink-container, it only showed logs in the docker output, and no logs in /opt/flink/log.
>>
>> ouywl
>> ou...@139.com
>>
>> On 12/12/2019 13:47, Yang Wang <danrtsey...@gmail.com> wrote:
>>
>> Hi Peng,
>>
>> What I mean is to use `docker exec` to get into the running pod and `ps` to get the real command that is running for the jobmanager.
>> Have you checked that /opt/flink/conf/log4j.properties is right?
>>
>> I have tested standalone per-job on my kubernetes cluster, and the logs show up as expected.
>>
>> Best,
>> Yang
>>
>> Li Peng <li.p...@doordash.com> wrote on Thu, Dec 12, 2019 at 2:59 AM:
>>
>>> Hey Yang, here are the commands:
>>>
>>> "/opt/flink/bin/taskmanager.sh",
>>> "start-foreground",
>>> "-Djobmanager.rpc.address={{ .Chart.Name }}-job-manager",
>>> "-Dtaskmanager.numberOfTaskSlots=1"
>>>
>>> "/opt/flink/bin/standalone-job.sh",
>>> "start-foreground",
>>> "-Djobmanager.rpc.address={{ .Chart.Name }}-job-manager",
>>> "-Dparallelism.default={{ .Values.task.replicaCount }}"
>>>
>>> Yes, it's very curious that I don't see any logs actually written to /opt/flink/log.
>>>
>>> On Tue, Dec 10, 2019 at 11:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>
>>>> Could you find the logs under /opt/flink/log/jobmanager.log? If not, please share the commands the JobManager and TaskManager are using. If the command is correct and the log4j config under /opt/flink/conf is as expected, it is curious why we could not get the logs.
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> Li Peng <li.p...@doordash.com> wrote on Wed, Dec 11, 2019 at 1:24 PM:
>>>>
>>>>> Ah I see.
>>>>> I think the Flink app is reading files from /opt/flink/conf correctly as it is, since changes I make to flink-conf are picked up as expected; it's just the log4j properties that are either not being used, or don't apply to stdout or whatever source k8s uses for its logs. Given that the pods don't seem to have logs written to file anywhere, contrary to the properties, I'm inclined to say it's the former and that the log4j properties just aren't being picked up. Still have no idea why, though.
>>>>>
>>>>> On Tue, Dec 10, 2019 at 6:56 PM Yun Tang <myas...@live.com> wrote:
>>>>>
>>>>>> Sure, /opt/flink/conf is mounted as a volume from the configmap.
>>>>>>
>>>>>> Best
>>>>>> Yun Tang
>>>>>>
>>>>>> *From: *Li Peng <li.p...@doordash.com>
>>>>>> *Date: *Wednesday, December 11, 2019 at 9:37 AM
>>>>>> *To: *Yang Wang <danrtsey...@gmail.com>
>>>>>> *Cc: *vino yang <yanghua1...@gmail.com>, user <user@flink.apache.org>
>>>>>> *Subject: *Re: Flink on Kubernetes seems to ignore log4j.properties
>>>>>>
>>>>>> 1. Hey Yun, I'm calling /opt/flink/bin/standalone-job.sh and /opt/flink/bin/taskmanager.sh on my job and task managers respectively. It's based on the setup described here: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ . I haven't tried the configmap approach yet; does it also replace the conf files in /opt/flink/conf?
>>>>>>
>>>>>> 2. Hey Vino, here's a sample of the kubernetes logs: https://pastebin.com/fqJrgjZu I didn't change any patterns from the default, so the string patterns should look the same, but as you can see it's full of info checkpoint logs that I originally was trying to suppress. Based on my log4j.properties, the level should be set to WARN. I couldn't actually find any .out files on the pod; this is from the kubectl logs command.
>>>>>> I also didn't see any files in /opt/flink/log, which I thought my log4j was configured to produce, hence my thinking that the properties weren't actually being consumed. I also have the same properties in my src/main/resources folder.
>>>>>>
>>>>>> 3. Hey Yang, yes, this is a standalone session cluster. I did specify in the docker file to copy the log4j.properties to the /opt/flink/conf folder on the image, and I confirmed that the properties are correct when I bash'd into the pod and viewed them manually.
>>>>>>
>>>>>> Incidentally, I also tried passing the -Dlog4j.configuration argument to the programs, and it doesn't work either. And based on what I'm reading on jira, that option is not really supported anymore?
>>>>>>
>>>>>> Thanks for your responses, folks!
>>>>>>
>>>>>> Li
>>>>>>
>>>>>> On Mon, Dec 9, 2019 at 7:10 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Li Peng,
>>>>>>
>>>>>> You are running a standalone session cluster or per-job cluster on kubernetes, right?
>>>>>> If so, I think you need to check the log4j.properties in the image, not the local one. The log is stored to /opt/flink/log/jobmanager.log by default.
>>>>>>
>>>>>> If you are running the active Kubernetes integration for a fresh taste, the following CLI option could be used to remove the redirect:
>>>>>>
>>>>>> -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% %jvmopts% %logging% %class% %args%"
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yang
>>>>>>
>>>>>> vino yang <yanghua1...@gmail.com> wrote on Tue, Dec 10, 2019 at 10:55 AM:
>>>>>>
>>>>>> Hi Li,
>>>>>>
>>>>>> A potential reason could be conflicting logging frameworks.
>>>>>> Can you share the log in your .out file, and let us know whether the print format of the log matches the configuration file you gave.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Vino
>>>>>>
>>>>>> Li Peng <li.p...@doordash.com> wrote on Tue, Dec 10, 2019 at 10:09 AM:
>>>>>>
>>>>>> Hey folks, I noticed that my kubernetes flink logs (reached via *kubectl logs <pod-name>*) completely ignore any of the configurations I put into /flink/conf/. I set the logger level to WARN, yet I still see INFO level logging from flink loggers like org.apache.flink.runtime.checkpoint.CheckpointCoordinator. I even copied the same properties to /flink/conf/log4j-console.properties and log4j-cli.properties.
>>>>>>
>>>>>> From what I can tell, kubernetes just listens to stdout and stderr, so shouldn't the log4j.properties control output to them? Has anyone seen this issue before?
>>>>>>
>>>>>> Here is my log4j.properties:
>>>>>>
>>>>>> # This affects logging for both user code and Flink
>>>>>> log4j.rootLogger=WARN, file, console, stdout
>>>>>>
>>>>>> # Uncomment this if you want to _only_ change Flink's logging
>>>>>> log4j.logger.org.apache.flink=WARN
>>>>>>
>>>>>> # The following lines keep the log level of common libraries/connectors on
>>>>>> # log level INFO. The root logger does not override this. You have to manually
>>>>>> # change the log levels here.
>>>>>> log4j.logger.akka=INFO
>>>>>> log4j.logger.org.apache.kafka=INFO
>>>>>> log4j.logger.org.apache.hadoop=INFO
>>>>>> log4j.logger.org.apache.zookeeper=INFO
>>>>>>
>>>>>> # Log all infos in the given file
>>>>>> log4j.appender.file=org.apache.log4j.FileAppender
>>>>>> log4j.appender.file.file=${log.file}
>>>>>> log4j.appender.file.append=false
>>>>>> log4j.appender.file.layout=org.apache.log4j.PatternLayout
>>>>>> log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>>>>>
>>>>>> # Log all infos to the console
>>>>>> log4j.appender.console=org.apache.log4j.ConsoleAppender
>>>>>> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>>>>> log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>>>>>
>>>>>> # Suppress the irrelevant (wrong) warnings from the Netty channel handler
>>>>>> log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file, console
>>>>>> log4j.logger.org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction=WARN
>>>>>> log4j.logger.org.apache.flink.runtime.checkpoint=WARN
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Li
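The start-foreground behavior ouywl describes can be sketched in isolation: flink-console.sh builds a log_setting array pointing at the -console config files, which is why editing log4j.properties alone has no visible effect. A minimal reproduction of that array, assuming the stock /opt/flink image layout (variable names follow the script snippet quoted in the thread):

```shell
#!/usr/bin/env bash
# Mirror the log_setting array that flink-console.sh builds when a
# component is started with "start-foreground".
FLINK_CONF_DIR=/opt/flink/conf
log_setting=(
  "-Dlog4j.configuration=file:${FLINK_CONF_DIR}/log4j-console.properties"
  "-Dlogback.configurationFile=file:${FLINK_CONF_DIR}/logback-console.xml"
)
# These flags end up on the JVM command line, so the -console variants
# win over conf/log4j.properties.
printf '%s\n' "${log_setting[@]}"
```

This matches the -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties option visible in the ClusterEntrypoint output earlier in the thread.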