Hello,
I have some Spark Streaming jobs listening to Kafka and deployed on YARN.
I use the variable ${spark.yarn.app.container.log.dir} in my log4j configuration
so that my logs are written there.
It works fine: the logs are written correctly and properly aggregated into HDFS.
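For context, the relevant part of my log4j.properties looks roughly like the
following (the appender name, file name and layout are only illustrative):

    log4j.rootLogger=INFO, RollingAppender
    log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
    # spark.yarn.app.container.log.dir points to the YARN container's log directory
    log4j.appender.RollingAppender.File=${spark.yarn.app.container.log.dir}/spark-app.log
    log4j.appender.RollingAppender.MaxFileSize=50MB
    log4j.appender.RollingAppender.MaxBackupIndex=5
    log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.RollingAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n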
But I have 2 issues with this approach:
1/ I need to retrieve these logs in real time to load them into ELK and build
Kibana dashboards.
I usually use syslog & Logstash for that, but since the directories of my logs
change every time, it is not possible here (see the example below).
2/ The logs aggregated in HDFS are not easily readable; I have to go through
"yarn logs" (see the command below).
So, what is the best practice for my requirement:
write logs from each datanode and load them into ELK?
Thanks a lot for your support,
Nicolas