Here is the stackoverflow link: https://stackoverflow.com/questions/73780259/spark-structured-streaming-stderr-getting-filled-up
On Mon, Sep 19, 2022 at 4:41 PM karan alang <karan.al...@gmail.com> wrote:

> I've created a stackoverflow ticket for this as well.
>
> On Mon, Sep 19, 2022 at 4:37 PM karan alang <karan.al...@gmail.com> wrote:
>
>> Hello All,
>> I have a Spark Structured Streaming job on GCP Dataproc which picks up
>> data from Kafka, does processing, and pushes data back into Kafka topics.
>>
>> A couple of questions:
>>
>> 1. Does Spark put all the logging (incl. INFO, WARN, etc.) into stderr?
>> What I notice is that stdout is empty, while all the logging goes into
>> stderr.
>>
>> 2. Is there a way for me to expire the data in stderr (i.e. expire the
>> older logs)?
>> Since I have a long-running streaming job, stderr fills up over time and
>> the nodes/VMs become unavailable.
>>
>> Please advise.
>>
>> Here is the output of the yarn logs command:
>>
>> ```
>> root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs -applicationId application_1663623368960_0008 -log_files stderr -size -500
>>
>> 2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to ResourceManager at versa-structured-stream-v1-m/10.142.0.62:8032
>> 2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application History server at versa-structured-stream-v1-m/10.142.0.62:10200
>>
>> Can not find any log file matching the pattern: [stderr] for the container: container_e01_1663623368960_0008_01_000003 within the application: application_1663623368960_0008
>>
>> Container: container_e01_1663623368960_0008_01_000002 on versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
>> LogAggregationType: LOCAL
>> =======================================================================================================================
>> LogType:stderr
>> LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022
>> LogLength:44309782124
>> LogContents:
>> , tenantId=3, vsnId=0, mstatsTotSentOctets=48210, mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000, mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24, mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa type(row) is -> <class 'str'>
>> 22/09/19 23:26:02 WARN org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer is not running in UninterruptibleThread. It may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894
>> End of LogType:stderr.This log file belongs to a running container (container_e01_1663623368960_0008_01_000002) and so may not be complete.
>> ***********************************************************************
>>
>> Container: container_e01_1663623368960_0008_01_000001 on versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
>> LogAggregationType: LOCAL
>> =======================================================================================================================
>> LogType:stderr
>> LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
>> LogLength:17367929
>> LogContents:
>> on syslog.ueba-us4.v1.versa.demo3-2
>> 22/09/19 22:52:52 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1, groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor] Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset 449568676.
>> 22/09/19 22:54:55 ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
>> End of LogType:stderr.
>> ***********************************************************************
>> ```
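[Editor's note, not from the original thread: on question 1, Spark's default log4j configuration points its console appender at System.err (`log4j.appender.console.target=System.err`), which is why stdout stays empty. On question 2, Spark documents executor log rolling under the `spark.executor.logs.rolling.*` properties; whether they take effect can depend on the cluster manager, so verify against your Spark version. A minimal sketch of `spark-defaults.conf` settings — property names are from the Spark configuration docs, values are illustrative:]

```properties
# spark-defaults.conf: roll executor logs instead of letting stderr grow unbounded.
# Property names per the Spark configuration docs; values here are illustrative.

# Roll on a time interval ("size" plus spark.executor.logs.rolling.maxSize is the alternative)
spark.executor.logs.rolling.strategy=time
spark.executor.logs.rolling.time.interval=daily

# Keep only the newest N rolled files; older ones are deleted automatically
spark.executor.logs.rolling.maxRetainedFiles=7
spark.executor.logs.rolling.enableCompression=true
```

[On Dataproc, cluster-level Spark properties can be supplied at cluster creation with the `spark:` prefix, e.g. `--properties 'spark:spark.executor.logs.rolling.strategy=time'`; YARN-side retention settings such as `yarn.log-aggregation.retain-seconds` only apply after log aggregation, so they do not shrink a running container's stderr.]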