Hi,

When I use Python logging in my unit tests I am able to control the output format. I get the log level, the file and line number, then the message:
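For reference, here is a minimal sketch of the kind of logging configuration that produces that format. The format string only uses standard `LogRecord` attributes (`levelname`, `filename`, `lineno`, `funcName`); the exact pattern is an assumption about what the test setup looks like.

```python
import logging

# Standard-library logging: the formatter fills in the call site for us.
# %(filename)s, %(lineno)d and %(funcName)s are built-in LogRecord attributes.
logging.basicConfig(
    format="[%(levelname)s %(filename)s:%(lineno)d - %(funcName)s()] %(message)s",
    level=logging.INFO,
)

logger = logging.getLogger(__name__)
logger.info("BEGIN")  # emits e.g. [INFO myfile.py:12 - my_test()] BEGIN
```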
    [INFO testEstimatedScalingFactors.py:166 - test_B_convertCountsToInts()] BEGIN

In my Spark driver I am able to get the log4j logger:

    spark = SparkSession\
        .builder\
        .appName("estimatedScalingFactors")\
        .getOrCreate()

    #
    # https://medium.com/@lubna_22592/building-production-pyspark-jobs-5480d03fd71e
    # initialize logger for yarn cluster logs
    #
    log4jLogger = spark.sparkContext._jvm.org.apache.log4j
    logger = log4jLogger.LogManager.getLogger(__name__)

However, it only outputs the message. As a hack I have been adding the function name to the message myself.

I wonder if this is because of the way I make my Python code available. When I submit my job using '$ gcloud dataproc jobs submit pyspark', I pass my Python files in a zip file: --py-files ${extraPkg}

I use level warn because the driver info logs are very verbose.

    ###############################################################################
    # imports needed by rowSums()
    from functools import reduce
    from operator import add
    from pyspark.sql.functions import col

    def rowSums( self, countsSparkDF, columnNames ):
        self.logger.warn( "rowSums BEGIN" )

        # https://stackoverflow.com/a/54283997/4586180
        retDF = countsSparkDF.na.fill( 0 )\
                             .withColumn( "rowSum",
                                          reduce( add, [col( x ) for x in columnNames] ) )

        self.logger.warn( "rowSums retDF numRows:{} numCols:{}"\
                          .format( retDF.count(), len( retDF.columns ) ) )
        self.logger.warn( "rowSums END\n" )
        return retDF

kind regards

Andy
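One note on why the hack is needed at all: the logger obtained through `spark.sparkContext._jvm` lives on the JVM side, so log4j's pattern layout can only report the JVM call site, never the Python file, line, or function. Instead of hand-editing every message, the call-site tag can be added automatically with `inspect`. The `CallsiteLogger` wrapper below is a hypothetical helper, not part of PySpark; it is a sketch assuming the wrapped object exposes a `warn(str)` method, as the py4j log4j logger does.

```python
import inspect
import os

class CallsiteLogger:
    """Hypothetical wrapper around a JVM-side log4j logger.

    The JVM logger only ever sees a plain string, so we prepend the
    Python call site (file, line, function) before handing it over.
    """

    def __init__(self, jvm_logger):
        self._logger = jvm_logger

    def _tag(self, msg):
        # stack()[0] is _tag, [1] is warn(), [2] is the caller we want.
        frame = inspect.stack()[2]
        return "[{}:{} - {}()] {}".format(
            os.path.basename(frame.filename), frame.lineno, frame.function, msg
        )

    def warn(self, msg):
        self._logger.warn(self._tag(msg))
```

Wrapping once in the driver (`self.logger = CallsiteLogger(log4jLogger.LogManager.getLogger(__name__))`) would let the existing `self.logger.warn("rowSums BEGIN")` calls stay unchanged while gaining the call-site prefix.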