LuciferYang commented on PR #47500:
URL: https://github.com/apache/spark/pull/47500#issuecomment-2320426384

   In the scenario described by the current PR, I believe the issue is real:
   
   1. The `org.apache.spark.internal.config` object is initialized before the 
command-line option `--conf spark.log.structuredLogging.enabled=false` actually 
takes effect. The former happens in 
`org.apache.spark.deploy.SparkSubmitArguments#loadEnvironmentArguments`, while 
the latter does not happen until `org.apache.spark.deploy.SparkSubmit#doSubmit` 
executes `Logging.enableStructuredLogging()` or 
`Logging.disableStructuredLogging()`.
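
   The ordering problem in step 1 comes from how Scala objects initialize: a minimal sketch (hypothetical names, not Spark's actual code) showing that reading ANY member of an object runs the whole object initializer, so a side-effecting `val` fires even if only an unrelated member is accessed.

```scala
// Hypothetical sketch: accessing any member of a Scala object triggers
// the object's entire initializer, including side effects of other vals.
object ConfigSketch {
  var sideEffectRan = false
  val hostAddress: String = {
    sideEffectRan = true // stands in for Utils.localCanonicalHostName()
    "127.0.1.1"
  }
  val unrelated: Int = 42
}
```

Reading `ConfigSketch.unrelated` alone is enough to set `sideEffectRan`, which mirrors how touching `config` during argument parsing resolves the host name early.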
   
   2. During its initialization, the `org.apache.spark.internal.config` object 
invokes the `Utils.localCanonicalHostName()` method.
   
   
https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1149-L1153
   
   3. If the environment variable `SPARK_LOCAL_HOSTNAME` is not set, the 
`Utils.localCanonicalHostName()` method will call 
`org.apache.spark.util.Utils#findLocalInetAddress` to initialize 
`org.apache.spark.util.Utils#localIpAddress`.
   
   
https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L927-L929
   
   
https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L913
   
   
https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L871
   
   4. In the `findLocalInetAddress()` method, if the environment variable 
`SPARK_LOCAL_IP` is set, the issue described in the current PR will not occur.
   
   5. Otherwise, if `InetAddress.getLocalHost` returns the host's actual IP 
rather than a loopback address such as `127.0.1.1`, the issue described in the 
current PR will also not occur.
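
   The resolution order in steps 3-5 can be sketched as follows (a simplified, hypothetical helper; the names and signature are illustrative, not Spark's actual code):

```scala
import java.net.InetAddress

// Hedged sketch of the host-resolution order described in steps 3-5:
// SPARK_LOCAL_HOSTNAME wins, then SPARK_LOCAL_IP, then getLocalHost;
// a loopback result forces the interface-scan fallback of step 6.
def resolveLocalHost(env: Map[String, String],
                     getLocalHost: () => InetAddress): String =
  env.get("SPARK_LOCAL_HOSTNAME").getOrElse {
    env.get("SPARK_LOCAL_IP") match {
      case Some(ip) => ip // step 4: explicit IP, no fallback, no issue
      case None =>
        val addr = getLocalHost()
        if (!addr.isLoopbackAddress) addr.getHostAddress // step 5: no issue
        else "interface-scan" // step 6: the problematic fallback path
    }
  }
```

Note that `isLoopbackAddress` is true for the whole `127.0.0.0/8` range, so `127.0.1.1` (the Debian/Ubuntu convention in `/etc/hosts`) also takes the fallback path.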
   
   6. When `InetAddress.getLocalHost` returns `127.0.1.1` (a loopback address), 
the method falls back to scanning network interfaces to find the machine's 
actual IP, and this fallback logs a warning. Because 
`Logging.enableStructuredLogging()` or `Logging.disableStructuredLogging()` has 
not yet been called based on `spark.log.structuredLogging.enabled` at this 
point, `org.apache.spark.internal.Logging#isStructuredLoggingEnabled` returns 
its default value, `true`. Therefore, it loads 
`org/apache/spark/log4j2-defaults.properties` and uses `StructuredLogging`.
   
   
https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L873-L911
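
   The state machine behind step 6 can be sketched as follows (illustrative names, not Spark's actual `Logging` implementation): the structured-logging flag defaults to on, and whichever log call happens first is formatted with whatever the flag holds at that moment.

```scala
// Hedged sketch: the flag defaults to true, so a log line emitted before
// doSubmit calls disableStructuredLogging() comes out structured.
object LoggingSketch {
  private var structured = true // default before doSubmit runs
  def disableStructuredLogging(): Unit = { structured = false }
  def isStructuredLoggingEnabled: Boolean = structured
  def format(msg: String): String =
    if (structured) s"""{"message":"$msg"}""" else msg
}
```

In the PR's scenario, the interface-scan warning plays the role of that first `format` call, so it is emitted in structured form even though the user passed `--conf spark.log.structuredLogging.enabled=false`.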
   
   A possible solution I can think of is to change 
`org.apache.spark.internal.config#DRIVER_HOST_ADDRESS` and 
`org.apache.spark.internal.config#DRIVER_BIND_ADDRESS` from `val` to `lazy 
val`, which avoids the unexpected early initialization described above. I 
manually tested this solution and it restores the logs to normal. However, I 
cannot yet determine whether it would cause other side effects.
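
   Why `lazy val` helps can be shown with a minimal sketch (hypothetical names): a `lazy val` in an object is not evaluated by object initialization itself, only by its first access, so the side effect is deferred until after `doSubmit` has set the logging mode.

```scala
// Hedged sketch of the proposed fix: `lazy` defers the side effect
// past object initialization, until the entry is first read.
object LazyConfigSketch {
  var resolved = false
  lazy val driverHost: String = {
    resolved = true // stands in for Utils.localCanonicalHostName()
    "real-host"
  }
}
```

Touching the object (e.g. reading `resolved`) initializes it without resolving `driverHost`; only the first read of `driverHost` runs the block.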
   
   Could you help confirm whether `InetAddress.getLocalHost` returns the actual 
IP address rather than a loopback address such as `127.0.1.1` in your testing 
environment? @HyukjinKwon @gengliangwang 
   
   Also, do you have any other ideas for fixing this issue?  @HyukjinKwon 
@gengliangwang @pan3793 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

