LuciferYang commented on PR #47500: URL: https://github.com/apache/spark/pull/47500#issuecomment-2320426384
In the scenario described by the current PR, I believe the issue is real:

1. The `org.apache.spark.internal.config` object is initialized before the `--conf spark.log.structuredLogging.enabled=false` option from the command line actually takes effect. The former is initialized in `org.apache.spark.deploy.SparkSubmitArguments#loadEnvironmentArguments`, while the latter waits until `org.apache.spark.deploy.SparkSubmit#doSubmit` executes `Logging.enableStructuredLogging()` or `Logging.disableStructuredLogging()`.
2. During its initialization, the `org.apache.spark.internal.config` object invokes `Utils.localCanonicalHostName()`.

   https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1149-L1153

3. If the environment variable `SPARK_LOCAL_HOSTNAME` is not set, `Utils.localCanonicalHostName()` calls `org.apache.spark.util.Utils#findLocalInetAddress` to initialize `org.apache.spark.util.Utils#localIpAddress`.

   https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L927-L929
   https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L913
   https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L871

4. In `findLocalInetAddress()`, if the environment variable `SPARK_LOCAL_IP` is set, the issue described in this PR does not occur.
5. Likewise, if `InetAddress.getLocalHost` returns the host's actual IP rather than a loopback address such as `127.0.1.1`, the issue does not occur either.
6. When `InetAddress.getLocalHost` returns `127.0.1.1`, extra computation is triggered to determine the machine's actual physical IP, and that code path prints a log message. Because `Logging.enableStructuredLogging()` or `Logging.disableStructuredLogging()` has not yet been called based on `spark.log.structuredLogging.enabled` at this point, `org.apache.spark.internal.Logging#isStructuredLoggingEnabled` returns its default value, `true`, so `org/apache/spark/log4j2-defaults.properties` is loaded and structured logging is used.

   https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/core/src/main/scala/org/apache/spark/util/Utils.scala#L873-L911

A possible solution I can think of is to change `org.apache.spark.internal.config#DRIVER_HOST_ADDRESS` and `org.apache.spark.internal.config#DRIVER_BIND_ADDRESS` from `val` to `lazy val`, which avoids the unexpected early initialization described above (see the sketch below). I manually tested this change and it restores the logs to normal; however, I cannot yet determine whether it causes other side effects.

Could you help confirm whether `InetAddress.getLocalHost` returns the actual IP address rather than a loopback address like `127.0.1.1` in your testing environment (a quick standalone check is sketched below)? @HyukjinKwon @gengliangwang

Also, do you have any other ideas for fixing this issue? @HyukjinKwon @gengliangwang @pan3793
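A minimal sketch of the `lazy val` change I tested, against the linked lines in `core/src/main/scala/org/apache/spark/internal/config/package.scala` (the builder chains are abbreviated here; `.doc`/`.version` details are as in the existing code). Only the `lazy` modifier is new: it defers the `Utils.localCanonicalHostName()` call from object initialization to first access, by which time `SparkSubmit#doSubmit` has already applied `spark.log.structuredLogging.enabled`:

```scala
// Proposed change: `val` -> `lazy val` so that object initialization no
// longer eagerly resolves the local hostname (and thus no longer logs
// before the structured-logging flag has been applied).
private[spark] lazy val DRIVER_HOST_ADDRESS =
  ConfigBuilder("spark.driver.host")
    .doc("Address of driver endpoints.")
    .stringConf
    .createWithDefault(Utils.localCanonicalHostName())

private[spark] lazy val DRIVER_BIND_ADDRESS =
  ConfigBuilder("spark.driver.bindAddress")
    .doc("Address where to bind network listen sockets on the driver.")
    .fallbackConf(DRIVER_HOST_ADDRESS)
```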
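And for checking the condition in steps 5–6 in a given environment, a standalone snippet (`CheckLocalHost` is just a hypothetical name for illustration). If `getLocalHost` resolves to a loopback address — e.g. `127.0.1.1` from a Debian/Ubuntu-style `/etc/hosts` entry — `Utils#findLocalInetAddress` takes the interface-scanning fallback path that logs before the flag takes effect:

```scala
import java.net.InetAddress

// Prints what InetAddress.getLocalHost resolves to and whether it is a
// loopback address, i.e. whether this environment can reproduce the issue.
object CheckLocalHost {
  def main(args: Array[String]): Unit = {
    val addr = InetAddress.getLocalHost
    println(s"getLocalHost = ${addr.getHostAddress}, isLoopback = ${addr.isLoopbackAddress}")
  }
}
```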
