Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-26 Thread via GitHub


gengliangwang closed pull request #46193: [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms
URL: https://github.com/apache/spark/pull/46193





Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-26 Thread via GitHub


gengliangwang commented on PR #46193:
URL: https://github.com/apache/spark/pull/46193#issuecomment-2079790129

   Thanks, merging to master





Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-25 Thread via GitHub


panbingkun commented on PR #46193:
URL: https://github.com/apache/spark/pull/46193#issuecomment-2078352745

   > @panbingkun Sorry, I have to merge #46192 since it was created before this one. Please resolve the conflicts and I will merge this one. Thanks as always.
   
   Don't worry, I have resolved it 😄
   Let's run GA first.





Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-25 Thread via GitHub


gengliangwang commented on PR #46193:
URL: https://github.com/apache/spark/pull/46193#issuecomment-2078113658

   @panbingkun Sorry, I have to merge https://github.com/apache/spark/pull/46192 since it was created before this one. Please resolve the conflicts and I will merge this one. Thanks as always.





Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-25 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1580054051


##
common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala:
##
@@ -57,20 +57,20 @@ class LogKeySuite
   private val logKeyFilePath = getWorkspaceFilePath("common", "utils", "src", "main", "scala",
     "org", "apache", "spark", "internal", "LogKey.scala")
 
-  // regenerate the file `LogKey.scala` with its enumeration fields sorted alphabetically
+  // regenerate the file `LogKey.scala` with its members sorted alphabetically
   private def regenerateLogKeyFile(
-      originalKeys: Seq[LogKey], sortedKeys: Seq[LogKey]): Unit = {
+      originalKeys: Seq[String], sortedKeys: Seq[String]): Unit = {
     if (originalKeys != sortedKeys) {
       val logKeyFile = logKeyFilePath.toFile
-      logInfo(s"Regenerating LogKey file $logKeyFile")
+      logInfo(s"Regenerating the file $logKeyFile")
       val originalContents = FileUtils.readLines(logKeyFile, StandardCharsets.UTF_8)
       val sortedContents = new JList[String]()
       var firstMatch = false
       originalContents.asScala.foreach { line =>
-        if (line.trim.startsWith("val ") && line.trim.endsWith(" = Value")) {
+        if (line.trim.startsWith("case object ") && line.trim.endsWith(" extends LogKey")) {

Review Comment:
   Not directly related to this PR, but we need to think about what happens if there are code comments around the `LogKey` entries.
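   
   One possible direction, as a rough sketch (a hypothetical helper, not code from this PR; it assumes any comment lines sit directly above the `case object` they describe):
   
   ```scala
   // Sketch: group each `case object ... extends LogKey` line together with the
   // comment lines immediately above it, then sort the groups by their
   // `case object` line, so comments move together with their key.
   def sortWithLeadingComments(lines: Seq[String]): Seq[String] = {
     val groups = scala.collection.mutable.ArrayBuffer.empty[Vector[String]]
     var pending = Vector.empty[String]
     lines.foreach { line =>
       if (line.trim.startsWith("case object ") && line.trim.endsWith(" extends LogKey")) {
         groups += (pending :+ line) // attach accumulated comment lines to this key
         pending = Vector.empty
       } else {
         pending :+= line // comment (or blank) lines waiting for the next key
       }
     }
     // Sort groups by their `case object` line; trailing non-key lines keep their place.
     groups.sortBy(_.last.trim).flatten.toSeq ++ pending
   }
   ```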






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578930058


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   Yeah






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


LuciferYang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578922427


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   > If necessary in the future, I think it would be easier to implement based 
on the `current changes`, such as
   > 
   > * `ILogKey`
   > * `InternalLogKey` extends `ILogKey`
   > * `LogKey` extends `InternalLogKey`
   > * `ExternalLogKey` extends `ILogKey`
   > * `ThirdPartyLogKey` extends `ExternalLogKey`
   
   Yes, we can further expand after this PR is completed. 
   
   Personally, I think we can add a String field to the MDC to identify the 
namespace. For example, the namespace of Spark's internal MDC can always be 
`SPARK`, and the third-party Spark ecosystem can customize their own namespace. 
This can make the key of `MessageWithContext` become similar to `SPARK.mdcKey`, 
`CUSTOM_NAMESPACE.mdcKey`. 
   
   Perhaps this is not a perfect method, we can further discuss it after this 
PR is completed.
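   
   A minimal sketch of that namespace idea (all names below are hypothetical, not part of this PR):
   
   ```scala
   // Sketch: an MDC wrapper that carries a namespace, so structured-log keys
   // render as `SPARK.mdcKey` for Spark internals and `CUSTOM_NAMESPACE.mdcKey`
   // for third-party projects.
   trait LogKey
   
   final case class NamespacedMDC(namespace: String, key: LogKey, value: Any) {
     // The key as it would appear in the structured output, e.g. "SPARK.TASK_ID".
     def renderedKey: String = s"$namespace.$key"
   }
   
   object SparkMDC {
     // Spark-internal MDCs would always use the SPARK namespace.
     def apply(key: LogKey, value: Any): NamespacedMDC = NamespacedMDC("SPARK", key, value)
   }
   ```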






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578903240


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,399 +16,399 @@
  */
 package org.apache.spark.internal
 
+trait LogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578806516


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,399 +16,399 @@
  */
 package org.apache.spark.internal
 
+trait LogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578797076


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala:
##
@@ -33,8 +33,8 @@ import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
 import org.apache.spark.deploy.k8s.KubernetesConf
 import org.apache.spark.deploy.k8s.KubernetesUtils.addOwnerReference
-import org.apache.spark.internal.{Logging, LogKey, MDC}
-import org.apache.spark.internal.LogKey.{COUNT, EXECUTOR_IDS, TIMEOUT}
+import org.apache.spark.internal.{Logging, LogKeys, MDC}
+import org.apache.spark.internal.LogKeys.{COUNT, EXECUTOR_IDS, TIMEOUT}

Review Comment:
   Merge into `import org.apache.spark.internal.LogKeys`.






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578791101


##
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala:
##
@@ -33,8 +33,8 @@ import org.apache.spark.InternalAccumulator.{input, shuffleRead}
 import org.apache.spark.TaskState.TaskState
 import org.apache.spark.errors.SparkCoreErrors
 import org.apache.spark.executor.ExecutorMetrics
-import org.apache.spark.internal.{config, Logging, LogKey, MDC}
-import org.apache.spark.internal.LogKey.{REASON, TASK_SET_NAME, TASK_STATE, TID}
+import org.apache.spark.internal.{config, Logging, LogKeys, MDC}
+import org.apache.spark.internal.LogKeys.{REASON, TASK_SET_NAME, TASK_STATE, TID}

Review Comment:
   Merge into `import org.apache.spark.internal.LogKeys`.






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


LuciferYang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578788755


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,399 +16,399 @@
  */
 package org.apache.spark.internal
 
+trait LogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


LuciferYang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578786930


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,399 +16,399 @@
  */
 package org.apache.spark.internal
 
+trait LogKey

Review Comment:
   Relevant explanatory comments should be added for `LogKey`
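   
   For instance, something along these lines (the wording is just a suggestion):
   
   ```scala
   /**
    * Marker trait for structured-logging keys used in mapped diagnostic
    * contexts (MDC). Spark's built-in keys are defined in `LogKeys`; external
    * projects can add their own keys with `case object MY_KEY extends LogKey`.
    */
   trait LogKey
   ```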






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578780993


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##
@@ -20,15 +20,15 @@ package org.apache.spark.sql.catalyst.optimizer
 import scala.collection.mutable
 
 import org.apache.spark.SparkException
-import org.apache.spark.internal.LogKey._
+import org.apache.spark.internal.LogKeys._
 import org.apache.spark.internal.MDC
 import org.apache.spark.sql.catalyst.SQLConfHelper
 import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans._
-import org.apache.spark.sql.catalyst.plans.logical.{RepartitionOperation, _}
+import org.apache.spark.sql.catalyst.plans.logical._

Review Comment:
   This is not related to the renaming of `LogKey -> LogKeys`, but it seems more reasonable to write it this way.





Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578773809


##
common/utils/src/main/scala/org/apache/spark/internal/README.md:
##
@@ -2,7 +2,7 @@
 
 ## LogKey
 
-LogKeys serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding new LogKeys:
+LogKey serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding new LogKey:

Review Comment:
   ```suggestion
   `LogKey`s serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding a new LogKey:
   ```






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578772766


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala:
##
@@ -21,12 +21,12 @@ import org.json4s.{Formats, NoTypeHints}
 import org.json4s.jackson.Serialization
 
 import org.apache.spark.internal.{Logging, MDC}
-import org.apache.spark.internal.LogKey.{CONFIG, DEFAULT_VALUE, NEW_VALUE, OLD_VALUE, TIP}
+import org.apache.spark.internal.LogKeys.{CONFIG, DEFAULT_VALUE, NEW_VALUE, OLD_VALUE, TIP}
 import org.apache.spark.io.CompressionCodec
 import org.apache.spark.sql.RuntimeConfig
 import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2, SparkDataStream}
 import org.apache.spark.sql.execution.streaming.state.{FlatMapGroupsWithStateExecHelper, StreamingAggregationStateManager, SymmetricHashJoinStateManager}
-import org.apache.spark.sql.internal.SQLConf.{FLATMAPGROUPSWITHSTATE_STATE_FORMAT_VERSION, _}
+import org.apache.spark.sql.internal.SQLConf._

Review Comment:
   This is not related to the renaming of `LogKey -> LogKeys`, but it seems more reasonable to write it this way.






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578769212


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,21 @@ trait LoggingSuiteBase
   }
   }
 
+  private val externalSystemCustomLog =

Review Comment:
   Add a UT to show how external third parties can use `LogKey`.
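   
   A rough sketch of the usage pattern such a test exercises (hypothetical key and class names; the actual test in `StructuredLoggingSuite` may differ):
   
   ```scala
   // Sketch: a third-party project defines its own key by extending the
   // `LogKey` trait and uses it with `MDC` in a structured log message.
   import org.apache.spark.internal.{Logging, LogKey, MDC}
   
   case object CUSTOM_LOG_KEY extends LogKey
   
   class ExternalComponent extends Logging {
     def report(): Unit = {
       // The external key renders in the structured output like built-in keys do.
       logInfo(log"External component status: ${MDC(CUSTOM_LOG_KEY, "ok")}")
     }
   }
   ```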






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578768196


##
common/utils/src/main/scala/org/apache/spark/internal/README.md:
##
@@ -2,7 +2,7 @@
 
 ## LogKey
 
-LogKeys serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding new LogKeys:
+LogKey serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding new LogKey:

Review Comment:
   I'm not sure whether it's more accurate to call it `LogKeys` or `LogKey`.






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578641891


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578640785


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578640764


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578638859


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  ...

[The rest of the quoted diff, which removes the remaining `val ... = Value` enumeration members, is truncated in this archive, together with the review comment.]

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578624819


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   Yeah, creating a trait seems better. I have some preferences about the naming; see my other comments.






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578624300


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey
+
 /**
  * Various keys used for mapped diagnostic contexts(MDC) in logging.
  * All structured logging keys should be defined here for standardization.
  */
-object LogKey extends Enumeration {
-  val ACCUMULATOR_ID = Value
-  val ACTUAL_NUM_FILES = Value
-  val ACTUAL_PARTITION_COLUMN = Value
-  val ALPHA = Value
-  val ANALYSIS_ERROR = Value
-  val APP_ATTEMPT_ID = Value
-  val APP_DESC = Value
-  val APP_ID = Value
-  val APP_NAME = Value
-  val APP_STATE = Value
-  val ARGS = Value
-  val BACKUP_FILE = Value
-  val BATCH_ID = Value
-  val BLOCK_ID = Value
-  val BLOCK_MANAGER_ID = Value
-  val BROADCAST_ID = Value
-  val BUCKET = Value
-  val BYTECODE_SIZE = Value
-  val CACHED_TABLE_PARTITION_METADATA_SIZE = Value
-  val CACHE_AUTO_REMOVED_SIZE = Value
-  val CACHE_UNTIL_HIGHEST_CONSUMED_SIZE = Value
-  val CACHE_UNTIL_LAST_PRODUCED_SIZE = Value
-  val CALL_SITE_LONG_FORM = Value
-  val CATALOG_NAME = Value
-  val CATEGORICAL_FEATURES = Value
-  val CHECKPOINT_FILE = Value
-  val CHECKPOINT_TIME = Value
-  val CHECKSUM_FILE_NUM = Value
-  val CLASS_LOADER = Value
-  val CLASS_NAME = Value
-  val CLUSTER_CENTROIDS = Value
-  val CLUSTER_ID = Value
-  val CLUSTER_LABEL = Value
-  val CLUSTER_LEVEL = Value
-  val CLUSTER_WEIGHT = Value
-  val CODEC_LEVEL = Value
-  val CODEC_NAME = Value
-  val CODEGEN_STAGE_ID = Value
-  val COLUMN_DATA_TYPE_SOURCE = Value
-  val COLUMN_DATA_TYPE_TARGET = Value
-  val COLUMN_DEFAULT_VALUE = Value
-  val COLUMN_NAME = Value
-  val COMMAND = Value
-  val COMMAND_OUTPUT = Value
-  val COMPONENT = Value
-  val CONFIG = Value
-  val CONFIG2 = Value
-  val CONFIG3 = Value
-  val CONFIG4 = Value
-  val CONFIG5 = Value
-  val CONSUMER = Value
-  val CONTAINER = Value
-  val CONTAINER_ID = Value
-  val CONTAINER_STATE = Value
-  val COST = Value
-  val COUNT = Value
-  val CROSS_VALIDATION_METRIC = Value
-  val CROSS_VALIDATION_METRICS = Value
-  val CSV_HEADER_COLUMN_NAME = Value
-  val CSV_HEADER_COLUMN_NAMES = Value
-  val CSV_HEADER_LENGTH = Value
-  val CSV_SCHEMA_FIELD_NAME = Value
-  val CSV_SCHEMA_FIELD_NAMES = Value
-  val CSV_SOURCE = Value
-  val CURRENT_PATH = Value
-  val DATA = Value
-  val DATABASE_NAME = Value
-  val DATAFRAME_CACHE_ENTRY = Value
-  val DATAFRAME_ID = Value
-  val DATA_FILE_NUM = Value
-  val DATA_SOURCE = Value
-  val DATA_SOURCES = Value
-  val DATA_SOURCE_PROVIDER = Value
-  val DEFAULT_ISOLATION_LEVEL = Value
-  val DEFAULT_VALUE = Value
-  val DELAY = Value
-  val DELEGATE = Value
-  val DELTA = Value
-  val DESCRIPTION = Value
-  val DESIRED_PARTITIONS_SIZE = Value
-  val DIFF_DELTA = Value
-  val DIVISIBLE_CLUSTER_INDICES_SIZE = Value
-  val DRIVER_ID = Value
-  val DROPPED_PARTITIONS = Value
-  val DURATION = Value
-  val EFFECTIVE_STORAGE_LEVEL = Value
-  val ELAPSED_TIME = Value
-  val ENCODING = Value
-  val END_INDEX = Value
-  val END_POINT = Value
-  val ENGINE = Value
-  val ERROR = Value
-  val ESTIMATOR_PARAMETER_MAP = Value
-  val EVENT_LOOP = Value
-  val EVENT_QUEUE = Value
-  val EXECUTE_INFO = Value
-  val EXECUTE_KEY = Value
-  val EXECUTION_PLAN_LEAVES = Value
-  val EXECUTOR_DESIRED_COUNT = Value
-  val EXECUTOR_ENVS = Value
-  val EXECUTOR_ENV_REGEX = Value
-  val EXECUTOR_ID = Value
-  val EXECUTOR_IDS = Value
-  val EXECUTOR_LAUNCH_COMMANDS = Value
-  val EXECUTOR_LAUNCH_COUNT = Value
-  val EXECUTOR_RESOURCES = Value
-  val EXECUTOR_STATE = Value
-  val EXECUTOR_TARGET_COUNT = Value
-  val EXIT_CODE = Value
-  val EXPECTED_NUM_FILES = Value
-  val EXPECTED_PARTITION_COLUMN = Value
-  val EXPIRY_TIMESTAMP = Value
-  val EXPR = Value
-  val EXPR_TERMS = Value
-  val EXTENDED_EXPLAIN_GENERATOR = Value
-  val FAILURES = Value
-  val FALLBACK_VERSION = Value
-  val FEATURE_COLUMN = Value
-  val FEATURE_DIMENSION = Value
-  val FIELD_NAME = Value
-  val FILE_ABSOLUTE_PATH = Value
-  val FILE_FORMAT = Value
-  val FILE_FORMAT2 = Value
-  val FILE_NAME = Value
-  val FILE_VERSION = Value
-  val FINISH_TRIGGER_DURATION = Value
-  val FROM_OFFSET = Value
-  val FROM_TIME = Value
-  val FUNCTION_NAME = Value
-  val FUNCTION_PARAMETER = Value
-  val GROUP_ID = Value
-  val HADOOP_VERSION = Value
-  val HASH_JOIN_KEYS = Value
-  val HEARTBEAT_INTERVAL = Value
-  val HISTORY_DIR = Value
-  val HIVE_CLIENT_VERSION = Value
-  val HIVE_METASTORE_VERSION = Value
-  val HIVE_OPERATION_STATE = Value
-  val HIVE_OPERATION_TYPE = Value
-  val HOST = Value
-  val HOST_PORT = Value
-  val INCOMPATIBLE_TYPES = Value
-  val INDEX = Value
-  val INDEX_FILE_NUM = Value
-  val INDEX_NAME = Value
-  val INFERENCE_MODE = Value
-  val INITIAL_CAPACITY = Value
-  val INITIAL_HEARTBEAT_INTERVAL = Value
-  val INIT_MODE = Value
-  val INTERVAL = Value
-  val ISOLATIO

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578623991


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -16,393 +16,393 @@
  */
 package org.apache.spark.internal
 
+trait ILogKey

Review Comment:
   rename as LogKey?






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578623558


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   @gengliangwang 
   What do you think of the current approach to the modification?
   If we define the key as a plain string, there is no binding to `LogKey` at all.
   Perhaps we need more eyes to review the changes above?






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


gengliangwang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1578608646


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   We should definitely allow external libraries to extend the log keys, but I don't have a good suggestion for this right now. Either making `LogKey` a trait or making the keys plain strings is OK.
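
   For illustration, a minimal sketch of those two options; every name below (`LogKey` as a trait, `StringMDC`, the example keys) is hypothetical and nothing here is part of the PR:
   ```scala
   object KeyStyleSketch {
     // Option 1: a trait as the extension point -- Spark and third parties both
     // define keys as case objects, so keys remain type-checked identifiers.
     trait LogKey
     case object EXECUTOR_ID extends LogKey        // a Spark-internal key
     case object MY_CONNECTOR_TOPIC extends LogKey // a third-party key

     // Option 2: plain string keys -- maximally flexible, but typos and
     // collisions are caught only at runtime, if at all.
     final case class StringMDC(key: String, value: Any)
     val example: StringMDC = StringMDC("executor_id", "42")
   }
   ```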






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1577686240


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   If it becomes necessary in the future, I think this would be easy to implement on top of the current changes, for example with a hierarchy such as the following (a code sketch appears after the list):
   - `ILogKey`
   
   - `InternalLogKey` extends `ILogKey`
   - `LogKey` extends `InternalLogKey`
   
   - `ExternalLogKey` extends `ILogKey`
   - `ThirdPartyLogKey` extends `ExternalLogKey`
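
   A minimal code sketch of that proposed hierarchy, assuming the trait names above (the concrete `LogKey`/`ThirdPartyLogKey` layers are collapsed into example case objects; none of this exists in Spark yet):
   ```scala
   object LogKeyHierarchySketch {
     trait ILogKey
     trait InternalLogKey extends ILogKey // keys shipped with Spark itself
     trait ExternalLogKey extends ILogKey // keys owned by third-party projects

     case object EXECUTOR_ID extends InternalLogKey
     case object MY_PLUGIN_TOPIC extends ExternalLogKey

     // The origin is encoded in the type, so callers can tell keys apart:
     def origin(key: ILogKey): String = key match {
       case _: InternalLogKey => "spark"
       case _: ExternalLogKey => "external"
       case _                 => "unknown"
     }
   }
   ```
   A split like this would also help with the namespace question raised elsewhere in the thread.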






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


LuciferYang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1577602188


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   Perhaps we also need to consider namespace isolation. Currently, we can 
define keys with the same name in multiple places, but we cannot distinguish 
from the `MessageWithContext` whether they are defined internally in Spark or 
externally.
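
   One purely illustrative way to get such isolation under the trait approach would be to derive a qualified key name from the defining class, so that two same-named keys render differently; nothing like this exists in the PR, and `TASK_ID` is invented:
   ```scala
   package sketch

   object NamespaceSketch {
     trait ILogKey {
       // Illustrative: qualify the key with its defining class so that a
       // Spark-internal TASK_ID and an external TASK_ID stay distinguishable.
       def qualifiedName: String = getClass.getName.stripSuffix("$")
     }
     case object TASK_ID extends ILogKey

     def main(args: Array[String]): Unit =
       println(TASK_ID.qualifiedName) // prints "sketch.NamespaceSketch$TASK_ID"
   }
   ```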






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


LuciferYang commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1577167593


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   In https://github.com/apache/spark/pull/46186/files, I tried relaxing the type of `MDC.key` to `Enumeration#Value`, which allows other enumeration types to be used as `MDC.key` and adds a bit of extensibility.
   
   
![image](https://github.com/apache/spark/assets/1475305/60b2fd34-61de-4f92-8f10-cba983bf6718)
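
   A self-contained sketch of that alternative; the `MDC` case class below is a stand-in with the relaxed signature from #46186, not Spark's actual class, and `MyAppLogKey` is invented:
   ```scala
   // Stand-in for Spark's MDC with the relaxed key type proposed in #46186.
   final case class MDC(key: Enumeration#Value, value: Any)

   // Any enumeration, including one defined outside Spark, now works as a key.
   object MyAppLogKey extends Enumeration {
     val SHUFFLE_SERVICE_NAME: Value = Value
   }

   object EnumKeyDemo {
     def main(args: Array[String]): Unit = {
       val mdc = MDC(MyAppLogKey.SHUFFLE_SERVICE_NAME, "external-shuffle")
       println(s"${mdc.key} -> ${mdc.value}") // SHUFFLE_SERVICE_NAME -> external-shuffle
     }
   }
   ```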
   
   
   
   






Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-24 Thread via GitHub


panbingkun commented on code in PR #46193:
URL: https://github.com/apache/spark/pull/46193#discussion_r1577397848


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -144,6 +146,22 @@ trait LoggingSuiteBase
   }
   }
 
+  // An enumeration value that is not defined in `LogKey`, you can define it anywhere
+  private val externalLogKey = LogKey.VALUE
+  private val externalLog = log"${MDC(externalLogKey, "External log message.")}"
+  test("Logging with external LogKey") {

Review Comment:
   @LuciferYang @gengliangwang 
   I have updated to a version that removes the `Enumeration`. External projects in the Spark ecosystem can then use the structured logging mechanism in the following form:
   ```
   case object CUSTOM_LOG_KEY extends ILogKey
   ```
   At present, this form is more in line with the traditional Spark style, and it lets us constrain the type of the MDC `key` parameter.
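
   To make the usage concrete, a sketch of how an external project could define and log with its own key, written against the API as it stands in this thread (`ILogKey`, which a review comment above suggests renaming to `LogKey`); the connector class, method, and message are invented:
   ```scala
   import org.apache.spark.internal.{ILogKey, Logging, MDC}

   // A key owned by an external project, defined outside Spark's LogKey list.
   case object CUSTOM_LOG_KEY extends ILogKey

   class MyConnector extends Logging {
     def reportTopic(topic: String): Unit = {
       // The log interpolator builds a message carrying the key/value MDC pair.
       logInfo(log"Consuming from topic ${MDC(CUSTOM_LOG_KEY, topic)}")
     }
   }
   ```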


