Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-04-01 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1546593404


##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -141,6 +141,16 @@ package object config {
 "Ensure that memory overhead is a double greater than 0")
   .createWithDefault(0.1)
 
+  private[spark] val STRUCTURED_LOGGING_ENABLED =
+    ConfigBuilder("spark.log.structuredLogging.enabled")

Review Comment:
   Thanks, I just created https://issues.apache.org/jira/browse/SPARK-47671 for 
this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-04-01 Thread via GitHub


tgravescs commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1546337511


##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -141,6 +141,16 @@ package object config {
 "Ensure that memory overhead is a double greater than 0")
   .createWithDefault(0.1)
 
+  private[spark] val STRUCTURED_LOGGING_ENABLED =
+ConfigBuilder("spark.log.structuredLogging.enabled")

Review Comment:
   this config needs to be documented in configuration.md






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-31 Thread via GitHub


panbingkun commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1545654750


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   @dtenedor @gengliangwang 
   Considering the readability of the JSON format in unit tests, I submitted a PR
to improve it:
   https://github.com/apache/spark/pull/45784
   
   






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-29 Thread via GitHub


dtenedor commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544763733


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   @gengliangwang I know, I was talking about the unit test case, not the
production JSON output itself. It is fine that the production JSON record is on
one line. I am suggesting that we format the JSON in the unit tests in a more
readable way, and then ignore whitespace when comparing the actual JSON record
against the expected one in each test case.
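
   For illustration, a minimal sketch of that comparison (the `normalize` helper
is hypothetical, not code from this PR; note that it also strips whitespace
inside string values):
   ```
   // Hypothetical helper: collapse all whitespace so a pretty-printed expected
   // JSON string can be compared with the single-line record the layout emits.
   def normalize(json: String): String = json.replaceAll("\\s", "")

   val expected =
     """{
       |  "level": "ERROR",
       |  "msg": "This is a log message"
       |}""".stripMargin
   val actual = """{"level":"ERROR","msg":"This is a log message"}"""
   assert(normalize(expected) == normalize(actual))
   ```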






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-29 Thread via GitHub


gengliangwang closed pull request #45729: [SPARK-47574][INFRA] Introduce 
Structured Logging Framework
URL: https://github.com/apache/spark/pull/45729





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-29 Thread via GitHub


gengliangwang commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2026713363

   @amaliujia @dtenedor @beliefer @HyukjinKwon @cloud-fan Thanks for the 
reviews!
   The PR has been open for 3 days. I am merging this one to master and moving 
forward.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544104949


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.File
+import java.nio.file.Files
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore funsuite

Review Comment:
   This test suite is under the common/utils module and can't import `SparkFunSuite`.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544105135


##
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##
@@ -228,6 +228,11 @@ private[spark] class SparkSubmit extends Logging {
 val childClasspath = new ArrayBuffer[String]()
 val sparkConf = args.toSparkConf()
     if (sparkConf.contains("spark.local.connect")) sparkConf.remove("spark.remote")
+    if (sparkConf.getBoolean(STRUCTURED_LOGGING_ENABLED.key, defaultValue = true)) {

Review Comment:
   Yes






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


cloud-fan commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544097323


##
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##
@@ -228,6 +228,11 @@ private[spark] class SparkSubmit extends Logging {
 val childClasspath = new ArrayBuffer[String]()
 val sparkConf = args.toSparkConf()
     if (sparkConf.contains("spark.local.connect")) sparkConf.remove("spark.remote")
+    if (sparkConf.getBoolean(STRUCTURED_LOGGING_ENABLED.key, defaultValue = true)) {

Review Comment:
   does this work for spark-shell as well? And thriftserver?






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


cloud-fan commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544095644


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.File
+import java.nio.file.Files
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore funsuite

Review Comment:
   why can't we use `SparkFunSuite`?






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542391302


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   I got the idea. Please check the latest test






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-28 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542390853


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +91,43 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): MessageWithContext = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            // Note: all the arguments are supposed to be MDCs, but we only throw an exception
+            //       if we are running in test mode. This is to avoid breaking existing code.

Review Comment:
   Thanks, this is a great suggestion!
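
   For reference, a minimal sketch of how this `log` interpolator is used
together with `MDC`, based on the imports shown in the test suite (the message
text and executor id are made up):
   ```
   import org.apache.spark.internal.{Logging, MDC}
   import org.apache.spark.internal.LogKey.EXECUTOR_ID

   class Example extends Logging {
     def reportLoss(executorId: String): Unit = {
       // The MDC value is inlined into the message text; when structured
       // logging is enabled it is also recorded as a key-value pair.
       logError(log"Lost executor ${MDC(EXECUTOR_ID, executorId)}.")
     }
   }
   ```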



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -190,6 +281,7 @@ private[spark] object Logging {
   @volatile private var initialized = false
   @volatile private var defaultRootLevel: Level = null
   @volatile private var defaultSparkLog4jConfig = false
+  @volatile private var useStructuredLogging = true

Review Comment:
   Sure, updated.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-27 Thread via GitHub


amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542066261


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   Regarding test case readability, I am wondering if we could at least put the
value of `logOutput` as a comment here, with newlines and whitespace inserted
for better readability, so we can read the pattern and then quickly read the
value in the comment to understand what this test case does.
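
   For example, the comment could look like this (values are illustrative; the
timestamp differs per run):
   ```
   // logOutput at runtime (a single line, wrapped here for readability):
   // {"ts":"2024-03-27T12:00:00.000Z","level":"ERROR",
   //  "msg":"This is a log message",
   //  "logger":"org.apache.spark.util.StructuredLoggingSuite"}
   ```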







Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-27 Thread via GitHub


amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542066757


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   Please ignore this if the value is not stable; in that case, whatever we put
in the comment would become stale very soon.







Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


beliefer commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540500697


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +91,43 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): MessageWithContext = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            // Note: all the arguments are supposed to be MDCs, but we only throw an exception
+            //       if we are running in test mode. This is to avoid breaking existing code.

Review Comment:
   Shall we change `args: Any*` to `args: MDC*`? That way we can avoid the match
on each argument.
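
   A minimal sketch of the suggested typed variant (context-map handling omitted
for brevity):
   ```
   implicit class LogStringContext(val sc: StringContext) {
     // Every interpolated argument is statically an MDC, so no runtime match
     // (and no "not a MDC" error path) is needed.
     def log(args: MDC*): String = {
       val processedParts = sc.parts.iterator
       val sb = new StringBuilder(processedParts.next())
       args.foreach { mdc =>
         sb.append(mdc.value)
         if (processedParts.hasNext) sb.append(processedParts.next())
       }
       sb.toString
     }
   }
   ```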






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


beliefer commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540498116


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -190,6 +281,7 @@ private[spark] object Logging {
   @volatile private var initialized = false
   @volatile private var defaultRootLevel: Level = null
   @volatile private var defaultSparkLog4jConfig = false
+  @volatile private var useStructuredLogging = true

Review Comment:
   How about `structuredLoggingEnabled` ?






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540335898


##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.internal
+
+object LogKey extends Enumeration {

Review Comment:
   Thanks, updated
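
   The added comment presumably reads along these lines (wording illustrative;
`EXECUTOR_ID` is the key used elsewhere in this PR):
   ```
   // Keys used as names for the Mapped Diagnostic Context (MDC) entries of
   // structured log messages. New keys should be defined here for consistency.
   object LogKey extends Enumeration {
     val EXECUTOR_ID = Value
   }
   ```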






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540335763


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)
+
+class LogEntry(entry: => (String, Option[Instance])) {

Review Comment:
   Updated. I tried to avoid creating more internal classes. But the cost 
should be trivial anyway.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540319742


##
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   The output is always one line of JSON.
https://logging.apache.org/log4j/2.x/manual/json-template-layout.html doesn't
support pretty printing. Also, parsing multi-line JSON is slower than parsing a
single line.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540312938


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +77,43 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Option[Instance]) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {

Review Comment:
   There is another code block at the end of the loop:
   ```
   if (processedParts.hasNext) {
     sb.append(processedParts.next())
   }
   ```
   I am trying to build both the map and the string in one loop.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540294855


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   And we are putting the class name into the log message. I am trying to avoid 
a long class name here.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540294626


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   This is a well-known term: 
https://logging.apache.org/log4j/1.x/apidocs/org/apache/log4j/MDC.html






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


dtenedor commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540233859


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   This would probably be more readable fully spelled-out 
(`MappedDiagnosticContext`) rather than an acronym, no?



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +77,43 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Option[Instance]) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {

Review Comment:
   You can skip this line and the `arg =>` at the end of the previous line, and
just list the cases directly instead, as sketched below.
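
   A sketch of the suggested shape (note that, as gengliangwang's reply above
explains, the trailing `processedParts` append runs once per iteration, which
complicates collapsing the lambda):
   ```
   args.foreach {
     case mdc: MDC =>
       sb.append(mdc.value)
       if (Logging.isStructuredLoggingEnabled) {
         map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
       }
     case other => () // handle a non-MDC argument
   }
   ```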



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)
+
+class LogEntry(entry: => (String, Option[Instance])) {

Review Comment:
   this pair of `(String, Option[Instance])` might be more readable using a 
helper case class, since we return it frequently.
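
   A sketch of such a helper; the later revision quoted earlier in this thread
does introduce a `MessageWithContext` return type along these lines (exact field
names assumed):
   ```
   // Named type instead of a bare (String, Option[Instance]) tuple.
   case class MessageWithContext(message: String, context: Option[Instance])
   ```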



##
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.internal
+
+object LogKey extends Enumeration {

Review Comment:
   let's add a comment saying what this represents?




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540122993


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +76,37 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")
+        }
+        if (processedParts.hasNext) {
+          sb.append(processedParts.next())
+        }
+      }
+
+      // Create a CloseableThreadContext and apply the context map
+      val closeableContext = if (Logging.isStructuredLoggingEnabled) {
+        CloseableThreadContext.putAll(map)
+      } else {
+        null

Review Comment:
   This is to avoid the extra overhead of creating a CloseableThreadContext 
instance. I am changing this to None for safety. 






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540122275


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +76,37 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")

Review Comment:
   Good point. I am changing it to an error which only happens during testing. 
So I don't think we need an error class here.






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540084201


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +76,37 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")

Review Comment:
   Do we need Spark Exception and error class here?



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -55,6 +76,37 @@ trait Logging {
 log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")
+        }
+        if (processedParts.hasNext) {
+          sb.append(processedParts.next())
+        }
+      }
+
+      // Create a CloseableThreadContext and apply the context map
+      val closeableContext = if (Logging.isStructuredLoggingEnabled) {
+        CloseableThreadContext.putAll(map)
+      } else {
+        null

Review Comment:
   Why choose to return null rather than an empty `CloseableThreadContext`, which
could better mitigate NPEs?
   
   How will the callers deal with this null in the existing implementation?
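
   As gengliangwang's reply above notes, this was later changed to `None` for
safety; a sketch of the Option-based shape and how a caller can release the
context:
   ```
   // Option instead of null: no NPE risk, and no CloseableThreadContext is
   // created at all when structured logging is disabled.
   val closeableContext: Option[CloseableThreadContext.Instance] =
     if (Logging.isStructuredLoggingEnabled) {
       Some(CloseableThreadContext.putAll(map))
     } else {
       None
     }
   // Caller side: close the context, if present, after the record is emitted.
   closeableContext.foreach(_.close())
   ```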






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

2024-03-26 Thread via GitHub


gengliangwang commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2021250316

   cc @steveloughran @dtenedor @bart-samwel as well

