[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-27 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259522776

   > I wonder if we can reuse ObjectMapper inside classes where it matters for 
perf and not try to share one instance so widely.
   
   According to this principle, it is enough to keep the changes in 
`RebaseDateTime`, `DataSourceV2Utils` and  `FileDataSourceV2`,  there is no 
possibility of reuse in other 3 files
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-27 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259478461

   https://github.com/FasterXML/jackson-core/issues/349#issuecomment-280794659
   
   
https://github.com/FasterXML/jackson-core/commit/67add8c6fdbb8f2968b21a7c54b33db151c0a950#diff-190cbec71e87394830d19fa2fea51b3bc324aa5fe694fc036ef85d1ad39d528f
   
   It seems that Jackson 2.8.7 has fixed the problems mentioned in post. If we 
have doubts about similar issue, I can only make limited changes as you said


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-27 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259031253

   @srowen From the above test results, there is no significant performance 
difference between using global and local singletons.
   
   From a code perspective, thread safety should not be guaranteed by locks, it 
seems guaranteed by using a new `JsonParser/JsonGenerator` every serialization 
and deserialization.
   
   
https://github.com/FasterXML/jackson-databind/blob/492334da383b0677c19b08964fddad18230546af/src/main/java/com/fasterxml/jackson/databind/ObjectMapper.java#L3838-L3851
   
   https://user-images.githubusercontent.com/1475305/192445439-80fc6740-767f-4848-b444-f6c22a40d1cc.png;>
   
   
https://github.com/FasterXML/jackson-databind/blob/492334da383b0677c19b08964fddad18230546af/src/main/java/com/fasterxml/jackson/databind/ObjectMapper.java#L3647-L3658
   
   https://user-images.githubusercontent.com/1475305/192445680-6c2065ff-7cf4-4d60-8c01-df219eb629ab.png;>
   
   So there seems to be no problem with the global instance. `ObjectMapper` is 
only used as a configuration template, so I am both ok to use a global 
singleton or local singleton
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-26 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259014121

   I write a multi thread test as follows:
   
   
https://github.com/LuciferYang/spark/blob/fe26455668a904400e4b81ec696b7cd3abb3c923/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/MJacksonBenchmark.scala
   
   ```scala
 def testWriteMapToJson(valuesPerIteration: Int, threads: Int): Unit = {
   
   val map = Map("intValue" -> 1,
 "longValue" -> 2L,
 "doubleValue" -> 3.0D,
 "stringValue" -> "4",
 "floatValue" -> 5.0F,
 "booleanValue" -> true)
   
   
   val benchmark = new Benchmark(s"Test $threads threads write map to json",
 valuesPerIteration, output = output)
   
   val multi = Array.fill(threads)({
 val ret = new ObjectMapper()
 ret.registerModule(DefaultScalaModule)
 ret
   })
   
   benchmark.addCase("Test use multi mapper") { _: Int =>
 val latch = new CountDownLatch(valuesPerIteration)
 val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "multi")
 for (i <- 0 until valuesPerIteration) {
   executor.submit(new Runnable {
 override def run(): Unit = {
   val idx = i % threads
   multi(idx).writeValueAsString(map)
   latch.countDown()
 }
   })
 }
 latch.await()
 executor.shutdown()
   }
   
   val mapper = new ObjectMapper()
   mapper.registerModule(DefaultScalaModule)
   val singleton = Array.fill(threads)(mapper)
   benchmark.addCase("Test use singleton mapper") { _: Int =>
 val latch = new CountDownLatch(valuesPerIteration)
 val executor = ThreadUtils.newDaemonFixedThreadPool(threads, 
"singleton")
 for (i <- 0 until valuesPerIteration) {
   executor.submit(new Runnable {
 override def run(): Unit = {
   val idx = i % threads
   singleton(idx).writeValueAsString(map)
   latch.countDown()
 }
   })
 }
 latch.await()
 executor.shutdown()
   }
   
   benchmark.run()
 }
   
 def testReadJsonToMap(valuesPerIteration: Int, threads: Int): Unit = {
   
   val input = {
 val map = Map("intValue" -> 1,
   "longValue" -> 2L,
   "doubleValue" -> 3.0D,
   "stringValue" -> "4",
   "floatValue" -> 5.0F,
   "booleanValue" -> true)
 val mapper = new ObjectMapper()
 mapper.registerModule(DefaultScalaModule)
 mapper.writeValueAsString(map)
   }
   
   val benchmark = new Benchmark(s"Test $threads threads read json to map",
 valuesPerIteration, output = output)
   
   val multi = Array.fill(threads)({
 val ret = new ObjectMapper()
 ret.registerModule(DefaultScalaModule)
 ret
   })
   
   benchmark.addCase("Test use multi mapper") { _: Int =>
 val latch = new CountDownLatch(valuesPerIteration)
 val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "multi")
 for (i <- 0 until valuesPerIteration) {
   executor.submit(new Runnable {
 override def run(): Unit = {
   val idx = i % threads
   multi(idx).readValue(input, classOf[mutable.HashMap[String, 
String]])
   latch.countDown()
 }
   })
 }
 latch.await()
 executor.shutdown()
   }
   
   val mapper = new ObjectMapper()
   mapper.registerModule(DefaultScalaModule)
   val singleton = Array.fill(threads)(mapper)
   
   benchmark.addCase("Test use singleton mapper") { _: Int =>
 val latch = new CountDownLatch(valuesPerIteration)
 val executor = ThreadUtils.newDaemonFixedThreadPool(threads, 
"singleton")
 for (i <- 0 until valuesPerIteration) {
   executor.submit(new Runnable {
 override def run(): Unit = {
   val idx = i % threads
   singleton(idx).readValue(input, classOf[mutable.HashMap[String, 
String]])
   latch.countDown()
 }
   })
 }
 latch.await()
 executor.shutdown()
   }
   
   benchmark.run()
 }
   ```
   
   The result from GA as follows:
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1020-azure
   Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
   Test 5 threads read json to map:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
   

   Test use multi mapper  1927   1959   
   46  0.51927.2   1.0X
   Test use singleton mapper  1727   1884   
  222  0.61727.0  

[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-26 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1258914643

   In the serial r/w scenario, the benefits are obvious,
   
   - Reading scenario: using singleton is 1800+% faster than creating 
`ObjectMapper ` every time
   - Write scenario: using a single instance is 500+% faster than creating 
`ObjectMapper ` every time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper

2022-09-26 Thread GitBox


LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1258157326

   OK, let me add a multi thread comparison to check this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org