[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1259522776 > I wonder if we can reuse ObjectMapper inside classes where it matters for perf and not try to share one instance so widely. According to this principle, it is enough to keep the changes in `RebaseDateTime`, `DataSourceV2Utils` and `FileDataSourceV2`, there is no possibility of reuse in other 3 files -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1259478461 https://github.com/FasterXML/jackson-core/issues/349#issuecomment-280794659 https://github.com/FasterXML/jackson-core/commit/67add8c6fdbb8f2968b21a7c54b33db151c0a950#diff-190cbec71e87394830d19fa2fea51b3bc324aa5fe694fc036ef85d1ad39d528f It seems that Jackson 2.8.7 has fixed the problems mentioned in post. If we have doubts about similar issue, I can only make limited changes as you said -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1259031253 @srowen From the above test results, there is no significant performance difference between using global and local singletons. From a code perspective, thread safety should not be guaranteed by locks, it seems guaranteed by using a new `JsonParser/JsonGenerator` every serialization and deserialization. https://github.com/FasterXML/jackson-databind/blob/492334da383b0677c19b08964fddad18230546af/src/main/java/com/fasterxml/jackson/databind/ObjectMapper.java#L3838-L3851 https://user-images.githubusercontent.com/1475305/192445439-80fc6740-767f-4848-b444-f6c22a40d1cc.png;> https://github.com/FasterXML/jackson-databind/blob/492334da383b0677c19b08964fddad18230546af/src/main/java/com/fasterxml/jackson/databind/ObjectMapper.java#L3647-L3658 https://user-images.githubusercontent.com/1475305/192445680-6c2065ff-7cf4-4d60-8c01-df219eb629ab.png;> So there seems to be no problem with the global instance. `ObjectMapper` is only used as a configuration template, so I am both ok to use a global singleton or local singleton -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1259014121 I write a multi thread test as follows: https://github.com/LuciferYang/spark/blob/fe26455668a904400e4b81ec696b7cd3abb3c923/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/MJacksonBenchmark.scala ```scala def testWriteMapToJson(valuesPerIteration: Int, threads: Int): Unit = { val map = Map("intValue" -> 1, "longValue" -> 2L, "doubleValue" -> 3.0D, "stringValue" -> "4", "floatValue" -> 5.0F, "booleanValue" -> true) val benchmark = new Benchmark(s"Test $threads threads write map to json", valuesPerIteration, output = output) val multi = Array.fill(threads)({ val ret = new ObjectMapper() ret.registerModule(DefaultScalaModule) ret }) benchmark.addCase("Test use multi mapper") { _: Int => val latch = new CountDownLatch(valuesPerIteration) val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "multi") for (i <- 0 until valuesPerIteration) { executor.submit(new Runnable { override def run(): Unit = { val idx = i % threads multi(idx).writeValueAsString(map) latch.countDown() } }) } latch.await() executor.shutdown() } val mapper = new ObjectMapper() mapper.registerModule(DefaultScalaModule) val singleton = Array.fill(threads)(mapper) benchmark.addCase("Test use singleton mapper") { _: Int => val latch = new CountDownLatch(valuesPerIteration) val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "singleton") for (i <- 0 until valuesPerIteration) { executor.submit(new Runnable { override def run(): Unit = { val idx = i % threads singleton(idx).writeValueAsString(map) latch.countDown() } }) } latch.await() executor.shutdown() } benchmark.run() } def testReadJsonToMap(valuesPerIteration: Int, threads: Int): Unit = { val input = { val map = Map("intValue" -> 1, "longValue" -> 2L, "doubleValue" -> 3.0D, "stringValue" -> "4", "floatValue" -> 5.0F, "booleanValue" -> true) val mapper = new ObjectMapper() mapper.registerModule(DefaultScalaModule) mapper.writeValueAsString(map) } val benchmark = new Benchmark(s"Test $threads threads read json to map", valuesPerIteration, output = output) val multi = Array.fill(threads)({ val ret = new ObjectMapper() ret.registerModule(DefaultScalaModule) ret }) benchmark.addCase("Test use multi mapper") { _: Int => val latch = new CountDownLatch(valuesPerIteration) val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "multi") for (i <- 0 until valuesPerIteration) { executor.submit(new Runnable { override def run(): Unit = { val idx = i % threads multi(idx).readValue(input, classOf[mutable.HashMap[String, String]]) latch.countDown() } }) } latch.await() executor.shutdown() } val mapper = new ObjectMapper() mapper.registerModule(DefaultScalaModule) val singleton = Array.fill(threads)(mapper) benchmark.addCase("Test use singleton mapper") { _: Int => val latch = new CountDownLatch(valuesPerIteration) val executor = ThreadUtils.newDaemonFixedThreadPool(threads, "singleton") for (i <- 0 until valuesPerIteration) { executor.submit(new Runnable { override def run(): Unit = { val idx = i % threads singleton(idx).readValue(input, classOf[mutable.HashMap[String, String]]) latch.countDown() } }) } latch.await() executor.shutdown() } benchmark.run() } ``` The result from GA as follows: ``` OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1020-azure Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz Test 5 threads read json to map: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative Test use multi mapper 1927 1959 46 0.51927.2 1.0X Test use singleton mapper 1727 1884 222 0.61727.0
[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1258914643 In the serial r/w scenario, the benefits are obvious, - Reading scenario: using singleton is 1800+% faster than creating `ObjectMapper ` every time - Write scenario: using a single instance is 500+% faster than creating `ObjectMapper ` every time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #37999: [SPARK-39146][CORE][SQL][K8S] Introduce `JacksonUtils` to use singleton Jackson ObjectMapper
LuciferYang commented on PR #37999: URL: https://github.com/apache/spark/pull/37999#issuecomment-1258157326 OK, let me add a multi thread comparison to check this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org