Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-21 Thread via GitHub


MaxGekk closed pull request #48501: [SPARK-49490][SQL] Add benchmarks for initCap
URL: https://github.com/apache/spark/pull/48501


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-21 Thread via GitHub


MaxGekk commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2490340634

   +1, LGTM. Merging to master.
   Thank you, @mrk-andreev, and @stevomitric and @uros-db for the review.





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-20 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1851026518


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int
+    type InitCapEstimator = (UTF8String, collationType) => Unit
+    def skipCollationTypeFilter: Any => Boolean = _ => true
+    def createBenchmark(
+        implName: String,
+        impl: InitCapEstimator,
+        collationTypeFilter: String => Boolean): Unit = {
+      val benchmark = new Benchmark(
+        s"collation unit benchmarks - initCap using impl ${implName}",

Review Comment:
   Fixed



##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int
+    type InitCapEstimator = (UTF8String, collationType) => Unit
+    def skipCollationTypeFilter: Any => Boolean = _ => true
+    def createBenchmark(
+        implName: String,
+        impl: InitCapEstimator,
+        collationTypeFilter: String => Boolean): Unit = {
+      val benchmark = new Benchmark(
+        s"collation unit benchmarks - initCap using impl ${implName}",
+        utf8Strings.size * 10,
+        warmupTime = 10.seconds,
+        output = output)
+      collationTypes.filter(collationTypeFilter).foreach { collationType => {
+        val collationId = CollationFactory.collationNameToId(collationType)
+        benchmark.addCase(s"$collationType") { _ =>

Review Comment:
   Fixed






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-20 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1851026876


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int

Review Comment:
   Thank you. Fixed






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-20 Thread via GitHub


MaxGekk commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1850057061


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int
+    type InitCapEstimator = (UTF8String, collationType) => Unit
+    def skipCollationTypeFilter: Any => Boolean = _ => true
+    def createBenchmark(
+        implName: String,
+        impl: InitCapEstimator,
+        collationTypeFilter: String => Boolean): Unit = {
+      val benchmark = new Benchmark(
+        s"collation unit benchmarks - initCap using impl ${implName}",

Review Comment:
   nit: the enclosing braces are redundant:
   ```suggestion
   s"collation unit benchmarks - initCap using $implName",
   ```
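
A small Scala sketch of the nit above, using a hypothetical `implName` value chosen only for illustration: for a plain identifier, `${implName}` and `$implName` interpolate to the same string, so the braces add nothing. Braces are only needed when an expression, such as a method call, follows the `$`.

```scala
object InterpolationBraces {
  def main(args: Array[String]): Unit = {
    // Hypothetical value, used only for illustration.
    val implName = "ICU"

    val withBraces = s"collation unit benchmarks - initCap using ${implName}"
    val withoutBraces = s"collation unit benchmarks - initCap using $implName"
    // Both forms produce the same string for a simple identifier.
    assert(withBraces == withoutBraces)

    // Braces are still required when interpolating an expression.
    val lowered = s"using ${implName.toLowerCase}"
    assert(lowered == "using icu")

    println(withoutBraces)
  }
}
```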



##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int

Review Comment:
   This is a collation id, not a collation type, and type names should begin with an upper-case letter.
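
As an illustration of that naming point, a minimal sketch follows. `CollationId` and `describe` are names chosen here for the example, not necessarily what the PR ended up using. A Scala type alias is only a compile-time synonym, so renaming it changes readability, not behavior.

```scala
object CollationIdAlias {
  // Upper-camel-case alias that states what the Int represents.
  type CollationId = Int

  // Hypothetical helper: accepts any Int, because the alias is just a synonym.
  def describe(id: CollationId): String = s"collation id $id"

  def main(args: Array[String]): Unit = {
    val raw: Int = 0
    println(describe(raw))
  }
}
```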



##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,48 @@ abstract class CollationBenchmarkBase extends BenchmarkBase {
     }
     benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+      collationTypes: Seq[String],
+      utf8Strings: Seq[UTF8String]): Unit = {
+    type collationType = Int
+    type InitCapEstimator = (UTF8String, collationType) => Unit
+    def skipCollationTypeFilter: Any => Boolean = _ => true
+    def createBenchmark(
+        implName: String,
+        impl: InitCapEstimator,
+        collationTypeFilter: String => Boolean): Unit = {
+      val benchmark = new Benchmark(
+        s"collation unit benchmarks - initCap using impl ${implName}",
+        utf8Strings.size * 10,
+        warmupTime = 10.seconds,
+        output = output)
+      collationTypes.filter(collationTypeFilter).foreach { collationType => {
+        val collationId = CollationFactory.collationNameToId(collationType)
+        benchmark.addCase(s"$collationType") { _ =>

Review Comment:
   nit:
   ```suggestion
   benchmark.addCase(collationType) { _ =>
   ```
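
The reason this suggestion is safe can be sketched in a few lines: interpolating a lone `String` value yields an equal string, so `s"$collationType"` can be replaced by `collationType` itself. The collation names below are the ones that appear in the benchmark results quoted in this thread.

```scala
object RedundantInterpolation {
  def main(args: Array[String]): Unit = {
    val collationTypes = Seq("UTF8_BINARY", "UTF8_LCASE", "UNICODE", "UNICODE_CI")
    collationTypes.foreach { collationType =>
      // Wrapping a String in s"..." with nothing else around it changes nothing.
      assert(s"$collationType" == collationType)
    }
    println("all equal")
  }
}
```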






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-19 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2486766722

   Hi @MaxGekk, @stevomitric, 
   
   Does this PR need any additional changes? Are there any blockers we should
   address? Let me know how I can help to move it forward!





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-16 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2480651570

   cc: @MaxGekk
   
   # Related work
   
   This is not related to my code changes but rather to the benchmarks we are
   modifying. It might be worth starting a separate thread on the dev mailing
   list or creating an additional ticket in Jira, which I would be happy to handle.
   
   ## Blackhole
   
   I would like to point out that the current implementation of
   `org.apache.spark.benchmark.Benchmark::addCase` does not use any form of Blackhole
   ([Blackhole in JMH](https://github.com/openjdk/jmh/blob/master/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#L155)),
   which could lead to dead-code elimination. However, I have not observed this
   issue in the existing tests, most likely because the benchmarked code is
   complex and has side effects, which prevents such elimination.
   
   Would it be a good idea to consider adding this as a feature in the future?
   
   ### Context
   
   `org.apache.spark.benchmark.Benchmark::addCase`
   
   ```
   def addCase(name: String, numIters: Int = 0)(f: Int => Unit): Unit = {
     addTimerCase(name, numIters) { timer =>
       timer.startTiming()
       f(timer.iteration)
       timer.stopTiming()
     }
   }
   ```
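
To illustrate the concern, here is a minimal sketch of a blackhole-style sink. It is not JMH's `Blackhole` (which is considerably more sophisticated) and it is not part of Spark's `Benchmark`; `SimpleBlackhole`, `consume`, and `initCapLike` are hypothetical names introduced for this example. The idea is that each result is folded into a `@volatile` field, which makes it much harder for the JIT to treat the measured computation as having no observable effect.

```scala
object SimpleBlackhole {
  // Volatile sink: every write is observable, so the values fed into it must be computed.
  @volatile private var sink: Int = 0

  def consume(value: Any): Unit = {
    val h = if (value == null) 0 else value.hashCode()
    sink = 31 * sink + h
  }

  def current: Int = sink
}

object BlackholeDemo {
  // Hypothetical stand-in for the code under test.
  def initCapLike(s: String): String =
    s.split(" ").map(_.capitalize).mkString(" ")

  def main(args: Array[String]): Unit = {
    val inputs = Seq("hello world", "spark sql")
    val start = System.nanoTime()
    var i = 0
    while (i < 1000) {
      // Without consume(), the result would be discarded and could be optimized away.
      inputs.foreach(in => SimpleBlackhole.consume(initCapLike(in)))
      i += 1
    }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"elapsed: $elapsedMs%.2f ms, sink=${SimpleBlackhole.current}")
  }
}
```

In a Spark benchmark the same idea would mean passing each result to something like `consume(...)` inside the closure given to `addCase`; whether that is worth adding is the open question raised above.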
   
   ## Async-profiler
   
   I suggest adding [Async Profiler](https://github.com/async-profiler/async-profiler),
   a low-overhead sampling profiler, to all benchmark runs. This will help us
   identify the causes of performance degradation.
   
   Would it also be worth considering adding this as a feature in the future?





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-16 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2480474218

   I would like to point out that the current implementation of
   `org.apache.spark.benchmark.Benchmark::addCase` does not use any form of Blackhole
   ([Blackhole in JMH](https://github.com/openjdk/jmh/blob/master/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#L155)),
   which could lead to dead-code elimination. However, I have not observed this
   issue in the existing tests, most likely because the benchmarked code is
   complex and has side effects, which prevents such elimination.
   
   Would it be a good idea to consider adding this as a feature in the future?
   
   ---
   # Context
   
   `org.apache.spark.benchmark.Benchmark::addCase`
   
   ```
   def addCase(name: String, numIters: Int = 0)(f: Int => Unit): Unit = {
     addTimerCase(name, numIters) { timer =>
       timer.startTiming()
       f(timer.iteration)
       timer.stopTiming()
     }
   }
   ```





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1844404602


##
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1353          1357          5        0.1      13532.2           1.0X
-UTF8_LCASE                                             2601          2602          2        0.0      26008.0           1.9X
-UNICODE                                               16745         16756         16        0.0     167450.9          12.4X
-UNICODE_CI                                            16590         16627         52        0.0     165904.8          12.3X
+UTF8_BINARY                                            2220          2223          5        0.0      22197.0           1.0X
+UTF8_LCASE                                             4949          4950          2        0.0      49488.1           2.2X
+UNICODE                                               28172         28198         36        0.0     281721.0          12.7X
+UNICODE_CI                                            28233         28308        106        0.0     282328.2          12.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1746          1746          0        0.1      17462.6           1.0X
-UTF8_LCASE                                             2629          2630          1        0.0      26294.8           1.5X
-UNICODE                                               16744         16744          0        0.0     167438.6           9.6X
-UNICODE_CI                                            16518         16521          4        0.0     165180.2           9.5X
+UTF8_BINARY                                            2731          2733          2        0.0      27313.6           1.0X
+UTF8_LCASE                                             4611          4619         11        0.0      46111.4           1.7X
+UNICODE                                               28149         28211         88        0.0     281486.8          10.3X
+UNICODE_CI                                            27535         27597         89        0.0     275348.4          10.1X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            2808          2808          1        0.0      28076.2           1.0X
-UTF8_LCASE                                             5409          5410          0        0.0      54093.0           1.9X
-UNICODE                                               67930         67957         38        0.0     679296.7          24.2X
-UNICODE_CI                                            56004         56005          1        0.0     560044.2          19.9X
+UTF8_BINARY                                            4603          4618         22        0.0      46031.3           1.0X
+UTF8_LCASE                                             9510          9518         11        0.0      95097.7           2.1X
+UNICODE                                              135718        135786         97        0.0    1357176.2          29.5X
+UNICODE_CI                                           113715        113819        148        0.0    1137145.8          24.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1844407932


##
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1349          1349          0        0.1      13485.4           1.0X
-UTF8_LCASE                                             3559          3561          3        0.0      35594.3           2.6X
-UNICODE                                               17580         17589         12        0.0     175803.6          13.0X
-UNICODE_CI                                            17210         17212          2        0.0     172100.2          12.8X
+UTF8_BINARY                                            2220          2223          5        0.0      22197.0           1.0X
+UTF8_LCASE                                             4949          4950          2        0.0      49488.1           2.2X
+UNICODE                                               28172         28198         36        0.0     281721.0          12.7X
+UNICODE_CI                                            28233         28308        106        0.0     282328.2          12.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1740          1741          1        0.1      17398.8           1.0X
-UTF8_LCASE                                             2630          2632          3        0.0      26301.0           1.5X
-UNICODE                                               16732         16743         16        0.0     167319.7           9.6X
-UNICODE_CI                                            16482         16492         14        0.0     164819.7           9.5X
+UTF8_BINARY                                            2731          2733          2        0.0      27313.6           1.0X
+UTF8_LCASE                                             4611          4619         11        0.0      46111.4           1.7X
+UNICODE                                               28149         28211         88        0.0     281486.8          10.3X
+UNICODE_CI                                            27535         27597         89        0.0     275348.4          10.1X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            2808          2808          0        0.0      28082.3           1.0X
-UTF8_LCASE                                             5412          5413          1        0.0      54123.5           1.9X
-UNICODE                                               70755         70787         44        0.0     707553.4          25.2X
-UNICODE_CI                                            57639         57669         43        0.0     576390.0          20.5X
+UTF8_BINARY                                            4603          4618         22        0.0      46031.3           1.0X
+UTF8_LCASE                                             9510          9518         11        0.0      95097.7           2.1X
+UNICODE                                              135718        135786         97        0.0    1357176.2          29.5X
+UNICODE_CI                                           113715        113819        148        0.0    1137145.8          24.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub


stevomitric commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1843598536


##
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1353          1357          5        0.1      13532.2           1.0X
-UTF8_LCASE                                             2601          2602          2        0.0      26008.0           1.9X
-UNICODE                                               16745         16756         16        0.0     167450.9          12.4X
-UNICODE_CI                                            16590         16627         52        0.0     165904.8          12.3X
+UTF8_BINARY                                            2220          2223          5        0.0      22197.0           1.0X
+UTF8_LCASE                                             4949          4950          2        0.0      49488.1           2.2X
+UNICODE                                               28172         28198         36        0.0     281721.0          12.7X
+UNICODE_CI                                            28233         28308        106        0.0     282328.2          12.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1746          1746          0        0.1      17462.6           1.0X
-UTF8_LCASE                                             2629          2630          1        0.0      26294.8           1.5X
-UNICODE                                               16744         16744          0        0.0     167438.6           9.6X
-UNICODE_CI                                            16518         16521          4        0.0     165180.2           9.5X
+UTF8_BINARY                                            2731          2733          2        0.0      27313.6           1.0X
+UTF8_LCASE                                             4611          4619         11        0.0      46111.4           1.7X
+UNICODE                                               28149         28211         88        0.0     281486.8          10.3X
+UNICODE_CI                                            27535         27597         89        0.0     275348.4          10.1X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            2808          2808          1        0.0      28076.2           1.0X
-UTF8_LCASE                                             5409          5410          0        0.0      54093.0           1.9X
-UNICODE                                               67930         67957         38        0.0     679296.7          24.2X
-UNICODE_CI                                            56004         56005          1        0.0     560044.2          19.9X
+UTF8_BINARY                                            4603          4618         22        0.0      46031.3           1.0X
+UTF8_LCASE                                             9510          9518         11        0.0      95097.7           2.1X
+UNICODE                                              135718        135786         97        0.0    1357176.2          29.5X
+UNICODE_CI                                           113715        113819        148        0.0    1137145.8          24.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-14 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1843013932


##
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - equalsFunction:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1349          1349          0        0.1      13485.4           1.0X
-UTF8_LCASE                                             3559          3561          3        0.0      35594.3           2.6X
-UNICODE                                               17580         17589         12        0.0     175803.6          13.0X
-UNICODE_CI                                            17210         17212          2        0.0     172100.2          12.8X
+UTF8_BINARY                                            2220          2223          5        0.0      22197.0           1.0X
+UTF8_LCASE                                             4949          4950          2        0.0      49488.1           2.2X
+UNICODE                                               28172         28198         36        0.0     281721.0          12.7X
+UNICODE_CI                                            28233         28308        106        0.0     282328.2          12.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - compareFunction:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            1740          1741          1        0.1      17398.8           1.0X
-UTF8_LCASE                                             2630          2632          3        0.0      26301.0           1.5X
-UNICODE                                               16732         16743         16        0.0     167319.7           9.6X
-UNICODE_CI                                            16482         16492         14        0.0     164819.7           9.5X
+UTF8_BINARY                                            2731          2733          2        0.0      27313.6           1.0X
+UTF8_LCASE                                             4611          4619         11        0.0      46111.4           1.7X
+UNICODE                                               28149         28211         88        0.0     281486.8          10.3X
+UNICODE_CI                                            27535         27597         89        0.0     275348.4          10.1X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - hashFunction:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                            2808          2808          0        0.0      28082.3           1.0X
-UTF8_LCASE                                             5412          5413          1        0.0      54123.5           1.9X
-UNICODE                                               70755         70787         44        0.0     707553.4          25.2X
-UNICODE_CI                                            57639         57669         43        0.0     576390.0          20.5X
+UTF8_BINARY                                            4603          4618         22        0.0      46031.3           1.0X
+UTF8_LCASE                                             9510          9518         11        0.0      95097.7           2.1X
+UNICODE                                              135718        135786         97        0.0    1357176.2          29.5X
+UNICODE_CI                                           113715        113819        148        0.0    1137145.8          24.7X

-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - contains:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative time
 ---------------------------------------------------------------------------------------------------------------------------

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-13 Thread via GitHub


MaxGekk commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1840658353


##
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##
@@ -1,54 +1,88 @@
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - equalsFunction:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative time
 ---------------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                             1349           1349           0         0.1       13485.4            1.0X
-UTF8_LCASE                                              3559           3561           3         0.0       35594.3            2.6X
-UNICODE                                                17580          17589          12         0.0      175803.6           13.0X
-UNICODE_CI                                             17210          17212           2         0.0      172100.2           12.8X
+UTF8_BINARY                                             2220           2223           5         0.0       22197.0            1.0X
+UTF8_LCASE                                              4949           4950           2         0.0       49488.1            2.2X
+UNICODE                                                28172          28198          36         0.0      281721.0           12.7X
+UNICODE_CI                                             28233          28308         106         0.0      282328.2           12.7X
 
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - compareFunction:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative time
 ---------------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                             1740           1741           1         0.1       17398.8            1.0X
-UTF8_LCASE                                              2630           2632           3         0.0       26301.0            1.5X
-UNICODE                                                16732          16743          16         0.0      167319.7            9.6X
-UNICODE_CI                                             16482          16492          14         0.0      164819.7            9.5X
+UTF8_BINARY                                             2731           2733           2         0.0       27313.6            1.0X
+UTF8_LCASE                                              4611           4619          11         0.0       46111.4            1.7X
+UNICODE                                                28149          28211          88         0.0      281486.8           10.3X
+UNICODE_CI                                             27535          27597          89         0.0      275348.4           10.1X
 
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - hashFunction:      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative time
 ---------------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY                                             2808           2808           0         0.0       28082.3            1.0X
-UTF8_LCASE                                              5412           5413           1         0.0       54123.5            1.9X
-UNICODE                                                70755          70787          44         0.0      707553.4           25.2X
-UNICODE_CI                                             57639          57669          43         0.0      576390.0           20.5X
+UTF8_BINARY                                             4603           4618          22         0.0       46031.3            1.0X
+UTF8_LCASE                                              9510           9518          11         0.0       95097.7            2.1X
+UNICODE                                               135718         135786          97         0.0     1357176.2           29.5X
+UNICODE_CI                                            113715         113819         148         0.0     1137145.8           24.7X
 
-OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-aws
+Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 collation unit benchmarks - contains:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative time
 

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-13 Thread via GitHub


MaxGekk commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1839787864


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala:
##
@@ -185,6 +185,49 @@ abstract class CollationBenchmarkBase extends 
BenchmarkBase {
 }
 benchmark.run(relativeTime = true)
   }
+
+  def benchmarkInitCap(
+  collationTypes: Seq[String],
+  utf8Strings: Seq[UTF8String]): Unit = {
+type collationType = Int
+type InitCapEstimator = (UTF8String, collationType) => Unit
+def skipCollationTypeFilter: Any => Boolean = _ => true
+def createBenchmark(
+ implName: String,
+ impl: InitCapEstimator,
+ collationTypeFilter: String => Boolean): Unit = {

Review Comment:
   Could you fix the indentation here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-12 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2471613278

   > @mrk-andreev Could you integrate your benchmark into CollationBenchmark, 
please, as @uros-db pointed out in 
https://github.com/apache/spark/pull/48501#pullrequestreview-2385543767. 
Otherwise we might forget to re-run your benchmark while benchmarking 
collation-related code.
   
   @MaxGekk , done. 





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-12 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2471612662

   > @mrk-andreev Could you integrate your benchmark into CollationBenchmark, 
please, as @uros-db pointed out in 
https://github.com/apache/spark/pull/48501#pullrequestreview-2385543767. 
Otherwise we might forget to re-run your benchmark while benchmarking 
collation-related code.
   
   Done. 





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-07 Thread via GitHub


MaxGekk commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2463069933

   @mrk-andreev Could you integrate your benchmark into `CollationBenchmark`, 
please, as @uros-db pointed out in 
https://github.com/apache/spark/pull/48501#pullrequestreview-2385543767. 
Otherwise we might forget to re-run your benchmark while benchmarking 
collation-related code.





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-03 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1827021066


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+def generateString(wordsCount: Int, wordLen: Int, firstLetterUpper: 
Boolean): UTF8String = {
+  val sb = new StringBuilder(wordsCount * wordLen + wordLen)
+  for (_ <- 0 until wordsCount) {
+for (pos <- 0 until wordLen) {
+  if (pos == 0 && firstLetterUpper) {
+sb.append("X")
+  } else {
+sb.append("x")
+  }
+}
+sb.append(" ")
+  }
+  UTF8String.fromString(sb.toString())
+}
+
+def addCases(benchmark: Benchmark,
+ text: UTF8String): Unit = {
+  // collation that contains collator
+  val collationId = CollationFactory.collationNameToId("he_ISR")

Review Comment:
   Updated 
   
   - `InitCapBenchmark-results.txt`
   - `InitCapBenchmark-jdk21-results.txt`






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-29 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1821582236


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+def generateString(wordsCount: Int, wordLen: Int, firstLetterUpper: 
Boolean): UTF8String = {
+  val sb = new StringBuilder(wordsCount * wordLen + wordLen)
+  for (_ <- 0 until wordsCount) {
+for (pos <- 0 until wordLen) {
+  if (pos == 0 && firstLetterUpper) {
+sb.append("X")
+  } else {
+sb.append("x")
+  }
+}
+sb.append(" ")
+  }
+  UTF8String.fromString(sb.toString())
+}
+
+def addCases(benchmark: Benchmark,
+ text: UTF8String): Unit = {

Review Comment:
   Fixed






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-29 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1821583528


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+def generateString(wordsCount: Int, wordLen: Int, firstLetterUpper: 
Boolean): UTF8String = {
+  val sb = new StringBuilder(wordsCount * wordLen + wordLen)
+  for (_ <- 0 until wordsCount) {
+for (pos <- 0 until wordLen) {
+  if (pos == 0 && firstLetterUpper) {
+sb.append("X")
+  } else {
+sb.append("x")
+  }
+}
+sb.append(" ")
+  }
+  UTF8String.fromString(sb.toString())
+}
+
+def addCases(benchmark: Benchmark,
+ text: UTF8String): Unit = {
+  // collation that contains collator
+  val collationId = CollationFactory.collationNameToId("he_ISR")

Review Comment:
   Extended with 
   
   ```scala
   for (collationName <- List("he_ISR", "UNICODE", "UNICODE_CI")) {
     val collationId = CollationFactory.collationNameToId(collationName)
     assert(CollationFactory.fetchCollation(collationId).collator != null)
     val caseName = s"execICU[collationName=${collationName}]"
     benchmark.addCase(caseName)(_ => InitCap.execICU(text, collationId))
   }
   ```
   
   The primary requirement for `collationId` in `InitCap.execICU` is that 
`CollationFactory.fetchCollation(collationId).collator` must not be null; 
otherwise, the function will throw an NPE.
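
The null-collator requirement above can be illustrated with a small, self-contained sketch. It does not use Spark: `Collation`, `registry`, and `icuCapable` are hypothetical stand-ins, and which entries lack a collator is an assumption made for the illustration, not something read from `CollationFactory`. The idea is to filter the collation names up front instead of asserting inside the loop.

```scala
// Hypothetical stand-in for a collation entry; `collator` may be null,
// mirroring the situation described in the comment above.
final case class Collation(name: String, collator: AnyRef)

object CollatorGuard {
  // Assumed sample registry for illustration only.
  private val registry: Map[String, Collation] = Map(
    "UTF8_BINARY" -> Collation("UTF8_BINARY", null),
    "UTF8_LCASE" -> Collation("UTF8_LCASE", null),
    "UNICODE" -> Collation("UNICODE", new Object),
    "UNICODE_CI" -> Collation("UNICODE_CI", new Object))

  // Keep only the names whose entry exists and carries a non-null collator,
  // so a collator-dependent code path is never reached with null.
  def icuCapable(names: Seq[String]): Seq[String] =
    names.filter(n => registry.get(n).exists(_.collator != null))
}
```

With this shape, unknown names and collator-less entries are skipped rather than failing at run time; whether skipping or failing fast is preferable for a benchmark is a design choice.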
   



##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ * 

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-29 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2445410112

   TODO: Update benchmark outputs. However, I'll wait for additional comments 
since reevaluation may take some time.





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-29 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1821583528


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+def generateString(wordsCount: Int, wordLen: Int, firstLetterUpper: 
Boolean): UTF8String = {
+  val sb = new StringBuilder(wordsCount * wordLen + wordLen)
+  for (_ <- 0 until wordsCount) {
+for (pos <- 0 until wordLen) {
+  if (pos == 0 && firstLetterUpper) {
+sb.append("X")
+  } else {
+sb.append("x")
+  }
+}
+sb.append(" ")
+  }
+  UTF8String.fromString(sb.toString())
+}
+
+def addCases(benchmark: Benchmark,
+ text: UTF8String): Unit = {
+  // collation that contains collator
+  val collationId = CollationFactory.collationNameToId("he_ISR")

Review Comment:
   Extended with 
   
   ```
   for (collationName <- List("he_ISR", "UNICODE", "UNICODE_CI")) {
     val collationId = CollationFactory.collationNameToId(collationName)
     assert(CollationFactory.fetchCollation(collationId).collator != null)
     val caseName = s"execICU[collationName=${collationName}]"
     benchmark.addCase(caseName)(_ => InitCap.execICU(text, collationId))
   }
   ```
   
   The primary requirement for collationId in InitCap.execICU is that 
CollationFactory.fetchCollation(collationId).collator must not be null; 
otherwise, the function will throw an NPE.
   






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-26 Thread via GitHub


MaxGekk commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1817929449


##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+def generateString(wordsCount: Int, wordLen: Int, firstLetterUpper: 
Boolean): UTF8String = {
+  val sb = new StringBuilder(wordsCount * wordLen + wordLen)
+  for (_ <- 0 until wordsCount) {
+for (pos <- 0 until wordLen) {
+  if (pos == 0 && firstLetterUpper) {
+sb.append("X")
+  } else {
+sb.append("x")
+  }
+}
+sb.append(" ")
+  }
+  UTF8String.fromString(sb.toString())
+}
+
+def addCases(benchmark: Benchmark,
+ text: UTF8String): Unit = {
+  // collation that contains collator
+  val collationId = CollationFactory.collationNameToId("he_ISR")

Review Comment:
   Could you benchmark more collations, see
   
https://github.com/apache/spark/blob/9909817aef9198fd88058b4c8fec292de2797b8d/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala#L27



##
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InitCapBenchmark.scala:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.catalyst.util.CollationFactory
+import org.apache.spark.sql.catalyst.util.CollationSupport.InitCap
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL 
initcap expressions.
+ *
+ * Specifically, this class compares the execICU, execBinaryICU, execBinary, 
execLowercase
+ * approaches. This class compares for string of different lengths with 
different words count.
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class 
+ *--jars , 
+ *   2. build/sbt "sql/Test/runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/Test/runMain "
+ *  Results will be written to "benchmarks/InitCapBenchmark-results.txt".
+ * }}}
+ */
+object InitCapBenchmark extends BenchmarkBase {
+  override def runBenchmarkSuite(mainArgs: Arr

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-22 Thread via GitHub


MaxGekk commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2429155377

   @uros-db @mihailom-db @viktorluc-db Could you review this PR, please.





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-21 Thread via GitHub


mrk-andreev commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1809129420


##
sql/core/benchmarks/InitCapBenchmark-results.txt:
##
@@ -0,0 +1,168 @@
+
+[wc=1, wl=1, capitalized=true]
+
+
+OpenJDK 64-Bit Server VM 17.0.11+10-LTS on Linux 5.15.0-122-generic
+Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
+InitCap evaluation [wc=1, wl=1, capitalized=true]:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
+-
+execICU0  
0   0  371177345.1   0.0   1.0X

Review Comment:
   I adjusted the word count for my `Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz`, 
but encountered issues with local evaluation. This led to a remote evaluation 
on an `Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz`, where the performance was 
noticeably less impressive.






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-20 Thread via GitHub


MaxGekk commented on code in PR #48501:
URL: https://github.com/apache/spark/pull/48501#discussion_r1807893787


##
sql/core/benchmarks/InitCapBenchmark-results.txt:
##
@@ -0,0 +1,168 @@
+
+[wc=1, wl=1, capitalized=true]
+
+
+OpenJDK 64-Bit Server VM 17.0.11+10-LTS on Linux 5.15.0-122-generic
+Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
+InitCap evaluation [wc=1, wl=1, capitalized=true]:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
+-
+execICU0  
0   0  371177345.1   0.0   1.0X

Review Comment:
   Let's bump the number of iterations so that Best/Avg Time is measured in seconds.
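
As a rough aid for choosing the bump, the arithmetic can be sketched as follows. `RepeatCount.needed` is a hypothetical helper, not part of Spark's benchmark utilities, and the 0.25 ms per-call figure in the usage note is an assumed example value.

```scala
object RepeatCount {
  // Smallest number of repetitions such that repetitions * perCallMs >= targetMs.
  def needed(perCallMs: Double, targetMs: Double): Long = {
    require(perCallMs > 0.0 && targetMs > 0.0, "both durations must be positive")
    math.ceil(targetMs / perCallMs).toLong
  }
}
```

For example, if one call takes about 0.25 ms, `RepeatCount.needed(0.25, 1000.0)` gives 4000 repetitions per measured iteration to reach one second.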






Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-19 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2423747579

   > Let's place the benchmark at the SQL level so far.
   
   Done
   
   > Can we include the benchmark result files too?
   
   Done





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-16 Thread via GitHub


HyukjinKwon commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2418241753

   Can we include the benchmark result files too? See also "Testing with GitHub 
Actions workflow" at https://spark.apache.org/developer-tools.html





Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-10-16 Thread via GitHub


mrk-andreev commented on PR #48501:
URL: https://github.com/apache/spark/pull/48501#issuecomment-2417414974

   Results of local run  
[InitCapBenchmark-local.txt](https://github.com/user-attachments/files/17399973/InitCapBenchmark-local.txt)
   
   ## Sample
   
   ```
   Running benchmark: InitCap evaluation [wc=1000, wl=16, capitalized=false]
     Running case: execICU
     Stopped after 8978 iterations, 2000 ms
     Running case: execBinaryICU
     Stopped after 6235 iterations, 2000 ms
     Running case: execBinary
     Stopped after 28374 iterations, 2000 ms
     Running case: execLowercase
     Stopped after 8839 iterations, 2000 ms
   
   OpenJDK 64-Bit Server VM 17.0.2+8-86 on Linux 5.15.0-122-generic
   Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
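   InitCap evaluation [wc=1000, wl=16, capitalized=false]:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------------
   execICU                                                              0              0           0     432768.3           0.0       1.0X
   execBinaryICU                                                        0              0           0     285450.1           0.0       0.7X
   execBinary                                                           0              0           0    1494256.8           0.0       3.5X
   execLowercase                                                        0              0           0     415082.4           0.0       1.0X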
   ```
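   
   The zeros in the time columns are consistent with the iteration counts above: if the reported 2000 ms is the total measured time, one execICU iteration takes about 2000 / 8978 ≈ 0.22 ms, which prints as 0 in a whole-millisecond column. A minimal sketch of that arithmetic follows; the object and method names are hypothetical, and the `%.0f` formatting is an assumption about how the time columns are printed.
   
   ```scala
   object PerIterationTime {
     // Average wall-clock time of a single iteration, in milliseconds.
     def avgMs(totalMs: Double, iterations: Long): Double = totalMs / iterations

     // Whole-millisecond rendering (assumed "%.0f"-style column format).
     def wholeMs(ms: Double): String = "%.0f".formatLocal(java.util.Locale.ROOT, ms)

     def main(args: Array[String]): Unit = {
       // Iteration counts copied from the sample run above.
       val cases = Seq(
         "execICU" -> 8978L,
         "execBinaryICU" -> 6235L,
         "execBinary" -> 28374L,
         "execLowercase" -> 8839L)
       for ((name, iterations) <- cases) {
         val ms = avgMs(2000.0, iterations)
         println(f"$name%-14s avg = $ms%.4f ms, printed as ${wholeMs(ms)}")
       }
     }
   }
   ```
   
   Every case in the sample stays well below 0.5 ms per iteration, so all of them round down to 0 under that assumption.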
   
   ## Open questions
   
   1. Should we place the benchmark code in the same package, 'unsafe', or at the 'SQL level'? If it's in 'unsafe', should we extract the shared benchmark code into a shared library?
   2. The benchmark output expects each measurement to be at least 1 ms, but that isn't the case here. Should we align the rounding to the first non-zero digit after the decimal point?
   3. How detailed do we expect the benchmarks to be? Do we want several axes of variation, or should we stick to default-like parameters?
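   
   One possible reading of the rounding question, as a rough sketch only: choose the number of decimal places so that the first non-zero digit of a sub-millisecond value is visible. `FirstNonZeroDigit`, `decimalsFor`, and `maxDecimals` are hypothetical names used for illustration; this is not how the existing benchmark output is formatted.
   
   ```scala
   object FirstNonZeroDigit {
     // Decimal places needed to reach the first non-zero digit of a value in (0, 1).
     // Values that are zero, negative, or at least 1 ms keep whole-number formatting.
     def decimalsFor(ms: Double, maxDecimals: Int = 6): Int =
       if (ms <= 0.0 || ms >= 1.0) 0
       else math.min(maxDecimals, math.ceil(-math.log10(ms)).toInt)

     // Render the value with just enough decimals to show that digit.
     def format(ms: Double): String =
       s"%.${decimalsFor(ms)}f".formatLocal(java.util.Locale.ROOT, ms)

     def main(args: Array[String]): Unit = {
       // Per-iteration times derived from the sample run (total ms / iterations).
       Seq(2000.0 / 8978, 2000.0 / 28374, 3.0).foreach(ms => println(format(ms)))
     }
   }
   ```
   
   With this scheme the execICU case would show 0.2 and execBinary 0.07 instead of 0, at the cost of columns whose precision varies from row to row.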
   

