[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607543#comment-16607543 ]

Apache Spark commented on SPARK-25317:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/22361

> MemoryBlock performance regression
> ----------------------------------
>
>                 Key: SPARK-25317
>                 URL: https://issues.apache.org/jira/browse/SPARK-25317
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Assignee: Marco Gaido
>            Priority: Blocker
>             Fix For: 2.4.0
>
>
> There is a performance regression when calculating hash code for UTF8String:
> {code:java}
>   test("hashing") {
>     import org.apache.spark.unsafe.hash.Murmur3_x86_32
>     import org.apache.spark.unsafe.types.UTF8String
>     val hasher = new Murmur3_x86_32(0)
>     val str = UTF8String.fromString("b" * 10001)
>     val numIter = 10
>     val start = System.nanoTime
>     for (i <- 0 until numIter) {
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>     }
>     val duration = (System.nanoTime() - start) / 1000 / numIter
>     println(s"duration $duration us")
>   }
> {code}
> To run this test in 2.3, we need to add
> {code:java}
>   public static int hashUTF8String(UTF8String str, int seed) {
>     return hashUnsafeBytes(str.getBaseObject(), str.getBaseOffset(),
>       str.numBytes(), seed);
>   }
> {code}
> to `Murmur3_x86_32`
> On my laptop, the result for master vs 2.3 is: 120 us vs 40 us

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
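To make the reported numbers easier to read, here is a small Java sketch of the arithmetic behind the test's `duration` expression. It assumes the loop body contains 30 unrolled calls (counted from the quoted snippet) and uses the 120 us and 40 us figures from the description; the class and method names are hypothetical and exist only for this illustration.

```java
final class HashTimingArithmetic {
    // Same expression as the quoted test: elapsed ns / 1000 / numIter, with
    // truncating integer division, giving microseconds per outer-loop iteration.
    static long microsPerIteration(long elapsedNanos, int numIter) {
        return elapsedNanos / 1000 / numIter;
    }

    // Average microseconds per single hash call within one iteration.
    static double microsPerCall(long microsPerIteration, int callsPerIteration) {
        return (double) microsPerIteration / callsPerIteration;
    }

    public static void main(String[] args) {
        int numIter = 10;           // from the quoted test
        int callsPerIteration = 30; // assumption: unrolled calls counted in the quoted loop
        long masterUs = 120;        // reported for master
        long branch23Us = 40;       // reported for 2.3

        // An elapsed time of 1,200,000 ns over 10 iterations prints "duration 120 us".
        System.out.println(microsPerIteration(1_200_000L, numIter));

        // Per-call averages implied by the reported figures.
        System.out.println(microsPerCall(masterUs, callsPerIteration));
        System.out.println(microsPerCall(branch23Us, callsPerIteration));

        // Relative slowdown of master against 2.3.
        System.out.println((double) masterUs / branch23Us);
    }
}
```

Under these assumptions the reported figures correspond to roughly 4 us per call on master against about 1.33 us on 2.3, a 3x slowdown. Because the expression uses integer division, sub-microsecond differences per iteration are truncated.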
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604146#comment-16604146 ]

Apache Spark commented on SPARK-25317:
--------------------------------------

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22338
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604145#comment-16604145 ]

Apache Spark commented on SPARK-25317:
--------------------------------------

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22338
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604133#comment-16604133 ]

Marco Gaido commented on SPARK-25317:
-------------------------------------

[~kiszk] Sure, we can investigate the root cause further in the PR. Thanks.
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604120#comment-16604120 ]

Kazuaki Ishizaki commented on SPARK-25317:
------------------------------------------

While investigating this issue, I realized that the Java bytecode size of a method can change performance. I guess that this issue is related to method inlining. However, I have not found the root cause yet.

[~mgaido] Would it be possible to submit a PR to fix this issue?
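The link between bytecode size and inlining mentioned in the comment above can be inspected on a HotSpot JVM, whose JIT compares a callee's bytecode size against size thresholds when deciding whether to inline it. The sketch below is not from this thread: it reads two such thresholds, `MaxInlineSize` and `FreqInlineSize`, through the JDK's `HotSpotDiagnosticMXBean`, and assumes a HotSpot-based JVM. The class name `InlineThresholds` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

final class InlineThresholds {
    // Returns the current value of a HotSpot flag, or empty when this JVM
    // does not expose the diagnostic bean or does not know the flag.
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (RuntimeException | LinkageError e) {
            // Unknown flag or non-HotSpot runtime: report "not available".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // MaxInlineSize: bytecode-size limit for inlining ordinary callees.
        // FreqInlineSize: larger limit applied to frequently executed callees.
        for (String flag : new String[] {"MaxInlineSize", "FreqInlineSize"}) {
            System.out.println(flag + " = " + vmOption(flag).orElse("(not available)"));
        }
    }
}
```

The printed values depend on the JVM build and command-line options. To see the decisions themselves, HotSpot can log them when started with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`; whether inlining is actually the cause of this regression is still open at this point in the thread.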
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603174#comment-16603174 ]

Marco Gaido commented on SPARK-25317:
-------------------------------------

I think I have a fix for this. I can submit a PR if you want, but I am still not sure about the root cause of the regression. My best guess is that there is more than one cause and the performance improvement happens iff all of them are fixed, which is rather strange to me.
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602567#comment-16602567 ]

Kazuaki Ishizaki commented on SPARK-25317:
------------------------------------------

I confirmed this performance difference even after adding warm-up. Let me investigate further.
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602512#comment-16602512 ]

Jungtaek Lim commented on SPARK-25317:
--------------------------------------

Why not run the test with JMH, applying warm-up and iterations? I am not sure it can be applied to a Scala test, but the Java test code should be simple if these Spark classes are aware of interop.
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602506#comment-16602506 ]

Kazuaki Ishizaki commented on SPARK-25317:
------------------------------------------

Let me run this on 2.3 and master.

One question: this benchmark does not have a warm-up loop. In other words, it may also include time spent executing in the interpreter. Is this behavior intentional?
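The warm-up concern raised above can be illustrated with a hand-rolled harness: run the workload once cold, run it many times to let the JIT compile it, then measure again. The following Java sketch is self-contained and deliberately does not use Spark: `standInHash` is a hypothetical Murmur3-style mixing loop, not `Murmur3_x86_32`, and the iteration counts are arbitrary choices for illustration.

```java
import java.util.Arrays;

final class WarmupBenchmark {
    // Accumulates results so the JIT cannot discard the hash calls as dead code.
    static int sink;

    // Stand-in workload (assumption): a simple Murmur3-style mix over a byte
    // array. It is NOT Spark's Murmur3_x86_32 implementation.
    static int standInHash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            int k = b * 0xcc9e2d51;
            k = Integer.rotateLeft(k, 15) * 0x1b873593;
            h = Integer.rotateLeft(h ^ k, 13) * 5 + 0xe6546b64;
        }
        return h ^ data.length;
    }

    // Runs the workload `iterations` times and returns the elapsed nanoseconds.
    static long timeNanos(byte[] data, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc ^= standInHash(data, i);
        }
        long elapsed = System.nanoTime() - start;
        sink ^= acc;
        return elapsed;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10001]; // same length as the string in the quoted test
        Arrays.fill(data, (byte) 'b');

        int measured = 300; // 10 iterations x 30 calls, as in the quoted test
        long cold = timeNanos(data, measured); // may include interpreted execution
        timeNanos(data, 5_000);                // warm-up pass, result ignored
        long warm = timeNanos(data, measured); // measured after warm-up

        System.out.println("cold: " + cold / measured + " ns/call");
        System.out.println("warm: " + warm / measured + " ns/call");
    }
}
```

On a typical JIT-enabled JVM the cold figure is often higher than the warm one, but neither the direction nor the size of the gap is guaranteed. A harness such as JMH handles warm-up, forking, and dead-code elimination more rigorously than this sketch.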
[jira] [Commented] (SPARK-25317) MemoryBlock performance regression
[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602503#comment-16602503 ]

Wenchen Fan commented on SPARK-25317:
-------------------------------------

cc [~kiszk]