[GitHub] [spark] beliefer commented on pull request #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static

GitBox Mon, 22 Jun 2020 20:03:28 -0700


beliefer commented on pull request #26875:
URL: https://github.com/apache/spark/pull/26875#issuecomment-647878505



   ---- test2 ----
   
   ```
   // Strings of length 17 + 1
   val df1 = spark.range(0, 20000, 1, 200).selectExpr("concat('aaaaaaaaaaaaaaaaa', id%2) as c1")
   val df2 = spark.range(0, 20000, 1, 200).selectExpr("concat('bbbbbbbbbbbbbbbbb', id%2) as c2")
   val start = System.currentTimeMillis
   df1.join(df2).where("c2 like c1").count()
   // Elapsed time in ms over three runs:
   // before  90054, 90350, 90283
   // after   13077, 10097, 9770
   println(System.currentTimeMillis - start)
   ```
   You use `%2`, which is the extreme best-case scenario, so the 10x
performance improvement cannot be demonstrated.
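
To illustrate why `%2` is a best case: with `id%2`, the `c1` column holds only two distinct pattern strings, so a pattern cache almost never misses. The sketch below is a hypothetical stand-alone illustration, not the implementation in this PR. `PatternCache`, `maxSize`, and `compilations` are names invented for the example, and it uses plain Java regexes rather than Spark's `LIKE` handling.

```scala
import java.util.regex.Pattern
import scala.collection.mutable

// Hypothetical helper (not Spark's code): caches compiled patterns by pattern string.
final class PatternCache(maxSize: Int) {
  private val cache = mutable.LinkedHashMap.empty[String, Pattern]
  var compilations = 0 // how many times a pattern actually had to be compiled

  def get(regex: String): Pattern = cache.getOrElse(regex, {
    compilations += 1
    // Evict the oldest inserted entry once the cache is full.
    if (cache.size >= maxSize) cache.remove(cache.head._1)
    val compiled = Pattern.compile(regex)
    cache.put(regex, compiled)
    compiled
  })
}

// Best case, as with `id%2`: only two distinct patterns across 20000 lookups.
val fewPatterns = new PatternCache(maxSize = 16)
(0 until 20000).foreach(i => fewPatterns.get("a{17}" + (i % 2)))

// Worst case: every lookup uses a new pattern, so every lookup misses.
val manyPatterns = new PatternCache(maxSize = 16)
(0 until 20000).foreach(i => manyPatterns.get("a{17}" + i))

println(s"few: ${fewPatterns.compilations}, many: ${manyPatterns.compilations}")
// prints: few: 2, many: 20000
```

Pasted into a Scala REPL or `spark-shell`, this shows the gap between the two extremes. A benchmark with more distinct patterns per partition would sit somewhere between them.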


