GitHub user javadba opened a pull request: https://github.com/apache/spark/pull/1586
SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL

Syntactic, parsing, and operational support have been added for the LEN(GTH) and STRLEN functions.

Examples:

SQL:

    import org.apache.spark.sql._
    case class TestData(key: Int, value: String)
    val sqlc = new SQLContext(sc)
    import sqlc._
    val testData: SchemaRDD = sqlc.sparkContext.parallelize(
      (1 to 100).map(i => TestData(i, i.toString)))
    testData.registerAsTable("testData")
    sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
    res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])

HQL:

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    import hc._
    hql("select length(grp) from simplex").collect
    res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])

As for the codebase changes: they have been purposely made similar to those made for adding SUBSTR(ING) on July 17. SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the main classes changed. The test suites affected are ConstantFolding and ExpressionEvaluation.
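For readers unfamiliar with how such a function fits into an expression tree, here is a minimal, self-contained sketch of the idea. It is not the code in this pull request and does not use Spark's Catalyst classes: `Expr`, `Literal`, `Column`, `Length`, and `LengthSketch.foldConstants` are hypothetical names invented for illustration. The sketch assumes that a null input yields null and that a non-string value is measured by its string form, which matches the `length(key)` result shown above for an Int column but may differ from the actual implementation.

```scala
// Hypothetical miniature expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr {
  // Evaluate against a row, modeled here as a plain column-name -> value map.
  def eval(row: Map[String, Any]): Any
}

// A constant value.
case class Literal(value: Any) extends Expr {
  def eval(row: Map[String, Any]): Any = value
}

// A reference to a column in the row; a missing column evaluates to null.
case class Column(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
}

// LENGTH(child): assumed to propagate null and to measure the string form
// of non-string values (so the Int 100 has length 3).
case class Length(child: Expr) extends Expr {
  def eval(row: Map[String, Any]): Any = child.eval(row) match {
    case null      => null
    case s: String => s.length
    case other     => other.toString.length
  }
}

object LengthSketch {
  // Constant folding: a Length over a literal can be evaluated once, up front,
  // and replaced by a literal holding the result.
  def foldConstants(e: Expr): Expr = e match {
    case Length(child) =>
      foldConstants(child) match {
        case lit: Literal => Literal(Length(lit).eval(Map.empty))
        case folded       => Length(folded)
      }
    case other => other
  }
}
```

With these definitions, `Length(Column("key")).eval(Map("key" -> 100))` evaluates to 3, and `LengthSketch.foldConstants(Length(Literal("spark")))` returns `Literal(5)`, while a `Length` over a column is left unchanged because its value is only known per row.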
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/javadba/spark strlen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1586

----
commit bb252380399c4146bb63b5d6cbc66234609bab11
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-12T12:34:58Z

    Support hbase-0.96-1.1 in SparkBuild

commit 947007305cb03515daa8738d3ad2063bcd226a3d
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-12T12:56:38Z

    overwrote sparkbuild

commit 9b6a6471e3c1f087c186a7597c63c7ef2707eaa3
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-16T13:24:32Z

    update pom.xml for hadoop-2.3-cdh50.0 and hbase 0.96.1.1

commit b04c4cbef3ecb5a6f13297391b55a36317ce957a
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-16T13:24:40Z

    Merge branch 'master' of https://github.com/apache/spark

commit 5d1cb0a449bbf1ea95272a45f2d030d5cad0195c
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T04:33:25Z

    SPARK-2638 MapOutputTracker concurrency improvement

commit 483479ac8ccb0c937da5d306fc4591aa974ed37b
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T16:09:26Z

    Mesos workaround

commit 30910b2daac974cd2dac82e8a1b20cd60348a632
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T19:43:59Z

    Merge remote-tracking branch 'upstream/master'

commit 7c675f8d8fc63c5f602c5a767e1215118e0f768c
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T20:03:18Z

    Merge branch 'master' of https://github.com/javadba/spark

commit d646a2e1113252d1955185e355da06ddb690b75f
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-25T06:26:11Z

    SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---