GitHub user javadba opened a pull request:

    https://github.com/apache/spark/pull/1586

    SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL

    Syntactic, parsing, and operational support has been added for the LEN(GTH)
and STRLEN functions.
    Examples:
    SQL:
    import org.apache.spark.sql._
    case class TestData(key: Int, value: String)
    val sqlc = new SQLContext(sc)
    import sqlc._
    val testData: SchemaRDD = sqlc.sparkContext.parallelize(
    (1 to 100).map(i => TestData(i, i.toString)))
    testData.registerAsTable("testData")
    sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
    res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])
    HQL:
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    import hc._
    hql("select length(grp) from simplex").collect
    res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])
    The codebase changes were deliberately modeled on the ones made when
SUBSTR(ING) was added on July 17.
    SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the
main classes changed. The test suites affected are ConstantFolding and
ExpressionEvaluation.
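
    As a rough illustration of what the Expression and Optimizer changes
involve, here is a minimal standalone sketch. It does not use the real
Catalyst classes: Expr, Literal, Column, Length, and fold are hypothetical
stand-ins, and taking the length of a value's string form is an assumption
based on the length(key) example above, where key is an Int.

```scala
object LengthSketch {
  // Simplified stand-in for an expression tree; NOT the real Catalyst API.
  sealed trait Expr { def eval(row: Map[String, Any]): Any }

  // A constant value.
  case class Literal(value: Any) extends Expr {
    def eval(row: Map[String, Any]): Any = value
  }

  // A column reference; a missing column evaluates to null.
  case class Column(name: String) extends Expr {
    def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
  }

  // LENGTH(child): null in, null out; otherwise the length of the
  // value's string form (assumption, see the note above).
  case class Length(child: Expr) extends Expr {
    def eval(row: Map[String, Any]): Any = child.eval(row) match {
      case null => null
      case v    => v.toString.length
    }
  }

  // Constant folding: LENGTH over a literal needs no input row, so it
  // can be evaluated once and replaced by the resulting literal.
  def fold(e: Expr): Expr = e match {
    case Length(Literal(v)) => Literal(Length(Literal(v)).eval(Map.empty))
    case other              => other
  }
}
```

    In this sketch, Length(Column("key")) evaluated against a row with
key = 100 yields 3, and fold rewrites Length(Literal("simple")) to
Literal(6) before any rows are processed, which is the kind of behavior the
ConstantFolding and ExpressionEvaluation suites would be expected to cover.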


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/javadba/spark strlen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1586
    
----
commit bb252380399c4146bb63b5d6cbc66234609bab11
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-12T12:34:58Z

    Support hbase-0.96-1.1 in SparkBuild

commit 947007305cb03515daa8738d3ad2063bcd226a3d
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-12T12:56:38Z

    overwrote sparkbuild

commit 9b6a6471e3c1f087c186a7597c63c7ef2707eaa3
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-16T13:24:32Z

    update pom.xml for hadoop-2.3-cdh50.0 and hbase 0.96.1.1

commit b04c4cbef3ecb5a6f13297391b55a36317ce957a
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-16T13:24:40Z

    Merge branch 'master' of https://github.com/apache/spark

commit 5d1cb0a449bbf1ea95272a45f2d030d5cad0195c
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T04:33:25Z

    SPARK-2638 MapOutputTracker concurrency improvement

commit 483479ac8ccb0c937da5d306fc4591aa974ed37b
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T16:09:26Z

    Mesos workaround

commit 30910b2daac974cd2dac82e8a1b20cd60348a632
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T19:43:59Z

    Merge remote-tracking branch 'upstream/master'

commit 7c675f8d8fc63c5f602c5a767e1215118e0f768c
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-23T20:03:18Z

    Merge branch 'master' of https://github.com/javadba/spark

commit d646a2e1113252d1955185e355da06ddb690b75f
Author: Stephen Boesch <java...@gmail.com>
Date:   2014-07-25T06:26:11Z

    SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
