GitHub user javadba opened a pull request:
https://github.com/apache/spark/pull/1586
SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL
Syntactic, parsing, and operational support have been added for LEN(GTH)
and STRLEN functions.
Examples:
SQL:
import org.apache.spark.sql._
case class TestData(key: Int, value: String)
val sqlc = new SQLContext(sc)
import sqlc._
val testData: SchemaRDD = sqlc.sparkContext.parallelize(
(1 to 100).map(i => TestData(i, i.toString)))
testData.registerAsTable("testData")
sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])
HQL:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc._
hql("select length(grp) from simplex").collect
res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])
As for the codebase changes, they were deliberately modeled on the ones
made when SUBSTR(ING) was added on July 17:
SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the
main classes changed. The testing suites affected are ConstantFolding and
ExpressionEvaluation.
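For readers who want the semantics of the examples above without a Spark
shell, here is a minimal standalone sketch. It is not the code from this
pull request: the object LengthSketch and the helper sqlLength are
hypothetical names, the conversion of non-string input via toString is
inferred from the length(key) output shown above, and the null handling
is an assumption.

```scala
object LengthSketch {
  // Hypothetical helper, not the Expression class added by the PR.
  // Assumption: a null input yields no result (modeled here as None).
  // Assumption: non-string input is measured by its toString form,
  // which matches length(key) returning 3 for the Int key 100.
  def sqlLength(value: Any): Option[Int] =
    Option(value).map {
      case s: String => s.length
      case other     => other.toString.length
    }

  def main(args: Array[String]): Unit = {
    // Same data as the SQL example: keys 1 to 100.
    val lengths = (1 to 100).flatMap(i => sqlLength(i))
    // Equivalent of "order by key_len desc limit 5".
    val top5 = lengths.sortBy(n => -n).take(5)
    println(top5.mkString(", ")) // prints "3, 2, 2, 2, 2"
  }
}
```

Only the key 100 has three digits, so the first value is 3 and the
remaining four are 2, in line with the res12 output above.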
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/javadba/spark strlen
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1586.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1586
----
commit bb252380399c4146bb63b5d6cbc66234609bab11
Author: Stephen Boesch <[email protected]>
Date: 2014-07-12T12:34:58Z
Support hbase-0.96-1.1 in SparkBuild
commit 947007305cb03515daa8738d3ad2063bcd226a3d
Author: Stephen Boesch <[email protected]>
Date: 2014-07-12T12:56:38Z
overwrote sparkbuild
commit 9b6a6471e3c1f087c186a7597c63c7ef2707eaa3
Author: Stephen Boesch <[email protected]>
Date: 2014-07-16T13:24:32Z
update pom.xml for hadoop-2.3-cdh50.0 and hbase 0.96.1.1
commit b04c4cbef3ecb5a6f13297391b55a36317ce957a
Author: Stephen Boesch <[email protected]>
Date: 2014-07-16T13:24:40Z
Merge branch 'master' of https://github.com/apache/spark
commit 5d1cb0a449bbf1ea95272a45f2d030d5cad0195c
Author: Stephen Boesch <[email protected]>
Date: 2014-07-23T04:33:25Z
SPARK-2638 MapOutputTracker concurrency improvement
commit 483479ac8ccb0c937da5d306fc4591aa974ed37b
Author: Stephen Boesch <[email protected]>
Date: 2014-07-23T16:09:26Z
Mesos workaround
commit 30910b2daac974cd2dac82e8a1b20cd60348a632
Author: Stephen Boesch <[email protected]>
Date: 2014-07-23T19:43:59Z
Merge remote-tracking branch 'upstream/master'
commit 7c675f8d8fc63c5f602c5a767e1215118e0f768c
Author: Stephen Boesch <[email protected]>
Date: 2014-07-23T20:03:18Z
Merge branch 'master' of https://github.com/javadba/spark
commit d646a2e1113252d1955185e355da06ddb690b75f
Author: Stephen Boesch <[email protected]>
Date: 2014-07-25T06:26:11Z
SPARK-2686 Add Length support to Spark SQL and HQL and Strlen support to SQL
----