GitHub user janewangfb opened a pull request:
https://github.com/apache/spark/pull/18023
Fix SPARK-12139: REGEX Column Specification for Hive Queries
## What changes were proposed in this pull request?
Hive interprets regular expressions, e.g., `(a)?+.+`, as column specifications in a query.
This PR enables Spark to support this feature when
hive.support.quoted.identifiers is set to true.
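
As a rough illustration of what a backquoted regex column specification selects, the sketch below filters a list of column names by a full-string match against the pattern. The object `RegexColumnSketch`, the helper `selectColumns`, and the sample column names are hypothetical and are not taken from this patch; Spark's actual resolution logic may differ (for example in case sensitivity). Because `?+` is a possessive quantifier, a name that is exactly one of the listed alternatives leaves nothing for `.+` to match, so that column is excluded.

```scala
// Hypothetical sketch, not the implementation in this PR:
// keep only the columns whose entire name matches the given regex.
object RegexColumnSketch {
  def selectColumns(columns: Seq[String], pattern: String): Seq[String] =
    columns.filter(_.matches(pattern)) // String.matches requires a full-string match

  def main(args: Array[String]): Unit = {
    // Assumed example schema, for illustration only.
    val columns = Seq("appid", "ds", "host", "value", "ip")
    // Selects every column except appid, ds, and host.
    println(selectColumns(columns, "(appid|ds|host)?+.+"))
  }
}
```

Under these assumptions, the pattern behaves like "all columns except the listed ones", which is how it is used in the query tested below.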
## How was this patch tested?
- Added unit tests in SQLQuerySuite.scala
- In spark-shell, tested the query that originally failed:
scala> hc.sql("""SELECT `(appid|ds|host|instance|offset|ts)?+.+`,
IF(FB_IS_VALID_HIVE_PARTITION_VALUE(appid), appid, 'BAD_APPID'),
IF(FB_IS_VALID_HIVE_PARTITION_VALUE(ts), ts, 'BAD_TS') FROM
time_spent_bit_array_mobile_current WHERE ds='2017-05-14' AND
instance='cc_deterministic_loader' AND ts='2017-05-14+15:00:99' limit
100""").collect.foreach(println)
result:
[1.4947744605006E9,Map(delta -> 803, ip -> 84.16.234.63, ig_id ->
1928710114, hces_extra ->
{"radio_type":"wifi-none","auth_flag":"unable_to_verify"}),0.0,1494774434,1.494774459676E9,WrappedArray(517867,
0),26,0,lncny1,e46e8616-9763-475a-b80f-a46094b263a6,9,188,10.20.0,4C0175EC-B421-4676-ACFF-8E1E353D53E5,,57944460,null,6f72336f74c9f85c6e1b7b16c64e9dec,,567067343352427,2017-05-14+15:00:99]
....
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/janewangfb/spark support_select_regex
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18023.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18023
----
commit af55afd8d6839e38337f67e19a614ea3eae9a2cf
Author: Jane Wang <[email protected]>
Date: 2017-05-18T00:21:14Z
Fix SPARK-12139: REGEX Column Specification for Hive Queries
----