The following post introduces a UDF package for Drill with 5 functions for
extracting [meta]data from Twitter tweet text:

https://rud.is/b/2018/07/22/new-apache-drill-udf-for-processing-twitter-tweet-text/

The functions cover:

- hashtag extraction
- URL extraction
- @-mentions extraction
- reply-to (if it's a reply)
- tweet metadata, including whether the tweet is "valid"
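
For anyone unfamiliar with what this kind of extraction involves, here is a rough Python sketch of pulling the same entities out of tweet text. It is not the UDF's code and does not use its function names; the regular expressions, the `extract_entities` helper and the sample tweet are all assumptions made up for illustration, and they are much looser than the real tweet-parsing rules.

```python
import re

# Simplified, illustrative patterns (assumptions, not the package's rules).
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")
URL_RE = re.compile(r"https?://\S+")
REPLY_RE = re.compile(r"^\s*@(\w{1,15})")


def extract_entities(text):
    """Return hashtags, URLs, @-mentions and the reply-to name (or None)."""
    reply = REPLY_RE.match(text)
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "urls": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        # Treat the tweet as a reply only if it starts with an @-mention.
        "reply_to": reply.group(1) if reply else None,
    }


# Hypothetical sample tweet.
tweet = "@drill_user try the new #ApacheDrill UDF https://example.com/post cc @someone"
entities = extract_entities(tweet)
```

With the UDF installed, the equivalent work happens inside a Drill SQL query, so the tweets never have to leave Drill to be parsed.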

Tested with Drill 1.13.0.

Get it via:

- GitLab: https://gitlab.com/hrbrmstr/drill-twitter-text or
- GitHub: https://github.com/hrbrmstr/drill-twitter-text

Hopefully it's useful for some folks. Don't hesitate to file issues or PRs
for problems or new features.

-Bob
