This post -- https://rud.is/b/2018/07/22/new-apache-drill-udf-for-processing-twitter-tweet-text/ -- introduces a UDF package for Drill with 5 functions for extracting [meta]data from Twitter tweet text, including:
- hashtag extraction
- URL extraction
- @-mentions extraction
- reply-to (if it's a reply)
- tweet metadata, including whether the tweet is "valid"

Tested with Drill 1.13.0. Get it via:

- GitLab: https://gitlab.com/hrbrmstr/drill-twitter-text
- GitHub: https://github.com/hrbrmstr/drill-twitter-text

Hopefully it's useful for some folks. Don't hesitate to file issues or PRs for problems or new features.

-Bob
