[
https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983103#comment-14983103
]
ASF GitHub Bot commented on DRILL-3747:
---
GitHub user k255 opened a pull request:
https://github.com/apache/drill/pull/224
DRILL-3747: basic similarity search with simmetric
Helps handle, e.g., typos in search queries using popular algorithms such as
Levenshtein.
Sample query:
```
select levenshtein('foo', 'boo') from (VALUES(1)); //gives 0.67
```
and
```
select levenshtein('foo', 'bar') from (VALUES(1)); //not similar - gives 0
```
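The two sample results are consistent with a normalized Levenshtein similarity, i.e. 1 - distance / max(length), which is how similarity libraries commonly report this metric. The sketch below is an illustration in Python under that assumption, not the code from the pull request; the function names `levenshtein_distance` and `levenshtein_similarity` are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(round(levenshtein_similarity("foo", "boo"), 2))  # one substitution out of three characters
    print(round(levenshtein_similarity("foo", "bar"), 2))  # all three characters differ
```

Under this normalization 'foo' vs 'boo' has distance 1 over length 3, and 'foo' vs 'bar' has distance 3 over length 3, which matches the 0.67 and 0 shown in the sample queries.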
More:
https://github.com/k255/drill-fuzzy-search
https://en.wikipedia.org/wiki/Levenshtein_distance
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/k255/drill drill-fuzzysearch
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/224.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #224
commit 51248358adf7ee71a744cccb7a22b45850f192a8
Author: potocki
Date: 2015-10-30T18:54:41Z
basic similarity search with simmetric
> UDF for "fuzzy" string and similarity matching
> --
>
> Key: DRILL-3747
> URL: https://issues.apache.org/jira/browse/DRILL-3747
> Project: Apache Drill
> Issue Type: New Feature
> Components: Functions - Drill
> Affects Versions: Future
> Reporter: Edmon Begoli
> Priority: Minor
> Labels: features
> Fix For: Future
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> I propose implementing string-distance and similarity-matching functions
> similar to those found in most other databases: soundex, metaphone,
> levenshtein (and more advanced variants such as Damerau-Levenshtein,
> Jaro-Winkler, etc.).
> See fuzzystrmatch
> http://www.postgresql.org/docs/9.5/static/fuzzystrmatch.html,
> and pg_similarity http://pgsimilarity.projects.pgfoundry.org/
> for inspiration.