On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory.tuc...@gmail.com> wrote:
> [pg version 9.3 or 9.4]
>
> Suppose I have a simple table:
>
> create table data (
>   my_value TEXT NOT NULL
> );
> CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);
>
>
> Now I would like to essentially do group by to get a count of all the
> values that are sufficiently similar. I can do it using something like a
> CROSS JOIN to join the table on itself, but then I still am getting all the
> rows with duplicate counts.
>
> Is there a way to do a group by query and only return a single "my_value"
> column and a count of the number of times other values are similar while
> also not returning the included similar values in the output, too?
>
>

Concept below - not bothering to lookup the functions/operators for pg_trgm:

SELECT my_value_src, count(*)
FROM (SELECT my_value AS my_value_src FROM data) src
JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
ON ( func(my_value_src, my_value_compareto) < # )
GROUP BY my_value_src

David J.
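
For reference, here is one way the concept query above might look with actual pg_trgm names filled in. This is only a sketch: it assumes the `%` similarity operator stands in for the `func(...) < #` placeholder, and that the threshold is set with `set_limit()`, which is how 9.3/9.4 configure it. The 0.3 value and the `similar_count` alias are illustrative choices, not something from the original post.

```sql
-- Assumption: pg_trgm is installable on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Threshold used by the % operator (0.3 is the pg_trgm default).
SELECT set_limit(0.3);

-- Self-join on trigram similarity, one output row per source value.
SELECT src.my_value AS my_value_src,
       count(*)     AS similar_count
FROM data AS src
JOIN data AS comparedto
  ON src.my_value % comparedto.my_value   -- true when similarity exceeds the threshold
GROUP BY src.my_value
ORDER BY similar_count DESC;
```

Note that a value normally matches itself, so the count includes the source row. Like the concept query, this still returns one row for every value, including those already counted as similar to another value. The `%` operator is one that a `gin_trgm_ops` index can support, though whether the planner uses it for this join depends on the data.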