Re: [HACKERS] new function for tsquery creartion
On 2017-10-13 16:37, Alexey Chernyshov wrote:
> Hi all,
>
> I am extending the phrase operator in such a way that it will have
> <n,m> syntax that means from n to m words, so I will use such syntax
> (<n,m>) further. I found that a AROUND(N) b is exactly the same as
> a <-N,N> b, and it can be replaced while parsing. So, what do you
> think of such an idea?
>
> In this patch I have noticed some unobvious behavior.

Thank you for the interest and review!

> # select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat AROUND(1) cat') as match;
>  match
> -------
>  t
>
> cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:
>
> # select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
>  ?column?
> ----------
>  t
>
> It seems to be proper logical behavior, but it is a possible pitfall;
> maybe it should be documented?

It is a tricky question. I think that this interpretation is confusing, so it is better to make it <-N,-1> and <1,N>.

> But the more important question is how the AROUND() operator should
> handle stop words. Now it works as:
>
> # select queryto_tsquery('cat <2> (a AROUND(10) rat)');
>  queryto_tsquery
> ------------------
>  'cat' <12> 'rat'
> (1 row)
>
> # select queryto_tsquery('cat <2> a AROUND(10) rat');
>     queryto_tsquery
> -----------------------
>  'cat' AROUND(12) 'rat'
> (1 row)
>
> In my opinion it should be like:
> cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12> rat

I think that the correct version is:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-2,12> rat.

> cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12> rat

It is a problem indeed. I did not catch it during implementation. Thank you for pointing it out.

> Now the <n,m> operator can be replaced with a combination of the
> phrase operator <n>, AROUND(), and logical operators, but with the
> <n,m> operator it will be much less painful. Correct me, please, if
> I am wrong.

I think that the <n,m> operator is more general than around(n), so the last one should be based on yours. However, I think that taking negative parameters is not the best idea because it is confusing. On top of that, it is not so necessary and I think it won't be popular among users. It seems to me that the AROUND operator can be easily implemented with <n,m>; also, it helps to avoid the problems that you showed above.

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
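The zero-distance pitfall discussed in this message, and the suggested split into <-N,-1> and <1,N>, can be checked on plain position lists. The C sketch below is an illustration only, not code from the patch: match_range, match_around and match_around_nonzero are hypothetical helpers, and each term is assumed to be represented by the array of word positions it occupies in the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if some position pa of term a and some
 * position pb of term b satisfy lo <= pb - pa <= hi.
 */
static bool
match_range(const int *a, size_t na, const int *b, size_t nb, int lo, int hi)
{
    assert(lo <= hi);
    for (size_t i = 0; i < na; i++)
    {
        for (size_t j = 0; j < nb; j++)
        {
            int d = b[j] - a[i];

            if (d >= lo && d <= hi)
                return true;
        }
    }
    return false;
}

/* AROUND(N) read as <-N,N>: distance 0 is allowed. */
static bool
match_around(const int *a, size_t na, const int *b, size_t nb, int n)
{
    assert(n >= 0);
    return match_range(a, na, b, nb, -n, n);
}

/* AROUND(N) read as <-N,-1> OR <1,N>: distance 0 is excluded. */
static bool
match_around_nonzero(const int *a, size_t na, const int *b, size_t nb, int n)
{
    if (n < 1)
        return false;
    return match_range(a, na, b, nb, -n, -1) ||
           match_range(a, na, b, nb, 1, n);
}
```

With 'cat' at position 2 only, as in 'Hello, cat world!', match_around(cat, 1, cat, 1, 1) is true because the term matches itself at distance 0, while match_around_nonzero(cat, 1, cat, 1, 1) is false. That is the behavioral difference between the two readings.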
Re: [HACKERS] new function for tsquery creartion
Hi all,

I am extending the phrase operator in such a way that it will have <n,m> syntax that means from n to m words, so I will use such syntax (<n,m>) further. I found that a AROUND(N) b is exactly the same as a <-N,N> b, and it can be replaced while parsing. So, what do you think of such an idea?

In this patch I have noticed some unobvious behavior.

# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat AROUND(1) cat') as match;
 match
-------
 t

cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
 ?column?
----------
 t

It seems to be proper logical behavior, but it is a possible pitfall; maybe it should be documented?

But the more important question is how the AROUND() operator should handle stop words. Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');
 queryto_tsquery
------------------
 'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');
    queryto_tsquery
-----------------------
 'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:

cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12> rat
cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat = cat <-8, 12> rat

Now the <n,m> operator can be replaced with a combination of the phrase operator <n>, AROUND(), and logical operators, but with the <n,m> operator it will be much less painful. Correct me, please, if I am wrong.

--
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
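The range arithmetic proposed in this message amounts to adding interval endpoints when a stop word between two operators is dropped. The C sketch below only illustrates that reading: DistRange and the helper functions are hypothetical names, not part of the patch, and the reply elsewhere in this thread argues for a different lower bound in the parenthesized case, so the right result is still under discussion.

```c
#include <assert.h>

/*
 * Hypothetical type: allowed signed distance, in word positions,
 * from the left operand to the right operand of <lo,hi>.
 */
typedef struct DistRange
{
    int lo;
    int hi;
} DistRange;

static DistRange
dist_range(int lo, int hi)
{
    DistRange r;

    assert(lo <= hi);
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* The phrase operator <n> written as <n,n>. */
static DistRange
phrase(int n)
{
    return dist_range(n, n);
}

/* AROUND(N) written as <-N,N>. */
static DistRange
around(int n)
{
    assert(n >= 0);
    return dist_range(-n, n);
}

/*
 * Collapse "x <outer> (stop <inner> y)" into "x <result> y" once the
 * stop word is removed, by adding the endpoints of both ranges.
 */
static DistRange
compose(DistRange outer, DistRange inner)
{
    return dist_range(outer.lo + inner.lo, outer.hi + inner.hi);
}
```

Under these assumptions, compose(phrase(2), around(10)) yields the range <-8,12>, which is the result proposed above for cat <2> (a AROUND(10) rat).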
Re: [HACKERS] new function for tsquery creartion
On 2017-09-09 06:03, Thomas Munro wrote:
> Please send a rebased version of the patch for people to review and
> test as that one has bit-rotted.

Hello,

Thank you for interest. In the attachment you can find rebased version (based on 69835bc8988812c960f4ed5aeee86b62ac73602a commit)

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 641b3b8..a694801 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9523,6 +9523,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
+ queryto_tsquery
+
+ queryto_tsquery( config regconfig , query text)
+
+tsquery
+produce tsquery from google like query
+queryto_tsquery('english', 'The Fat Rats')
+'fat' 'rat'
+
+
+
+
 querytree querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti =
 PostgreSQL provides the functions to_tsquery,
-plainto_tsquery, and
-phraseto_tsquery
+plainto_tsquery,
+phraseto_tsquery and
+queryto_tsquery
 for converting a query to the tsquery data type. to_tsquery offers access to more features than either plainto_tsquery or phraseto_tsquery, but it is less forgiving
-about its input.
+about its input. queryto_tsquery provides a
+different, Google like syntax to create tsquery.
@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat Rats:C');
 - 'fat' - 'rat' - 'c'
+
+
+
+queryto_tsquery( config regconfig, querytext text) returns tsquery
+
+
+
+queryto_tsquery creates a tsquery from a unformated text.
+But instead of plainto_tsquery and phraseto_tsquery it won't
+ignore already placed operations. This function supports following operators:
+
+
+
+ '"some text" - any text inside quote signs will be treated as a phrase and will be
+performed like in phraseto_tsquery.
+
+
+
+
+ 'OR' - standard logical operator. It is just an alias for '|'' sign.
+
+
+
+
+ 'terma AROUND(N) termb' - this operation will match if the distance between
+ terma and termb is less than N.
+
+
+
+
+ '-' - standard logical negation sign. It is an alias for '!' sign.
+
+
+
+Other missing operators will be replaced by AND like in plainto_tsquery.
+
+Examples:
+
+ select queryto_tsquery('The fat rats');
+ queryto_tsquery
+ -
+ 'fat' & 'rat'
+ (1 row)
+
+
+ select queryto_tsquery('"supernovae stars" AND -crab');
+ queryto_tsquery
+ --
+ 'supernova' <-> 'star' & !'crab'
+(1 row)
+
+
+ select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"');
+queryto_tsquery
+ ---
+ !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana'
+ (1 row)
+
+
+
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 35d9ab2..e820042 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -390,7 +390,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+              int16 weight, bool prefix, bool isphrase)
 {
     int32 count = 0;
     ParsedText prs;
@@ -423,7 +424,12 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
                 /* put placeholders for each missing stop word */
                 pushStop(state);
                 if (cntpos)
-                    pushOperator(state, data->qoperator, 1);
+                {
+                    if (isphrase)
+                        pushOperator(state, OP_PHRASE, 1);
+                    else
+                        pushOperator(state, data->qoperator, 1);
+                }
                 cntpos++;
                 pos++;
             }
@@ -464,7 +470,10 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
             if (cntpos)
             {
                 /* distance may be useful */
-                pushOperator(state, data->qoperator, 1);
+                if (isphrase)
+                    pushOperator(state, OP_PHRASE, 1);
+                else
+                    pushOperator(state, data->qoperator, 1);
             }
             cntpos++;
@@ -490,6 +499,7 @@ to_tsquery_byid(PG_FUNCTION_ARGS)
     query = parse_tsquery(text_to_cstring(in), pushval_morph,
Re: [HACKERS] new function for tsquery creartion
On Thu, Jul 20, 2017 at 4:58 AM, Robert Haas wrote:
> On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny wrote:
>> Let me introduce new function for full text search query creation(which is
>> called 'queryto_tsquery'). It takes 'google like' query string and
>> translates it to tsquery.
>
> I haven't looked at the code, but that sounds like a neat idea.

+1

This is a very cool feature making tsquery much more accessible. Many people know that sort of de facto search engine query language that many websites accept using quotes, AND, OR, - etc.

Calling this search syntax just "query" seems too general and overloaded. "Simple search", "simple query", "web search", "web syntax", "web query", "Google-style query", "Poogle" (kidding!) ... well I'm not sure, but I feel like it deserves a proper name. websearch_to_tsquery()?

I see that your AROUND(n) is an undocumented Google search syntax. That's a good trick to know.

Please send a rebased version of the patch for people to review and test as that one has bit-rotted.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] new function for tsquery creartion
On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny wrote:
> Let me introduce new function for full text search query creation(which is
> called 'queryto_tsquery'). It takes 'google like' query string and
> translates it to tsquery.

I haven't looked at the code, but that sounds like a neat idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] new function for tsquery creartion
Dear all,

Now Postgres has a few functions to create tsqueries for full text search. The main one is the to_tsquery function, which allows making a query with any operation. But to make a correct query, all of the operators must be specified explicitly. To make this easier, Postgres has functions like plainto_tsquery and phraseto_tsquery, which allow making tsqueries from strings. But they are not flexible enough.

Let me introduce a new function for full text search query creation (which is called 'queryto_tsquery'). It takes a 'google like' query string and translates it to tsquery. The main features are the following:

- All the text inside double quotes is treated as a phrase ("a b c" -> 'a <-> b <-> c')
- New operator AROUND(N). It matches if the distance between words (or maybe phrases) is less than or equal to N.
- Alias for ! ('-rat' is the same as '!rat')
- Alias for | ('dog OR cat' is the same as 'dog | cat')

Like plainto_tsquery and phraseto_tsquery, it will fill in operators by itself, but already placed operations won't be ignored. This allows combining the two approaches.

In the attachment you can find a patch with the new features, tests and documentation for it. What do you think about it?

Thank you very much for the attention!
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..d6fb4ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9494,6 +9494,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
+ queryto_tsquery
+
+ queryto_tsquery( config regconfig , query text)
+
+tsquery
+produce tsquery from google like query
+queryto_tsquery('english', 'The Fat Rats')
+'fat' 'rat'
+
+
+
+
 querytree querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti =
 PostgreSQL provides the functions to_tsquery,
-plainto_tsquery, and
-phraseto_tsquery
+plainto_tsquery,
+phraseto_tsquery and
+queryto_tsquery
 for converting a query to the tsquery data type. to_tsquery offers access to more features than either plainto_tsquery or phraseto_tsquery, but it is less forgiving
-about its input.
+about its input. queryto_tsquery provides a
+different, Google like syntax to create tsquery.
@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat Rats:C');
 - 'fat' - 'rat' - 'c'
+
+
+
+queryto_tsquery( config regconfig, querytext text) returns tsquery
+
+
+
+queryto_tsquery creates a tsquery from a unformated text.
+But instead of plainto_tsquery and phraseto_tsquery it won't
+ignore already placed operations. This function supports following operators:
+
+
+
+ '"some text" - any text inside quote signs will be treated as a phrase and will be
+performed like in phraseto_tsquery.
+
+
+
+
+ 'OR' - standard logical operator. It is just an alias for '|'' sign.
+
+
+
+
+ 'terma AROUND(N) termb' - this operation will match if the distance between
+ terma and termb is less than N.
+
+
+
+
+ '-' - standard logical negation sign. It is an alias for '!' sign.
+
+
+
+Other missing operators will be replaced by AND like in plainto_tsquery.
+
+Examples:
+
+ select queryto_tsquery('The fat rats');
+ queryto_tsquery
+ -
+ 'fat' & 'rat'
+ (1 row)
+
+
+ select queryto_tsquery('"supernovae stars" AND -crab');
+ queryto_tsquery
+ --
+ 'supernova' <-> 'star' & !'crab'
+(1 row)
+
+
+ select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"');
+queryto_tsquery
+ ---
+ !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana'
+ (1 row)
+
+
+
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 18368d1..10fd8c3 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -414,7 +414,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+