Re: [HACKERS] new function for tsquery creartion

2017-10-13 Thread Victor Drobny

On 2017-10-13 16:37, Alexey Chernyshov wrote:

> Hi all,
> I am extending the phrase operator <N> in such a way that it will have
> <n,m> syntax, meaning from n to m words, so I will use this syntax (<n,m>)
> below. I found that a AROUND(N) b is exactly the same as a <-N,N> b
> and can be replaced while parsing. What do you think of this idea?
> In this patch I have noticed some non-obvious behavior.


Thank you for the interest and review!


> # select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat AROUND(1) cat') as match;
>  match
> -------
>  t
>
> cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:
>
> # select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
>  ?column?
> ----------
>  t
>
> It seems to be logically correct behavior, but it is a possible pitfall;
> maybe it should be documented?


It is a tricky question. I think that this interpretation is confusing,
so it would be better to define it as <-N,-1> and <1,N>, which excludes
distance 0.
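
For illustration only (this is not code from the patch): a minimal Python
sketch of why a range that includes distance 0 lets a word match itself,
and how the split ranges avoid it. The helper distance_match and the token
positions are assumptions made for this example.

```python
def distance_match(pos_a, pos_b, ranges):
    """Return True if some (pos_b - pos_a) lies in any inclusive (lo, hi) range."""
    return any(lo <= b - a <= hi
               for a in pos_a
               for b in pos_b
               for lo, hi in ranges)

# Assumed positions for 'Hello, cat world!': hello=1, cat=2, world=3
cat = [2]

n = 1
symmetric = [(-n, n)]        # AROUND(N) read as <-N,N>, distance 0 included
split = [(-n, -1), (1, n)]   # the <-N,-1> and <1,N> reading, distance 0 excluded

print(distance_match(cat, cat, symmetric))  # True: 'cat' matches itself
print(distance_match(cat, cat, split))      # False: a second 'cat' is required
```

With the split ranges, 'cat AROUND(1) cat' would only match a document that
contains two occurrences of 'cat' next to each other.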


> But a more important question is how the AROUND() operator should handle
> stop words. Now it works as:
>
> # select queryto_tsquery('cat <2> (a AROUND(10) rat)');
>  queryto_tsquery
> ------------------
>  'cat' <12> 'rat'
> (1 row)
>
> # select queryto_tsquery('cat <2> a AROUND(10) rat');
>     queryto_tsquery
> ------------------------
>  'cat' AROUND(12) 'rat'
> (1 row)
>
> In my opinion it should be like:
> cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12> rat


I think that the correct version is:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-2,12> rat.



> cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat == cat <-8,12> rat


It is a problem indeed. I did not catch it during implementation. Thank
you for pointing it out.


> Now the <n,m> operator can be replaced with a combination of the phrase
> operator <n>, AROUND(), and logical operators, but with the <n,m> operator
> it will be much less painful. Correct me, please, if I am wrong.


I think that the <n,m> operator is more general than AROUND(N), so the
latter should be based on yours. However, I think that accepting negative
parameters is not the best idea because it is confusing. On top of that,
it is not really necessary, and I think it won't be popular among users.
It seems to me that the AROUND operator can be easily implemented with
<n,m>; it also helps to avoid the problems that you showed above.

--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new function for tsquery creartion

2017-10-13 Thread Alexey Chernyshov
Hi all,
I am extending the phrase operator <N> in such a way that it will have
<n,m> syntax, meaning from n to m words, so I will use this syntax (<n,m>)
below. I found that a AROUND(N) b is exactly the same as a <-N,N> b
and can be replaced while parsing. What do you think of this idea?
In this patch I have noticed some non-obvious behavior.

# select to_tsvector('Hello, cat world!') @@ queryto_tsquery('cat AROUND(1) cat') as match;
 match
-------
 t

cat AROUND(1) cat is the same as "cat <1> cat || cat <0> cat" and:

# select to_tsvector('Hello, cat world!') @@ to_tsquery('cat <0> cat');
 ?column?
----------
 t

It seems to be logically correct behavior, but it is a possible pitfall;
maybe it should be documented?

But a more important question is how the AROUND() operator should handle
stop words. Now it works as:

# select queryto_tsquery('cat <2> (a AROUND(10) rat)');
 queryto_tsquery
------------------
 'cat' <12> 'rat'
(1 row)

# select queryto_tsquery('cat <2> a AROUND(10) rat');
    queryto_tsquery
------------------------
 'cat' AROUND(12) 'rat'
(1 row)

In my opinion it should be like:
cat <2> (a AROUND(10) rat) == cat <2,2> (a <-10,10> rat) == cat <-8,12> rat

cat <2> a AROUND(10) rat == cat <2,2> a <-10,10> rat == cat <-8,12> rat
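
A minimal Python sketch of the arithmetic above, assuming that a dropped
stop word still occupies its position, so that inclusive distance ranges
simply add. The helper names are hypothetical and are not part of the patch.

```python
def phrase(n):
    """Range for the <n> operator: exactly n positions apart."""
    return (n, n)

def around(n):
    """Range for AROUND(n), read as <-n,n>."""
    return (-n, n)

def compose(outer, inner):
    """Combine two inclusive (lo, hi) distance ranges across a removed word."""
    return (outer[0] + inner[0], outer[1] + inner[1])

# cat <2> a AROUND(10) rat, with the stop word 'a' removed
print(compose(phrase(2), around(10)))  # (-8, 12)
```

Under this assumption, 'rat' may appear up to 8 positions before 'cat' or
up to 12 positions after it, which is the <-8,12> range given above.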

Now the <n,m> operator can be replaced with a combination of the phrase
operator <n>, AROUND(), and logical operators, but with the <n,m> operator
it will be much less painful. Correct me, please, if I am wrong.

-- 
Alexey Chernyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] new function for tsquery creartion

2017-09-13 Thread Victor Drobny

On 2017-09-09 06:03, Thomas Munro wrote:

> Please send a rebased version of the patch for people to review and
> test as that one has bit-rotted.

Hello,
Thank you for your interest. Attached is a rebased version (based on
commit 69835bc8988812c960f4ed5aeee86b62ac73602a).
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 641b3b8..a694801 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9523,6 +9523,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  queryto_tsquery
+
+ queryto_tsquery( config regconfig ,  query text)
+
+tsquery
+produce tsquery from google like query
+queryto_tsquery('english', 'The Fat Rats')
+'fat' & 'rat'
+   
+   
+
+ 
   querytree
  
  querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti =

 PostgreSQL provides the
 functions to_tsquery,
-plainto_tsquery, and
-phraseto_tsquery
+plainto_tsquery,
+phraseto_tsquery and
+queryto_tsquery
 for converting a query to the tsquery data type.
 to_tsquery offers access to more features
 than either plainto_tsquery or
 phraseto_tsquery, but it is less forgiving
-about its input.
+about its input. queryto_tsquery provides a 
+different, Google like syntax to create tsquery.

 

@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat & Rats:C');
 -
  'fat' <-> 'rat' <-> 'c'
 
+
+
+
+queryto_tsquery( config regconfig,  querytext text) returns tsquery
+
+
+   
+queryto_tsquery creates a tsquery from a unformated text.
+But instead of plainto_tsquery and phraseto_tsquery it won't
+ignore already placed operations. This function supports following operators:
+
+ 
+  
+   '"some text" - any text inside quote signs will be treated as a phrase and will be
+performed like in phraseto_tsquery.
+  
+ 
+ 
+  
+   'OR' - standard logical operator. It is just an alias for '|'' sign.
+  
+ 
+ 
+  
+   'terma AROUND(N) termb' - this operation will match if the distance between 
+   terma and termb is less than N.
+  
+ 
+ 
+  
+   '-' - standard logical negation sign. It is an alias for '!' sign.
+  
+ 
+
+Other missing operators will be replaced by AND like in plainto_tsquery.

 
+   
+Examples:
+
+  select queryto_tsquery('The fat rats');
+   queryto_tsquery 
+  -
+   'fat' & 'rat'
+  (1 row)
+
+
+  select queryto_tsquery('"supernovae stars" AND -crab');
+ queryto_tsquery  
+  --
+   'supernova' <-> 'star' & !'crab'
+(1 row)
+
+
+  select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"');
+queryto_tsquery  
+  ---
+   !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana'
+  (1 row)
+
+
+
   
 
   
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 35d9ab2..e820042 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -390,7 +390,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+			int16 weight, bool prefix, bool isphrase)
 {
 	int32		count = 0;
 	ParsedText	prs;
@@ -423,7 +424,12 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
 	/* put placeholders for each missing stop word */
 	pushStop(state);
 	if (cntpos)
-		pushOperator(state, data->qoperator, 1);
+	{
+		if (isphrase)
+			pushOperator(state, OP_PHRASE, 1);
+		else
+			pushOperator(state, data->qoperator, 1);
+	}
 	cntpos++;
 	pos++;
 }
@@ -464,7 +470,10 @@ pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
 			if (cntpos)
 			{
 /* distance may be useful */
-pushOperator(state, data->qoperator, 1);
+if (isphrase)
+	pushOperator(state, OP_PHRASE, 1);
+else
+	pushOperator(state, data->qoperator, 1);
 			}
 
 			cntpos++;
@@ -490,6 +499,7 @@ to_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  

Re: [HACKERS] new function for tsquery creartion

2017-09-08 Thread Thomas Munro
On Thu, Jul 20, 2017 at 4:58 AM, Robert Haas wrote:
> On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny wrote:
>> Let me introduce a new function for full text search query creation (which
>> is called 'queryto_tsquery'). It takes a 'Google-like' query string and
>> translates it to tsquery.
>
> I haven't looked at the code, but that sounds like a neat idea.

+1

This is a very cool feature making tsquery much more accessible.  Many
people know that sort of de facto search engine query language that
many websites accept, using quotes, AND, OR, - etc.

Calling this search syntax just "query" seems too general and
overloaded.  "Simple search", "simple query", "web search", "web
syntax", "web query", "Google-style query", "Poogle" (kidding!) ...
well I'm not sure, but I feel like it deserves a proper name.
websearch_to_tsquery()?

I see that your AROUND(n) is an undocumented Google search syntax.
That's a good trick to know.

Please send a rebased version of the patch for people to review and
test as that one has bit-rotted.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] new function for tsquery creartion

2017-07-19 Thread Robert Haas
On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny wrote:
> Let me introduce a new function for full text search query creation (which
> is called 'queryto_tsquery'). It takes a 'Google-like' query string and
> translates it to tsquery.

I haven't looked at the code, but that sounds like a neat idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] new function for tsquery creartion

2017-07-19 Thread Victor Drobny

Dear all,

Postgres currently has a few functions for creating tsqueries for full
text search. The main one is to_tsquery, which allows building a query
with any operation, but to get a correct query all of the operators
must be specified explicitly. To make this easier, Postgres has
functions like plainto_tsquery and phraseto_tsquery, which build
tsqueries from plain strings, but they are not flexible enough.


Let me introduce a new function for full text search query creation (which
is called 'queryto_tsquery'). It takes a 'Google-like' query string and
translates it to tsquery.

The main features are the following:

All text inside double quotes is treated as a phrase ("a b c" ->
'a <-> b <-> c').

New operator AROUND(N). It matches if the distance between words (or
maybe phrases) is less than or equal to N.

Alias for ! ('-rat' is the same as '!rat').

Alias for | ('dog OR cat' is the same as 'dog | cat').

Like plainto_tsquery and phraseto_tsquery, it fills in missing operators
by itself, but operators already present in the input are not ignored.
This allows the two approaches to be combined.
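
To make the translation rules above concrete, here is a toy Python sketch.
It is an illustration only and is not the patch's C implementation: the
function name sketch_query_to_tsquery is hypothetical, no stemming or
stop-word removal is done, and AROUND(N) and negated quoted phrases are
not handled.

```python
import re

# A double-quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')
OPERATORS = {"|", "&"}

def sketch_query_to_tsquery(query):
    """Translate a web-style query string into tsquery-like text."""
    parts = []
    for phrase, word in TOKEN.findall(query):
        if phrase.strip():
            # Words inside quotes are chained with the phrase operator.
            parts.append(" <-> ".join(phrase.split()))
        elif word == "OR":
            parts.append("|")
        elif word == "AND":
            parts.append("&")
        elif word.startswith("-") and len(word) > 1:
            parts.append("!" + word[1:])
        elif word:
            parts.append(word)

    out = []
    for part in parts:
        # Two adjacent operands get an implicit AND, as in plainto_tsquery.
        if out and part not in OPERATORS and out[-1] not in OPERATORS:
            out.append("&")
        out.append(part)
    return " ".join(out)

print(sketch_query_to_tsquery('"a b c"'))     # a <-> b <-> c
print(sketch_query_to_tsquery('dog OR cat'))  # dog | cat
print(sketch_query_to_tsquery('fat -rat'))    # fat & !rat
```

Operators that are already present are kept, and only the gaps between
operands are filled with AND, which is the combination of the two
approaches described above.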


In the attachment you can find a patch with the new features, tests, and
documentation.

What do you think about it?

Thank you very much for your attention!

--
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..d6fb4ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9494,6 +9494,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  queryto_tsquery
+
+ queryto_tsquery( config regconfig ,  query text)
+
+tsquery
+produce tsquery from google like query
+queryto_tsquery('english', 'The Fat Rats')
+'fat' & 'rat'
+   
+   
+
+ 
   querytree
  
  querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti =

 PostgreSQL provides the
 functions to_tsquery,
-plainto_tsquery, and
-phraseto_tsquery
+plainto_tsquery,
+phraseto_tsquery and
+queryto_tsquery
 for converting a query to the tsquery data type.
 to_tsquery offers access to more features
 than either plainto_tsquery or
 phraseto_tsquery, but it is less forgiving
-about its input.
+about its input. queryto_tsquery provides a 
+different, Google like syntax to create tsquery.

 

@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat & Rats:C');
 -
  'fat' <-> 'rat' <-> 'c'
 
+
+
+
+queryto_tsquery( config regconfig,  querytext text) returns tsquery
+
+
+   
+queryto_tsquery creates a tsquery from a unformated text.
+But instead of plainto_tsquery and phraseto_tsquery it won't
+ignore already placed operations. This function supports following operators:
+
+ 
+  
+   '"some text" - any text inside quote signs will be treated as a phrase and will be
+performed like in phraseto_tsquery.
+  
+ 
+ 
+  
+   'OR' - standard logical operator. It is just an alias for '|'' sign.
+  
+ 
+ 
+  
+   'terma AROUND(N) termb' - this operation will match if the distance between 
+   terma and termb is less than N.
+  
+ 
+ 
+  
+   '-' - standard logical negation sign. It is an alias for '!' sign.
+  
+ 
+
+Other missing operators will be replaced by AND like in plainto_tsquery.

 
+   
+Examples:
+
+  select queryto_tsquery('The fat rats');
+   queryto_tsquery 
+  -
+   'fat' & 'rat'
+  (1 row)
+
+
+  select queryto_tsquery('"supernovae stars" AND -crab');
+ queryto_tsquery  
+  --
+   'supernova' <-> 'star' & !'crab'
+(1 row)
+
+
+  select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"');
+queryto_tsquery  
+  ---
+   !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana'
+  (1 row)
+
+
+
   
 
   
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 18368d1..10fd8c3 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -414,7 +414,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len)
  * and different variants are ORed together.
  */
 static void
-pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix)
+pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval,
+