Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Teodor Sigaev

Thank you, pushed with some editorization

Dmitry Dolgov wrote:

On 7 April 2018 at 17:09, Teodor Sigaev  wrote:

See workable sketch for parsing jsonb flags and new worker variant.



Yep, thanks for the sketch. Here is the new version of patch, does it look
close to what you have in mind?



Patch looks good except error messaging, you took it directly from sketch
where I didn't spend time for it. Please, improve. elog() should be used
only for impossible error, whereas user input could contins mistakes.


I assume what you mean is that for user input errors we need to use ereport.
Indeed, thanks for noticing. I've replaced all elog except the last one, since
it actually describes an impossible situation, when we started to read an
array, but ended up having something else instead WJB_END_ARRAY.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Dmitry Dolgov
> On 7 April 2018 at 17:09, Teodor Sigaev  wrote:
>>> See workable sketch for parsing jsonb flags and new worker variant.
>>
>>
>> Yep, thanks for the sketch. Here is the new version of patch, does it look
>> close to what you have in mind?
>
>
> Patch looks good except error messaging, you took it directly from sketch
> where I didn't spend time for it. Please, improve. elog() should be used
> only for impossible error, whereas user input could contins mistakes.

I assume what you mean is that for user input errors we need to use ereport.
Indeed, thanks for noticing. I've replaced all elog except the last one, since
it actually describes an impossible situation, when we started to read an
array, but ended up having something else instead WJB_END_ARRAY.


jsonb_to_tsvector_numeric_v5.patch
Description: Binary data


Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Teodor Sigaev

See workable sketch for parsing jsonb flags and new worker variant.


Yep, thanks for the sketch. Here is the new version of patch, does it look
close to what you have in mind?


Patch looks good except error messaging, you took it directly from 
sketch where I didn't spend time for it. Please, improve. elog() should 
be used only for impossible error, whereas user input could contins 
mistakes.



--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: json(b)_to_tsvector with numeric values

2018-04-07 Thread Dmitry Dolgov
> On 6 April 2018 at 18:55, Teodor Sigaev  wrote:
>
>
> Dmitry Dolgov wrote:
>>>
>>> On 6 April 2018 at 16:25, Teodor Sigaev  wrote:
>>> 1) I don't like jsonb_all_to_tsvector too.. What if we will accept new
>>> variant to index? Let me suggest:
>>>
>>> tsvector jsonb_to_tsvector([regclass,] jsonb, text[])
>>>
>>> where text[] arg is actually a flags, array contains any combination of
>>> literals 'numeric', 'string', 'boolean' (and even 'key' to index keys_ to
>>> point which types should be indexed. More than it, may be, it should a
>>> jsonb
>>> type for possible improvements in future. For now, it shouldbe a jsonb
>>> array
>>> type with string elements described above, example:
>>>
>>> select jsonb_to_tsvector('{"a": "aaa in bbb ddd ccc", "b":123}',
>>>  '["numeric", "boolean"]');
>>>
>>>
>>> Form jsonb_to_tsvector('...', '["string"]) is effectively the same as
>>> current to_tsvector(jsonb)
>>
>>
>> Thank you for the suggestion, this sounds appealing. But I have two
>> questions:
>>
>> * why it should be a jsonb array, not a regular array?
>
> To simplify extension of this array in future, for example to add limitation
> of recursion level in converted jsonb, etc.
>
>
>>
>> * it would introduce the idea of jsonb element type expressed in text
>> format,
>>so "string", "numeric", "boolean" etc - are there any consequences of
>> that?
>>As far as I understand so far there was only jsonb_typeof.
>
> Good catch, jsonb_typeof strings are okay: "string", "number", "boolean"
> also  "all", "key", "value"
>
> See workable sketch for parsing jsonb flags and new worker variant.

Yep, thanks for the sketch. Here is the new version of patch, does it look
close to what you have in mind?


jsonb_to_tsvector_numeric_v4.patch
Description: Binary data


Re: json(b)_to_tsvector with numeric values

2018-04-06 Thread Teodor Sigaev



Dmitry Dolgov wrote:

On 6 April 2018 at 16:25, Teodor Sigaev  wrote:
1) I don't like jsonb_all_to_tsvector too.. What if we will accept new
variant to index? Let me suggest:

tsvector jsonb_to_tsvector([regclass,] jsonb, text[])

where text[] arg is actually a flags, array contains any combination of
literals 'numeric', 'string', 'boolean' (and even 'key' to index keys_ to
point which types should be indexed. More than it, may be, it should a jsonb
type for possible improvements in future. For now, it shouldbe a jsonb array
type with string elements described above, example:

select jsonb_to_tsvector('{"a": "aaa in bbb ddd ccc", "b":123}',
 '["numeric", "boolean"]');


Form jsonb_to_tsvector('...', '["string"]) is effectively the same as
current to_tsvector(jsonb)


Thank you for the suggestion, this sounds appealing. But I have two questions:

* why it should be a jsonb array, not a regular array?
To simplify extension of this array in future, for example to add limitation of 
recursion level in converted jsonb, etc.





* it would introduce the idea of jsonb element type expressed in text format,
   so "string", "numeric", "boolean" etc - are there any consequences of that?
   As far as I understand so far there was only jsonb_typeof.

Good catch, jsonb_typeof strings are okay: "string", "number", "boolean"
also  "all", "key", "value"

See workable sketch for parsing jsonb flags and new worker variant.




2) Now it fails, and I see something strange in resuling tsvector


Oh, sorry, stupid copy-paste mistake in the condition. Just for the records,
I've attached fixed version of the previous patch (without any changes about an
array instead of a boolean flag).



by the way:
Datum
jsonb_to_tsvector(PG_FUNCTION_ARGS)
{
Jsonb  *jb = PG_GETARG_JSONB_P(0);
...
PG_FREE_IF_COPY(jb, 1); //wrong arg number
}

jsonb_all_to_tsvector and json variants too




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
typedef enum JsonToIndex {
	jtiKey		= 0x01,
	jtiString	= 0x02,
	jtiNumeric	= 0x04,
	jtiBool		= 0x08,
	jtiAll		= jtiKey | jtiString | jtiNumeric | jtiBool
} JsonToIndex;

static uint32
parse_jsonb_index_flags(Jsonb *jb)
{
	JsonbIterator *it;
	JsonbValue  v;
	JsonbIteratorToken type;
	uint32flags = 0;

	it = JsonbIteratorInit(&jb->root);

	type = JsonbIteratorNext(&it, &v, false);

	if (type != WJB_BEGIN_ARRAY)
		elog(ERROR, "wrong flag type");

	while ((type = JsonbIteratorNext(&it, &v, false)) == WJB_ELEM)
	{
		if (v.type != jbvString)
			elog(ERROR, "text is only accepted");

		if (v.val.string.len == 3 &&
 pg_strncasecmp(v.val.string.val, "all", 3) == 0)
			flags |= jtiAll;
		else if (v.val.string.len == 3 &&
 pg_strncasecmp(v.val.string.val, "key", 3) == 0)
			flags |= jtiKey;
		else if (v.val.string.len == 6 &&
 pg_strncasecmp(v.val.string.val, "string", 5) == 0)
			flags |= jtiString;
		else if (v.val.string.len == 7 &&
 pg_strncasecmp(v.val.string.val, "numeric", 7) == 0)
			flags |= jtiNumeric;
		else if (v.val.string.len == 7 &&
 pg_strncasecmp(v.val.string.val, "boolean", 7) == 0)
			flags |= jtiBool;
		else
			elog(ERROR, "string, numeric, boolean, key and all are only accepted");
	}

	if (type != WJB_END_ARRAY)
		elog(ERROR, "wrong flag type");

	JsonbIteratorNext(&it, &v, false);

	return flags;
}


/*
 * Worker function for jsonb(_all)_to_tsvector(_byid)
 */
static TSVector
jsonb_to_tsvector_worker(Oid cfgId, Jsonb *jb, uint32 flags)
{
	TSVectorBuildState state;
	ParsedText	prs;
	JsonbIterator *it;
	JsonbValue  v;
	JsonbIteratorToken type;

	prs.words = NULL;
	prs.curwords = 0;
	state.prs = &prs;
	state.cfgId = cfgId;


	it = JsonbIteratorInit(&jb->root);

	type = JsonbIteratorNext(&it, &v, false);

	while ((type = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
	{
		if (type == WJB_KEY && (flags & jtiKey))
			add_to_tsvector(&state, v.val.string.val, v.val.string.len);
		else if (type == WJB_VALUE || type == WJB_ELEM)
		{
			switch(v.type)
			{
case jbvString:
	if (flags & jtiString)
		add_to_tsvector(&state, v.val.string.val,
		v.val.string.len);
	break;
case jbvNumeric:
	if (flags & jtiNumeric)
	{
		char *val;

		val = DatumGetCString(DirectFunctionCall1(numeric_out,
			  NumericGetDatum(v.val.numeric)));

		add_to_tsvector(&state, val, strlen(val));
	}
	break;
case jbvBool:
	if (flags & jtiBool)
	{
		if (v.val.boolean)
			add_to_tsvector(&state, pstrdup("true"), 4);
		else
			add_to_tsvector(&state, pstrdup("false"), 5);
	}
	break;
default:
	break;
			}
		}
	}

	return make_tsvector(&prs);
}


Re: json(b)_to_tsvector with numeric values

2018-04-06 Thread Dmitry Dolgov
> On 6 April 2018 at 16:25, Teodor Sigaev  wrote:
> 1) I don't like jsonb_all_to_tsvector too.. What if we will accept new
> variant to index? Let me suggest:
>
> tsvector jsonb_to_tsvector([regclass,] jsonb, text[])
>
> where text[] arg is actually a flags, array contains any combination of
> literals 'numeric', 'string', 'boolean' (and even 'key' to index keys_ to
> point which types should be indexed. More than it, may be, it should a jsonb
> type for possible improvements in future. For now, it shouldbe a jsonb array
> type with string elements described above, example:
>
> select jsonb_to_tsvector('{"a": "aaa in bbb ddd ccc", "b":123}',
> '["numeric", "boolean"]');
>
>
> Form jsonb_to_tsvector('...', '["string"]) is effectively the same as
> current to_tsvector(jsonb)

Thank you for the suggestion, this sounds appealing. But I have two questions:

* why it should be a jsonb array, not a regular array?

* it would introduce the idea of jsonb element type expressed in text format,
  so "string", "numeric", "boolean" etc - are there any consequences of that?
  As far as I understand so far there was only jsonb_typeof.

> 2) Now it fails, and I see something strange in resuling tsvector

Oh, sorry, stupid copy-paste mistake in the condition. Just for the records,
I've attached fixed version of the previous patch (without any changes about an
array instead of a boolean flag).
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c4..895b60a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9696,6 +9696,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 
+ json(b)_all_to_tsvector( config regconfig ,  document json(b))
+
+tsvector
+
+  reduce each string, numeric or boolean value in the document to a tsvector,
+  and then concatenate those in document order to produce a single tsvector
+
+json_all_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json)
+'123':5 'fat':2 'rat':3
+   
+   
+
  
   ts_delete
  
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index ea5947a..02c2b00 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -267,12 +267,12 @@ to_tsvector(PG_FUNCTION_ARGS)
 		PointerGetDatum(in)));
 }
 
-Datum
-jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+/*
+ * Worker function for jsonb(_all)_to_tsvector(_byid)
+ */
+static TSVector
+jsonb_to_tsvector_worker(Oid cfgId, Jsonb *jb, bool allTypes)
 {
-	Oid			cfgId = PG_GETARG_OID(0);
-	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
-	TSVector	result;
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -281,11 +281,24 @@ jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_jsonb_string_values(jb, &state, add_to_tsvector);
+	if (allTypes)
+		iterate_jsonb_all_values(jb, &state, add_to_tsvector);
+	else
+		iterate_jsonb_string_values(jb, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(jb, 1);
 
-	result = make_tsvector(&prs);
+	return make_tsvector(&prs);
+}
+
+Datum
+jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+{
+	Oid			cfgId = PG_GETARG_OID(0);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
+	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
 
 	PG_RETURN_TSVECTOR(result);
 }
@@ -295,19 +308,48 @@ jsonb_to_tsvector(PG_FUNCTION_ARGS)
 {
 	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
 	Oid			cfgId;
+	TSVector	result;
 
 	cfgId = getTSCurrentConfig(true);
-	PG_RETURN_DATUM(DirectFunctionCall2(jsonb_to_tsvector_byid,
-		ObjectIdGetDatum(cfgId),
-		JsonbPGetDatum(jb)));
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
 }
 
 Datum
-json_to_tsvector_byid(PG_FUNCTION_ARGS)
+jsonb_all_to_tsvector_byid(PG_FUNCTION_ARGS)
 {
 	Oid			cfgId = PG_GETARG_OID(0);
-	text	   *json = PG_GETARG_TEXT_P(1);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
 	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+Datum
+jsonb_all_to_tsvector(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
+	Oid			cfgId;
+	TSVector	result;
+
+	cfgId = getTSCurrentConfig(true);
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+/*
+ * Worker function for json(_all)_to_tsvector(_byid)
+ */
+static TSVector
+json_to_tsvector_worker(Oid cfgId, text *json, bool allTypes)
+{
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -316,11 +358,20 @@ json_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_json_string_values(json, &state, add_to_tsvector);
+	iterate_json_values(json, allTypes, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(json, 1);
+	return make_tsvector(&prs)

Re: json(b)_to_tsvector with numeric values

2018-04-06 Thread Teodor Sigaev
1) I don't like jsonb_all_to_tsvector too.. What if we will accept new variant 
to index? Let me suggest:


tsvector jsonb_to_tsvector([regclass,] jsonb, text[])

where text[] arg is actually a flags, array contains any combination of literals 
'numeric', 'string', 'boolean' (and even 'key' to index keys_ to point which 
types should be indexed. More than it, may be, it should a jsonb type for 
possible improvements in future. For now, it shouldbe a jsonb array type with 
string elements described above, example:


select jsonb_to_tsvector('{"a": "aaa in bbb ddd ccc", "b":123}',
'["numeric", "boolean"]');


Form jsonb_to_tsvector('...', '["string"]) is effectively the same as current 
to_tsvector(jsonb)


2)
Now it fails, and I see something strange in resuling tsvector: 'true':9,13 and
'fals':9,13 - I don't see any bool keys in input json.

% more /home/teodor/pgsql/src/test/regress/regression.diffs
*** /home/teodor/pgsql/src/test/regress/expected/jsonb.out  2018-04-06 
16:34:59.424481000 +0300
--- /home/teodor/pgsql/src/test/regress/results/jsonb.out   2018-04-06 
16:36:48.095411000 +0300

***
*** 4132,4138 
  select jsonb_all_to_tsvector('english', '{"a": "aaa in bbb ddd ccc", "b": 
123, "c": 456}'::jsonb);

  jsonb_all_to_tsvector
  --
!  '123':7 '456':11 'aaa':1 'bbb':3 'ccc':5 'ddd':4 'true':9,13
  (1 row)

  -- ts_vector corner cases
--- 4132,4138 
  select jsonb_all_to_tsvector('english', '{"a": "aaa in bbb ddd ccc", "b": 
123, "c": 456}'::jsonb);

  jsonb_all_to_tsvector
  --
!  '123':7 '456':11 'aaa':1 'bbb':3 'ccc':5 'ddd':4 'fals':9,13
  (1 row)

  -- ts_vector corner cases


Dmitry Dolgov wrote:

On 4 April 2018 at 16:09, Teodor Sigaev  wrote:


Hm, seems, it's useful feature, but I suggest to make separate function
jsonb_any_to_tsvector and add support for boolean too (if you know better
name for function, do not hide it). Changing behavior of existing
function
is not obvious for users and, seems, should not backpatched.



What do you think about having not a separate function, but a flag
argument to
the existing one (like `create` in `jsonb_set`), that will have false as
default value? The result would be the same, but without an extra function
with
almost the same implementation.



tsvector jsonb_to_tsvector(jsonb[, bool]) ?
Agreed. Second arg should be optional.


Unfortunately, this idea with a flag argument can't be implemented easily
(related discussion is here [1]). So I've modified the patch accordingly to
your original suggestion about having separate functions
`json(b)_all_to_tsvector`.

1: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcVJ%2BWx%2B-%3DkkN5UC0T-LtsJWnx0g9S0xSnn3jUWkriufDA%40mail.gmail.com



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: json(b)_to_tsvector with numeric values

2018-04-05 Thread Dmitry Dolgov
> On 4 April 2018 at 16:09, Teodor Sigaev  wrote:
>
>>> Hm, seems, it's useful feature, but I suggest to make separate function
>>> jsonb_any_to_tsvector and add support for boolean too (if you know better
>>> name for function, do not hide it). Changing behavior of existing
>>> function
>>> is not obvious for users and, seems, should not backpatched.
>>
>>
>> What do you think about having not a separate function, but a flag
>> argument to
>> the existing one (like `create` in `jsonb_set`), that will have false as
>> default value? The result would be the same, but without an extra function
>> with
>> almost the same implementation.
>
>
> tsvector jsonb_to_tsvector(jsonb[, bool]) ?
> Agreed. Second arg should be optional.

Unfortunately, this idea with a flag argument can't be implemented easily
(related discussion is here [1]). So I've modified the patch accordingly to
your original suggestion about having separate functions
`json(b)_all_to_tsvector`.

1: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcVJ%2BWx%2B-%3DkkN5UC0T-LtsJWnx0g9S0xSnn3jUWkriufDA%40mail.gmail.com
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c4..895b60a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9696,6 +9696,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 
+ json(b)_all_to_tsvector( config regconfig ,  document json(b))
+
+tsvector
+
+  reduce each string, numeric or boolean value in the document to a tsvector,
+  and then concatenate those in document order to produce a single tsvector
+
+json_all_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json)
+'123':5 'fat':2 'rat':3
+   
+   
+
  
   ts_delete
  
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index ea5947a..02c2b00 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -267,12 +267,12 @@ to_tsvector(PG_FUNCTION_ARGS)
 		PointerGetDatum(in)));
 }
 
-Datum
-jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+/*
+ * Worker function for jsonb(_all)_to_tsvector(_byid)
+ */
+static TSVector
+jsonb_to_tsvector_worker(Oid cfgId, Jsonb *jb, bool allTypes)
 {
-	Oid			cfgId = PG_GETARG_OID(0);
-	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
-	TSVector	result;
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -281,11 +281,24 @@ jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_jsonb_string_values(jb, &state, add_to_tsvector);
+	if (allTypes)
+		iterate_jsonb_all_values(jb, &state, add_to_tsvector);
+	else
+		iterate_jsonb_string_values(jb, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(jb, 1);
 
-	result = make_tsvector(&prs);
+	return make_tsvector(&prs);
+}
+
+Datum
+jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+{
+	Oid			cfgId = PG_GETARG_OID(0);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
+	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
 
 	PG_RETURN_TSVECTOR(result);
 }
@@ -295,19 +308,48 @@ jsonb_to_tsvector(PG_FUNCTION_ARGS)
 {
 	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
 	Oid			cfgId;
+	TSVector	result;
 
 	cfgId = getTSCurrentConfig(true);
-	PG_RETURN_DATUM(DirectFunctionCall2(jsonb_to_tsvector_byid,
-		ObjectIdGetDatum(cfgId),
-		JsonbPGetDatum(jb)));
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
 }
 
 Datum
-json_to_tsvector_byid(PG_FUNCTION_ARGS)
+jsonb_all_to_tsvector_byid(PG_FUNCTION_ARGS)
 {
 	Oid			cfgId = PG_GETARG_OID(0);
-	text	   *json = PG_GETARG_TEXT_P(1);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
 	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+Datum
+jsonb_all_to_tsvector(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
+	Oid			cfgId;
+	TSVector	result;
+
+	cfgId = getTSCurrentConfig(true);
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+/*
+ * Worker function for json(_all)_to_tsvector(_byid)
+ */
+static TSVector
+json_to_tsvector_worker(Oid cfgId, text *json, bool allTypes)
+{
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -316,11 +358,20 @@ json_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_json_string_values(json, &state, add_to_tsvector);
+	iterate_json_values(json, allTypes, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(json, 1);
+	return make_tsvector(&prs);
+}
+
+Datum
+json_to_tsvector_byid(PG_FUNCTION_ARGS)
+{
+	Oid			cfgId = PG_GETARG_OID(0);
+	text	   *json = PG_GETARG_TEXT_P(1);
+	TSVector	result;
 
-	result = make_tsvector(&prs);
+	result = json_to_tsvector_worker(cfgId, json, false);
+	PG_FREE_IF_COPY(json, 1);
 
 	PG_RETURN_TSVECTOR(result);
 }
@@ -330,11 +381,40 @@ json_to_tsvec

Re: json(b)_to_tsvector with numeric values

2018-04-04 Thread Teodor Sigaev



Hm, seems, it's useful feature, but I suggest to make separate function
jsonb_any_to_tsvector and add support for boolean too (if you know better
name for function, do not hide it). Changing behavior of existing function
is not obvious for users and, seems, should not backpatched.


What do you think about having not a separate function, but a flag argument to
the existing one (like `create` in `jsonb_set`), that will have false as
default value? The result would be the same, but without an extra function with
almost the same implementation.


tsvector jsonb_to_tsvector(jsonb[, bool]) ?
Agreed. Second arg should be optional.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: json(b)_to_tsvector with numeric values

2018-04-04 Thread Dmitry Dolgov
> On 4 April 2018 at 11:52, Teodor Sigaev  wrote:
 the consistency of FTS.I think this is a bug, which needs to be fixed,
 else inconsistency with existing full text search  will be gets
 deeper.
>
> Hm, seems, it's useful feature, but I suggest to make separate function
> jsonb_any_to_tsvector and add support for boolean too (if you know better
> name for function, do not hide it). Changing behavior of existing function
> is not obvious for users and, seems, should not backpatched.

What do you think about having not a separate function, but a flag argument to
the existing one (like `create` in `jsonb_set`), that will have false as
default value? The result would be the same, but without an extra function with
almost the same implementation.



Re: json(b)_to_tsvector with numeric values

2018-04-04 Thread Teodor Sigaev

the consistency of FTS.I think this is a bug, which needs to be fixed,
else inconsistency with existing full text search  will be gets
deeper.
Hm, seems, it's useful feature, but I suggest to make separate function 
jsonb_any_to_tsvector and add support for boolean too (if you know better name 
for function, do not hide it). Changing behavior of existing function is not 
obvious for users and, seems, should not backpatched.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: json(b)_to_tsvector with numeric values

2018-04-02 Thread Dmitry Dolgov
> On 2 April 2018 at 11:27, Arthur Zakirov  wrote:
> On Mon, Apr 02, 2018 at 11:41:12AM +0300, Oleg Bartunov wrote:
>> On Mon, Apr 2, 2018 at 9:45 AM, Arthur Zakirov  
>> wrote:
>> I found this bug, when working on presentation about FTS and it looked
>> annoying, since it validates
>> the consistency of FTS.I think this is a bug, which needs to be fixed,
>> else inconsistency with existing full text search  will be gets
>> deeper.
>>
>> The fix looks trivial, but needs a review, of course.
>
> Oh, I understood. The code looks good, tests passed. But maybe it is
> better to use NumericGetDatum() instead of PointerGetDatum()?

Well, technically speaking they're the same, but yes, NumericGetDatum would be
more precise. I've modified it in the attached patch.


jsonb_to_tsvector_numeric_v2.patch
Description: Binary data


Re: json(b)_to_tsvector with numeric values

2018-04-02 Thread Arthur Zakirov
On Mon, Apr 02, 2018 at 11:41:12AM +0300, Oleg Bartunov wrote:
> On Mon, Apr 2, 2018 at 9:45 AM, Arthur Zakirov  
> wrote:
> I found this bug, when working on presentation about FTS and it looked
> annoying, since it validates
> the consistency of FTS.I think this is a bug, which needs to be fixed,
> else inconsistency with existing full text search  will be gets
> deeper.
> 
> The fix looks trivial, but needs a review, of course.

Oh, I understood. The code looks good, tests passed. But maybe it is
better to use NumericGetDatum() instead of PointerGetDatum()?

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: json(b)_to_tsvector with numeric values

2018-04-02 Thread Oleg Bartunov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On Mon, Apr 2, 2018 at 9:45 AM, Arthur Zakirov  wrote:
> Hello Dmitry,
>
> Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>
>> Any opinions about this suggestion? Can it be considered as a bug fix and
>> included into this release?
>
>
> I think there is no chance to include it into v11. You can add the patch to
> the 2018-09 commitfest.

I found this bug, when working on presentation about FTS and it looked
annoying, since it validates
the consistency of FTS.I think this is a bug, which needs to be fixed,
else inconsistency with existing full text search  will be gets
deeper.

The fix looks trivial, but needs a review, of course.

>
>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company



Re: json(b)_to_tsvector with numeric values

2018-04-01 Thread Arthur Zakirov
Hello Dmitry,

Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Any opinions about this suggestion? Can it be considered as a bug fix and
> included into this release?
>

I think there is no chance to include it into v11. You can add the patch to
the 2018-09 commitfest.


-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company