Re: [HACKERS] Tsvector editing functions

Stas Kelvich Wed, 27 Jan 2016 08:40:34 -0800

Hi

> On 22 Jan 2016, at 19:03, Tomas Vondra <[email protected]> wrote:
> OK, although I do recommend using more sensible variable names, i.e. why how 
> to use 'lexemes' instead of 'lexarr' for example? Similarly for the other 
> functions.



Changed. With old names I tried to follow conventions in surrounding code, but 
probably that is a good idea to switch to more meaningful names in new code.

>> 
>> 
>> delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions 
>> of tsv_filter from tsin. When lexeme in tsv_filter has no positions function 
>> will delete any occurrence of same lexeme in tsin. When tsv_filter lexeme 
>> have positions function will delete them from positions of matching lexeme 
>> in tsin. If after such removal resulting positions set is empty then 
>> function will delete that lexeme from resulting tsvector.
>> 
> 
> I can't really imagine situation in which I'd need this, but if you do have a 
> use case for it ... although in the initial paragraph you say "... but if 
> somebody wants to delete for example ..." which suggests you may not have 
> such use case.
> 
> Based on bad experience with extending API based on vague ideas, I recommend 
> only really adding functions with existing need. It's easy to add a function 
> later, much more difficult to remove it or change the signature.

I tried to create more or less self-contained api, e.g. have ability to negate 
effect of concatenation. But i’ve also asked people around what they think 
about extending API and everybody convinced that it is better to stick to 
smaller API. So let’s drop it. At least that functions exists in mail list in 
case if somebody will google for such kind of behaviour.

>> 
>> Also if we want some level of completeness of API and taking into account 
>> that concat() function shift positions on second argument I thought that it 
>> can be useful to also add function that can shift all positions of specific 
>> value. This helps to undo concatenation: delete one of concatenating 
>> tsvectors and then shift positions in resulting tsvector. So I also wrote 
>> one another small function:
>> 
>> shift(tsin tsvector,offset int16) — Shift all positions in tsin by given 
>> offset
> 
> That seems rather too low-level. Shouldn't it be really built into delete() 
> directly somehow?


I think it is ambiguous task on delete. But if we are dropping support of 
delete(tsvector, tsvector) I don’t see points in keeping that functions.

>>> 
>>> 7) Some of the functions use intexterm that does not match the function
>>>   name. I see two such cases - to_tsvector and setweight. Is there a
>>>   reason for that?
>>> 
>> 
>> Because sgml compiler wants unique indexterm. Both functions that
>> youmentioned use overloading of arguments and have non-unique name.
> 
> As Michael pointed out, that should probably be handled by using <primary> 
> and <secondary> tags.


Done.


> On 19 Jan 2016, at 00:21, Alvaro Herrera <[email protected]> wrote:
> 
> 
> It's a bit funny that you reintroduce the "unrecognized weight: %d"
> (instead of %c) in tsvector_setweight_by_filter.
> 


Ah, I was thinking about moving it to separate diff and messed. Fixed and 
attaching diff with same fix for old tsvector_setweight.

tsvector_ops-v2.1.diff
Description: Binary data

tsvector_ops-v2.2.diff
Description: Binary data



---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Tsvector editing functions

Reply via email to