Re: [HACKERS] tsearch patch and namespace pollution

2007-08-17 Thread Michael Paesold

Bruce Momjian wrote:

I would be happy if all text search functions began with 'ts', 'ts_', or
'to_ts', and if we don't clean this up now, it is going to be harder in
the future.


+1 from me. \df is also much more useful then.

 I think users can expect some migration for text search in

8.3 as a benefit of getting into core and be dump-able.


I guess so. Especially if you change some functions, they will have to 
change source code anyway. So you can as well cleanup all functions that 
don't fit into a sound naming schema.


Best Regards
Michael Paesold


---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] tsearch patch and namespace pollution

2007-08-16 Thread Tom Lane
I find the following additions to pg_proc in the current tsearch2 patch:

   proc   | prorettype 
--+
 pg_ts_parser_is_visible(oid) | boolean
 pg_ts_dict_is_visible(oid)   | boolean
 pg_ts_template_is_visible(oid)   | boolean
 pg_ts_config_is_visible(oid) | boolean
 tsvectorin(cstring)  | tsvector
 tsvectorout(tsvector)| cstring
 tsvectorsend(tsvector)   | bytea
 tsqueryin(cstring)   | tsquery
 tsqueryout(tsquery)  | cstring
 tsquerysend(tsquery) | bytea
 gtsvectorin(cstring) | gtsvector
 gtsvectorout(gtsvector)  | cstring
 tsvector_lt(tsvector,tsvector)   | boolean
 tsvector_le(tsvector,tsvector)   | boolean
 tsvector_eq(tsvector,tsvector)   | boolean
 tsvector_ne(tsvector,tsvector)   | boolean
 tsvector_ge(tsvector,tsvector)   | boolean
 tsvector_gt(tsvector,tsvector)   | boolean
 tsvector_cmp(tsvector,tsvector)  | integer
 length(tsvector) | integer
 strip(tsvector)  | tsvector
 setweight(tsvector,char)   | tsvector
 tsvector_concat(tsvector,tsvector)   | tsvector
 vq_exec(tsvector,tsquery)| boolean
 qv_exec(tsquery,tsvector)| boolean
 tt_exec(text,text)   | boolean
 ct_exec(character varying,text)  | boolean
 tq_exec(text,tsquery)| boolean
 cq_exec(character varying,tsquery)   | boolean
 tsquery_lt(tsquery,tsquery)  | boolean
 tsquery_le(tsquery,tsquery)  | boolean
 tsquery_eq(tsquery,tsquery)  | boolean
 tsquery_ne(tsquery,tsquery)  | boolean
 tsquery_ge(tsquery,tsquery)  | boolean
 tsquery_gt(tsquery,tsquery)  | boolean
 tsquery_cmp(tsquery,tsquery) | integer
 tsquery_and(tsquery,tsquery) | tsquery
 tsquery_or(tsquery,tsquery)  | tsquery
 tsquery_not(tsquery) | tsquery
 tsq_mcontains(tsquery,tsquery)   | boolean
 tsq_mcontained(tsquery,tsquery)  | boolean
 numnode(tsquery) | integer
 querytree(tsquery)   | text
 rewrite(tsquery,tsquery,tsquery) | tsquery
 rewrite(tsquery,text)| tsquery
 rewrite_accum(tsquery,tsquery[]) | tsquery
 rewrite_finish(tsquery)  | tsquery
 rewrite(tsquery[])   | tsquery
 stat(text)   | record
 stat(text,text)  | record
 rank(real[],tsvector,tsquery,integer)| real
 rank(real[],tsvector,tsquery)| real
 rank(tsvector,tsquery,integer)   | real
 rank(tsvector,tsquery)   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery) | real
 rank_cd(tsvector,tsquery,integer)| real
 rank_cd(tsvector,tsquery)| real
 token_type(oid)  | record
 token_type(text) | record
 parse(oid,text)  | record
 parse(text,text) | record
 lexize(oid,text) | text[]
 lexize(text,text)| text[]
 headline(oid,text,tsquery,text)  | text
 headline(oid,text,tsquery)   | text
 headline(text,text,tsquery,text) | text
 headline(text,text,tsquery)  | text
 headline(text,tsquery,text)  | text
 headline(text,tsquery)   | text
 to_tsvector(oid,text)| tsvector
 to_tsvector(text,text)   | tsvector
 to_tsquery(oid,text) | tsquery
 to_tsquery(text,text)| tsquery
 plainto_tsquery(oid,text)| tsquery
 plainto_tsquery(text,text)   | tsquery
 to_tsvector(text)| tsvector
 to_tsquery(text) | tsquery
 plainto_tsquery(text)| tsquery
 tsvector_update_trigger()| trigger
 get_ts_config_oid(text)  | oid
 get_current_ts_config()  | oid
(82 rows)

(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)

While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as parse or stat
with argument types as generic as text.  I think we need to put
a ts_ prefix on some of these.  Specifically, I find these names
totally unacceptable without a ts_ prefix:

 stat(text)   | record
 stat(text,text)  | record

 token_type(oid)  | record
 token_type(text)   

Re: [HACKERS] tsearch patch and namespace pollution

2007-08-16 Thread Bruce Momjian
Tom Lane wrote:
 I find the following additions to pg_proc in the current tsearch2 patch:

It seems a lot of these are useless and just bloat.  I will mark a few:

proc   | prorettype 
 --+
  pg_ts_parser_is_visible(oid) | boolean
  pg_ts_dict_is_visible(oid)   | boolean
  pg_ts_template_is_visible(oid)   | boolean
  pg_ts_config_is_visible(oid) | boolean

Why would anyone look these up via OID rather than name?

  tsvectorin(cstring)  | tsvector
  tsvectorout(tsvector)| cstring
  tsvectorsend(tsvector)   | bytea
  tsqueryin(cstring)   | tsquery
  tsqueryout(tsquery)  | cstring
  tsquerysend(tsquery) | bytea
  gtsvectorin(cstring) | gtsvector
  gtsvectorout(gtsvector)  | cstring
  tsvector_lt(tsvector,tsvector)   | boolean
  tsvector_le(tsvector,tsvector)   | boolean
  tsvector_eq(tsvector,tsvector)   | boolean
  tsvector_ne(tsvector,tsvector)   | boolean
  tsvector_ge(tsvector,tsvector)   | boolean
  tsvector_gt(tsvector,tsvector)   | boolean
  tsvector_cmp(tsvector,tsvector)  | integer
  length(tsvector) | integer
  strip(tsvector)  | tsvector
  setweight(tsvector,char)   | tsvector
  tsvector_concat(tsvector,tsvector)   | tsvector
  vq_exec(tsvector,tsquery)| boolean
  qv_exec(tsquery,tsvector)| boolean
  tt_exec(text,text)   | boolean
  ct_exec(character varying,text)  | boolean
  tq_exec(text,tsquery)| boolean
  cq_exec(character varying,tsquery)   | boolean
  tsquery_lt(tsquery,tsquery)  | boolean
  tsquery_le(tsquery,tsquery)  | boolean
  tsquery_eq(tsquery,tsquery)  | boolean
  tsquery_ne(tsquery,tsquery)  | boolean
  tsquery_ge(tsquery,tsquery)  | boolean
  tsquery_gt(tsquery,tsquery)  | boolean
  tsquery_cmp(tsquery,tsquery) | integer
  tsquery_and(tsquery,tsquery) | tsquery
  tsquery_or(tsquery,tsquery)  | tsquery
  tsquery_not(tsquery) | tsquery
  tsq_mcontains(tsquery,tsquery)   | boolean
  tsq_mcontained(tsquery,tsquery)  | boolean
  numnode(tsquery) | integer
  querytree(tsquery)   | text
  rewrite(tsquery,tsquery,tsquery) | tsquery
  rewrite(tsquery,text)| tsquery
  rewrite_accum(tsquery,tsquery[]) | tsquery
  rewrite_finish(tsquery)  | tsquery
  rewrite(tsquery[])   | tsquery
  stat(text)   | record
  stat(text,text)  | record

  rank(real[],tsvector,tsquery,integer)| real
  rank(real[],tsvector,tsquery)| real
  rank(tsvector,tsquery,integer)   | real
  rank(tsvector,tsquery)   | real
  rank_cd(real[],tsvector,tsquery,integer) | real
  rank_cd(real[],tsvector,tsquery) | real
  rank_cd(tsvector,tsquery,integer)| real
  rank_cd(tsvector,tsquery)| real

Do we realy need this many ranking functions?

  token_type(oid)  | record

Again, why by OID?

  token_type(text) | record
  parse(oid,text)  | record
  parse(text,text) | record
  lexize(oid,text) | text[]
  lexize(text,text)| text[]
  headline(oid,text,tsquery,text)  | text
  headline(oid,text,tsquery)   | text
  headline(text,text,tsquery,text) | text
  headline(text,text,tsquery)  | text
  headline(text,tsquery,text)  | text
  headline(text,tsquery)   | text
  to_tsvector(oid,text)| tsvector
  to_tsvector(text,text)   | tsvector
  to_tsquery(oid,text) | tsquery

Why OID again for the configuration?  I just don't see the use case and
it is bloat and causes confusion.

  to_tsquery(text,text)| tsquery
  plainto_tsquery(oid,text)| tsquery
  plainto_tsquery(text,text)   | tsquery

Again, OID.  I asked Oleg about this and he said:

 Bruce, just remove oid argument specification from documentation.

so I think we can go ahead and remove cases where the configuration name
or object is specified by oid.  I have already removed them from the
documentation and I though the patch had them removed too, but I guess
not.  Admittedly this API has been in flux.

  to_tsvector(text)| tsvector
  to_tsquery(text) | tsquery