subject:"Re\: \[HACKERS\] tsearch_core patch\: permissions and security issues"

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-15 Thread Simon Riggs

On Fri, 2007-06-15 at 10:36 -0400, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > Although I'm happy to see tsearch finally hit the big time, I'm a bit
> > disappointed to see so many new datatype-specific SQL commands created.
> 
> Per subsequent discussion we are down to just one new set of commands,
> CREATE/ALTER/DROP TEXT SEARCH CONFIGURATION, so it's not as big a
> footprint as it was to start with.

Thats a lot better, thanks. I'm sure that will work better in PgAdmin
and many other places too.

> I have been thinking that it would be smart to try to use the generic
> "definition list" syntax, like CREATE OPERATOR and CREATE AGGREGATE.
> But the motivation for that is just to avoid defining more keywords
> (which has an overall impact on parser size and performance).  It's
> not really going to do anything for us in terms of having an
> implementation that can be shared with anything else.

It's OK; ALTER RFID TAG NOMENCLATURE has a nice ring to it. :-)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-15 Thread Robert Treat

On Friday 15 June 2007 00:46, Oleg Bartunov wrote:
> On Thu, 14 Jun 2007, Tom Lane wrote:
> > [ thinks some more... ]  If we revived the GENERATED AS patch,
> > you could imagine computing tsvector columns via "GENERATED AS
> > to_tsvector('english'::regconfig, big_text_col)" instead of a
> > trigger, and then again you've got the dependency exposed where
> > the system can see it.  I don't wanna try to do that for 8.3,
> > but it might be a good path to pursue in future, instead of assuming
> > that triggers will be the way forevermore.
> >
> > Thoughts?
>
> No way with standard. GENERATED AS says that "all columns references in an
> expression associated with a generated column must be to columns of the
> base table containing that generated column."
>
> tsvector could be result of rather complex select involving several tables.
>

Is there some reason for this restriction in the standard?  I might be in 
favor of "extending" the standard to allow this case if not.  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-15 Thread Robert Treat

On Thursday 14 June 2007 15:10, Teodor Sigaev wrote:
> That changes are doable for several days. I'd like to make changes together
> with replacing of FULLTEXT keyword to TEXT SEARCH as you suggested.

AIUI the discussion on this change took place off list?  Can we get a preview 
of what the commands will look like and maybe a summary of the discussion?  It 
may sound a bit pedantic, but the concept/wording of "full text searching" is 
pretty well understood by the database community, so switching to just TEXT 
SEARCH sounds like we're adding ambiguity for reasons that escape me   

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-15 Thread Tom Lane

"Simon Riggs" <[EMAIL PROTECTED]> writes:
> Although I'm happy to see tsearch finally hit the big time, I'm a bit
> disappointed to see so many new datatype-specific SQL commands created.

Per subsequent discussion we are down to just one new set of commands,
CREATE/ALTER/DROP TEXT SEARCH CONFIGURATION, so it's not as big a
footprint as it was to start with.

> Can we consider CREATE TYPE CONFIGURATION with subsets such as...
> CREATE TYPE CONFIGURATION name USING FULLTEXT (map)
> CREATE TYPE CONFIGURATION name USING FULLTEXT (dictionary)
> CREATE TYPE CONFIGURATION name USING FULLTEXT (parser)

This seems entirely cosmetic, unless you have some proposal for allowing
a uniform underlying implementation not only syntax.  In the absence of
some concrete cases to consider, I don't see how we could imagine that
we know how to implement a useful generic approach.

I have been thinking that it would be smart to try to use the generic
"definition list" syntax, like CREATE OPERATOR and CREATE AGGREGATE.
But the motivation for that is just to avoid defining more keywords
(which has an overall impact on parser size and performance).  It's
not really going to do anything for us in terms of having an
implementation that can be shared with anything else.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-15 Thread Simon Riggs

On Wed, 2007-06-13 at 18:06 -0400, Bruce Momjian wrote:
> You bring up a very good point.  There are fifteen new commands being
> added for full text indexing:
> 
>   alter-fulltext-config.sgml  alter-fulltext-owner.sgml
>   create-fulltext-dict.sgml   drop-fulltext-dict.sgml
>   alter-fulltext-dict.sgmlalter-fulltext-parser.sgml
>   create-fulltext-map.sgmldrop-fulltext-map.sgml
>   alter-fulltext-dictset.sgml comment-fulltext.sgml
>   create-fulltext-parser.sgml drop-fulltext-parser.sgml
>   alter-fulltext-map.sgml create-fulltext-config.sgml
>   drop-fulltext-config.sgml

Although I'm happy to see tsearch finally hit the big time, I'm a bit
disappointed to see so many new datatype-specific SQL commands created.
That sets a bad precedent for other datatype authors, as well as all
those people that want to invent new things like SKYLINE etc..

Whatever the reasons for the new commands, those reasons must also be
potentially shared by authors of other datatypes too. Is there a way to
genericise these commands so that we can offer those same benefits to
all datatypes, rather than setting a double standard? Or do we think
that when a Geodetic datatype tries to come into core we would have
commands like CREATE COORDINATE TRANSFORM?

Can we consider CREATE TYPE CONFIGURATION with subsets such as...
CREATE TYPE CONFIGURATION name USING FULLTEXT (map)
CREATE TYPE CONFIGURATION name USING FULLTEXT (dictionary)
CREATE TYPE CONFIGURATION name USING FULLTEXT (parser)

Your choice of syntax may vary, but my point is about creating a
mechanism by which any datatype author can reference complex
configuration details. We managed to do this for INDEXes and OPERATORs,
so it seems a shame to go for the full 15 new commands when we could do
the same thing with much fewer commands and ones that could then be
utilised by others, e.g. PostGIS.

Last minute change true, but only parser+docs changes suggested. I want
the full tsearch2 functionality as much as anyone.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Oleg Bartunov


On Thu, 14 Jun 2007, Tom Lane wrote:


Oleg Bartunov <[EMAIL PROTECTED]> writes:

You're correct. But we can't defend users from all possible errors.
Other side, that we need somehow to help user to identify what fts
configuration was used to produce tsvector. For example, comment on
tsvector column would be useful, but we don't know how to do this
automatically.


Yeah, I was wondering about that too.  The only way we could relax the
superuser, you-better-know-what-you're-doing restriction on changing
configurations would be if we had a way to identify which tsvector
columns needed to be updated.  Right now that's pretty hard to find out
because the references to configurations are buried in the bodies of
trigger functions.  That whole trigger-function business is not the
nicest part of tsearch2, either ... it'd be better if we could automate
tsvector maintenance more.


yes, trigger function is a complex stuff, our tsearch() trigger is an 
example of automated stuff. It could be written very easy on plpgsql,

for example.

=# create function my_update() returns trigger as 
$$

BEGIN
   NEW.fts=
   setweight( to_tsvector('english',NEW.t1),'A') || ' ' ||
   setweight( to_tsvector('english',NEW.t2),'B');
 RETURN NEW;
END;
$$ 
language plpgsql;




One thing I was thinking about is that rather than storing a physical
tsvector column, people might index a "virtual" column using functional
indexes:

create index ... (to_tsvector('english', big_text_col))

which could be queried

select ... where to_tsvector('english', big_text_col) @@ tsquery



this is already possible for gin index

create index gin_text_idx on test using gin (
( coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') )
);


apod=# select title from test where 
(coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') ) @@ 
to_tsquery('supernovae') order by sdate desc limit 10;





Assuming that the index is lossy, the index condition would have to be
rechecked, so to_tsvector() would have to be recomputed, but only at the
rows identified as candidate matches by the index.  The I/O savings from
eliminating the heap's tsvector column might counterbalance the extra
CPU for recomputing tsvectors.  Or not, but in any case this is
attractive because it doesn't need any handmade maintenance support like
a trigger --- the regular index maintenance code does it all.


I'm afraid it wouldn't work for all cases. We already have headline() function 
which had to reparse document to produce text snippet and it's very slow
and eats most select time. 
ALso, trigger stuff is a normal machinery for databases.




It strikes me that we could play the same kind of game we played to make
nextval() references to sequences be recognized as dependencies on
sequences.  Invent a "regconfig" OID type that's just like regclass
except it handles OIDs of ts_config entries instead of pg_class entries,
and make the first argument of to_tsvector be one of those:

create index ... (to_tsvector('english'::regconfig, big_text_col))

Now dependency.c can be taught to recognize the regconfig Const as
depending on the referenced ts_config entry, and voila we have a
pg_depend entry showing that the index depends on the configuration.
What we actually do about it is another question, but this at least
gets the knowledge into the system.



interesting. And \di could display all configuration stuff for text search
indexes ?



[ thinks some more... ]  If we revived the GENERATED AS patch,
you could imagine computing tsvector columns via "GENERATED AS
to_tsvector('english'::regconfig, big_text_col)" instead of a
trigger, and then again you've got the dependency exposed where
the system can see it.  I don't wanna try to do that for 8.3,
but it might be a good path to pursue in future, instead of assuming
that triggers will be the way forevermore.

Thoughts?


No way with standard. GENERATED AS says that "all columns references in an 
expression associated with a generated column must be to columns of the base 
table containing that generated column."


tsvector could be result of rather complex select involving several tables.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev

It should be. Instances of ispell (and synonym, thesaurus) dictionaries are 
different only in dict_initoption part, so it will be only one entry in 
pg_ts_dict_template and several ones in pg_ts_dict.


No, I was thinking of still having just one pg_ts_dict catalog (no template)
but removing its dictinit field.  Instead, the init strings would be
stored with configuration mapping entries.

This would mean having to remember to provide the right option along
with the dictionary name when doing ALTER CONFIGURATION ADD MAPPING.
Not sure if that would be harder or easier to use than what you're
thinking of.


Hmm. Dictionary may present in several lists of dictionaries in one 
configuration. Suppose, it isn't practical to store dictinitoption 
several times. In other hand, the same dictionary (template) with 
different init option may present on configuration too. Typical example 
is configuration for russian language:

lword, lpword tokens have dictionary's list {ispell_en, stem_en}
nlword, nlpword tokens have dictionary's list {ispell_ru, stem_ru}

stem_(ru|en) is a Snowball's stemmer, but ispell_(ru|en) is a ispell 
dictionary (template) with different dictinitoption. Next, 
configurations may share dictionaries.


And, init option may be rather big.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

Michael Paesold <[EMAIL PROTECTED]> writes:
> After reading the discussion and the introduction, here is what I think 
> tsearch in core should at least accomplish in 8.3.
> ...
>   - Stop words in tables, not in external files.

I realized that there's a pretty serious problem with doing that, which
is encoding.  We don't have any way to deal with preloaded catalog data
that exceeds 7-bit-ASCII, because when you do CREATE DATABASE ... ENCODING
it's going to be copied over exactly as-is.  And there's plenty of
not-ASCII stuff in the non-English stopword files.  This is something we
need to solve eventually, but I think it ties into the whole multiple
locale can-of-worms; there's no way we're getting it done for 8.3.

So I'm afraid we have to settle for stop words in external files for the
moment.  I do have two suggestions though:

* Let's have just one stopword file for each language, with the
convention that the file is stored in UTF8 no matter what language
you're talking about.  We can have the stopword reading code convert
to the database encoding on-the-fly when it reads the file.  Without
this there's just a whole bunch of foot-guns there.  We'd at least need
to have encoding verification checks when reading the files, which seems
hardly cheaper than just translating the data.

* Let's fix it so the reference to the stoplist in the user-visible
options is just a name, with no path or anything like that.  (Similar
to the handling of timezone_abbreviations.)  Then it will be feasible
to re-interpret the option as a reference to a named list in a catalog
someday, when we solve the encoding problem.  Right now the patch has
things like

+ DATA(insert OID = 5140 (  "ru_stem_koi8" PGNSP PGUID 5135 5137 
"dicts_data/russian.stop.koi8"));

which is really binding the option pretty tightly to being a filename;
not to mention the large security risks involved in letting anyone but
a superuser have control of such an option.

> What I don't really like is the number of commands introduced without 
> any strong reference to full text search. E.g. CREATE CONFIGURATION 
> gives no hint at all that this is about full text search.

Yeah.  We had some off-list discussion about this and concluded that
TEXT SEARCH seemed to be the right phrase to use in the command names.
That hasn't gotten reflected into the patch yet.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Gregory Stark

"Oleg Bartunov" <[EMAIL PROTECTED]> writes:

> On Thu, 14 Jun 2007, Gregory Stark wrote:
>
>> Am I correct to think of this like changing collations leaving your btree
>> index "corrupt"? In that case it probably won't cause any backend crash 
>> either
>> but you will get incorrect results. For example, returning different results
>> depending on whether the index or a full table scan is used.
>
> You're correct. But we can't defend users from all possible errors. 

Sure, but it seems like a the line, at least in existing cases, is that if you
fiddle with catalogs directly then you should know what consequences you need
to be careful of.

But when if you make changes through a supported, documented interface then
the system will protect you from breaking things. 

Hm, I went to construct an example and accidentally found a precedent for not
necessarily protecting users from themselves in every case:


postgres=# create table x (i integer);
CREATE TABLE
postgres=# create function f(integer) returns integer as 'select $1' immutable 
strict language sql;
CREATE FUNCTION
postgres=# select f(1);
 f 
---
 1
(1 row)

postgres=# create index xi on x (f(i));
CREATE INDEX
postgres=# insert into x values (1);
INSERT 0 1
postgres=# insert into x values (2);
INSERT 0 1
postgres=# create or replace function f(integer) returns integer as 'select 
-$1' immutable strict language sql;
CREATE FUNCTION


Uhm. Oops! And yes, the resulting index is, of course, corrupt:



postgres=# insert into x (select random() from generate_series(1,2000));
INSERT 0 2000
postgres=# select count(*) from x where f(i) = -1;
 count 
---
 0
(1 row)
postgres=# set enable_bitmapscan = off;
SET
postgres=# set enable_indexscan = off;
SET
postgres=#  select count(*) from x where f(i) = -1;
 count 
---
  1003
(1 row)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Teodor Sigaev <[EMAIL PROTECTED]> writes:
>>> The reason to save SQLish interface to dictionaries is a simplicity of 
>>> configuration. Snowball's stemmers are useful as is, but ispell dictionary 
>>> requires some configuration action before using.
>> 
>> Yeah.  I had been wondering about moving the dict_initoption over to the
>> configuration entry --- is that sane at all?  It would mean that

> It should be. Instances of ispell (and synonym, thesaurus) dictionaries are 
> different only in dict_initoption part, so it will be only one entry in 
> pg_ts_dict_template and several ones in pg_ts_dict.

No, I was thinking of still having just one pg_ts_dict catalog (no template)
but removing its dictinit field.  Instead, the init strings would be
stored with configuration mapping entries.

This would mean having to remember to provide the right option along
with the dictionary name when doing ALTER CONFIGURATION ADD MAPPING.
Not sure if that would be harder or easier to use than what you're
thinking of.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Michael Paesold


Bruce Momjian wrote:

I an attempt to communicate what full text search does, and what
features we are thinking of adding/removing, I have put up the
introduction in HTML:

http://momjian.us/expire/fulltext/HTML/fulltext-intro.html



Very good idea, Bruce!

After reading the discussion and the introduction, here is what I think 
tsearch in core should at least accomplish in 8.3. Please bear in mind, 
that (a) I am talking from a user perspective (there might be technical 
arguments against my thoughts) and (b) I have no hands-on experience 
with Tsearch2 yet, so more experienced users might have different needs.


 - Basic full text search usable for non-superusers
 - Out-of-the-box working configuration for as many languages as
   reasonable (Teodor named quite a number of languages working as-is,
   so this is really an improvement over contrib, great!)
 - No foot-guns accessible to non-superuser
 - Agreement on function names, perhaps some should be changed. For
   instance to_tsquery() and plainto_tsquery() seem rather unintuitive
   because they don't have a common prefix, and they are not consistent
   about using underscores. Perhaps to_tsquery() and to_tsquery_plain()?
 - Future compatibility for all features available to non-superusers
 - Stop words in tables, not in external files.
 - At least for superusers, all features available in contrib now,
   should be available, too (don't know about pg_dump).

What I don't really like is the number of commands introduced without 
any strong reference to full text search. E.g. CREATE CONFIGURATION 
gives no hint at all that this is about full text search. IMHO there are 
more configurations than just full text ones. :-) So perhaps better 
spell this CREATE FULLTEXT CONFIGURATION etc.? (Think about tab 
completion in psql, for instance.)


I guess this is in line with what Tom said about mapping objects and 
CREATE ATTRIBUTE vs. CREATE/ALTER CONFIGURATION.

(http://archives.postgresql.org/pgsql-hackers/2007-06/msg00522.php)

After all, I would really welcome having full text search capabilities 
in core.


Best Regards
Michael Paesold


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

Oleg Bartunov <[EMAIL PROTECTED]> writes:
> You're correct. But we can't defend users from all possible errors. 
> Other side, that we need somehow to help user to identify what fts 
> configuration was used to produce tsvector. For example, comment on
> tsvector column would be useful, but we don't know how to do this
> automatically.

Yeah, I was wondering about that too.  The only way we could relax the
superuser, you-better-know-what-you're-doing restriction on changing
configurations would be if we had a way to identify which tsvector
columns needed to be updated.  Right now that's pretty hard to find out
because the references to configurations are buried in the bodies of
trigger functions.  That whole trigger-function business is not the
nicest part of tsearch2, either ... it'd be better if we could automate
tsvector maintenance more.

One thing I was thinking about is that rather than storing a physical
tsvector column, people might index a "virtual" column using functional
indexes:

create index ... (to_tsvector('english', big_text_col))

which could be queried

select ... where to_tsvector('english', big_text_col) @@ tsquery

Assuming that the index is lossy, the index condition would have to be
rechecked, so to_tsvector() would have to be recomputed, but only at the
rows identified as candidate matches by the index.  The I/O savings from
eliminating the heap's tsvector column might counterbalance the extra
CPU for recomputing tsvectors.  Or not, but in any case this is
attractive because it doesn't need any handmade maintenance support like
a trigger --- the regular index maintenance code does it all.

It strikes me that we could play the same kind of game we played to make
nextval() references to sequences be recognized as dependencies on
sequences.  Invent a "regconfig" OID type that's just like regclass
except it handles OIDs of ts_config entries instead of pg_class entries,
and make the first argument of to_tsvector be one of those:

create index ... (to_tsvector('english'::regconfig, big_text_col))

Now dependency.c can be taught to recognize the regconfig Const as
depending on the referenced ts_config entry, and voila we have a
pg_depend entry showing that the index depends on the configuration.
What we actually do about it is another question, but this at least
gets the knowledge into the system.

[ thinks some more... ]  If we revived the GENERATED AS patch,
you could imagine computing tsvector columns via "GENERATED AS
to_tsvector('english'::regconfig, big_text_col)" instead of a
trigger, and then again you've got the dependency exposed where
the system can see it.  I don't wanna try to do that for 8.3,
but it might be a good path to pursue in future, instead of assuming
that triggers will be the way forevermore.

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev


Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.


Without exotic cases, maximum disaster may be that queries with some words will 
return more or less results than should be. Because of wrong stemming or wrong 
match of stop-word or wrong mapping.


By default, configuration is useful for most users and works for danish, dutch, 
finnish, french, german, hungarian, italian, norwegian, portuguese, spanish, 
swedish, russin and english languages.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev

But, there are reasons to allow users register new templates and in fact 
we know people/projects with application-dependent dictionaries. How 
they could dump/reload their dictionaries ?

The same way as pg_am does.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Oleg Bartunov


On Thu, 14 Jun 2007, Gregory Stark wrote:


"Teodor Sigaev" <[EMAIL PROTECTED]> writes:


But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.


ouch. could mucking with a configuration create a corrupt index?


Depending on what you mean 'corrupted'. It will not corrupted as non-readable
or cause backend crash. But usage of such tsvector column could be limited -
not all words will be searchable.


Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.


You're correct. But we can't defend users from all possible errors. 
Other side, that we need somehow to help user to identify what fts 
configuration was used to produce tsvector. For example, comment on

tsvector column would be useful, but we don't know how to do this
automatically.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Oleg Bartunov


On Thu, 14 Jun 2007, Tom Lane wrote:


Teodor Sigaev <[EMAIL PROTECTED]> writes:

The reason to save SQLish interface to dictionaries is a simplicity of
configuration. Snowball's stemmers are useful as is, but ispell dictionary
requires some configuration action before using.


Yeah.  I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all?  It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway.  If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.


currently, all dictionaries we provide are all template dictionaries,
so users could change only parameters.

But, there are reasons to allow users register new templates and in fact we 
know people/projects with application-dependent dictionaries. 
How they could dump/reload their dictionaries ?


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev




Tom Lane wrote:

Teodor Sigaev <[EMAIL PROTECTED]> writes:
The reason to save SQLish interface to dictionaries is a simplicity of 
configuration. Snowball's stemmers are useful as is, but ispell dictionary 
requires some configuration action before using.


Yeah.  I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all?  It would mean that
It should be. Instances of ispell (and synonym, thesaurus) dictionaries are 
different only in dict_initoption part, so it will be only one entry in 
pg_ts_dict_template and several ones in pg_ts_dict.


ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH 
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH 
dictname1[, ...];

sets dictionary's list for token's type(s)


ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
  REPLACE olddictname TO newdictname;


Replace dictionary to another dictionary in dictionary's list for token's 
type(s). This command is very useful for tweaking configuration and for creating 
new configuration which differs from already existing one only by pair of 
dictionary.



ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS]  FOR 
tokentypename;
Is it looking reasonable?


Er ... what's the difference between the second and third forms?


That changes are doable for several days. I'd like to make changes together with 
 replacing of FULLTEXT keyword to TEXT SEARCH as you suggested.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Gregory Stark

"Teodor Sigaev" <[EMAIL PROTECTED]> writes:

>>> But they still need some more thought about permissions, because AFAICS
>>> mucking with a configuration can invalidate some other user's data.
>>
>> ouch. could mucking with a configuration create a corrupt index?
>
> Depending on what you mean 'corrupted'. It will not corrupted as non-readable
> or cause backend crash. But usage of such tsvector column could be limited -
> not all words will be searchable.

Am I correct to think of this like changing collations leaving your btree
index "corrupt"? In that case it probably won't cause any backend crash either
but you will get incorrect results. For example, returning different results
depending on whether the index or a full table scan is used.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> The reason to save SQLish interface to dictionaries is a simplicity of 
> configuration. Snowball's stemmers are useful as is, but ispell dictionary 
> requires some configuration action before using.

Yeah.  I had been wondering about moving the dict_initoption over to the
configuration entry --- is that sane at all?  It would mean that
dict_init functions would have to guard themselves against invalid
options, but they probably ought to do that anyway.  If we did that,
I think we could have a fixed set of dictionaries without too much
problem, and focus on just configurations as being user-alterable.

>>> Next, it took me a while to understand how Mapping objects fit into
>>> the scheme at all, and now that (I think) I understand, I'm wondering
>>> why treat them as an independent concept.

> ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] 
> WITH 
> dictname1[, ...];
> ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] 
> WITH 
> dictname1[, ...];
> ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
>   REPLACE olddictname TO newdictname;
> ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS]  FOR 
> tokentypename;
> Is it looking reasonable?

Er ... what's the difference between the second and third forms?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev


But they still need some more thought about permissions, because AFAICS
mucking with a configuration can invalidate some other user's data.


ouch. could mucking with a configuration create a corrupt index?


Depending on what you mean 'corrupted'. It will not corrupted as non-readable 
or cause backend crash. But usage of such tsvector column could be limited - not 
all words will be searchable.



This sounds sort of analogous to the issues collation bring up.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev


can we hard-code into the backend, and just update for every major
release like we do for encodings?


Sorry, no one of them :(. We know projects which introduce new parser, new 
dictionary. Config and map are changes very often.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Teodor Sigaev




Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations.  Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.


Possible solution is to split pg_ts_dict (I'll talk about dictionaries, but the 
same way is possible to parsers, but now it's looked as overdesign) to two table 
like pg_am and pg_opclass.
First table, pg_ts_dict_template (I don't know the exact name yet) which 
contains columns: oid, template_name, dict_init, dict_lexize and second:

pg_ts_dict with colimns: oid, template_oid, owner, schema, dict_initoption.

CREATE/ALTER/DROP DICTIONARY affects only second table and access to first one 
is only select/update/insert/delete similar to pg_am.


IMHO, this interface solves problems with security and dumping.

The reason to save SQLish interface to dictionaries is a simplicity of 
configuration. Snowball's stemmers are useful as is, but ispell dictionary 
requires some configuration action before using.


Next, INTERNAL arguments parser's and dictionary's APIs are used because if 
performance reason. During creation of tsvector from text, there are a lot of 
calls of parsers and dictionaries. And internal structures of they states may be 
rather complex and cannot be matched in any pgsql's type, even in flat memory 
structure.



> Next, it took me a while to understand how Mapping objects fit into
> the scheme at all, and now that (I think) I understand, I'm wondering
> why treat them as an independent concept.
ALTER FULLTEXT CONFIGURATION cfgname ADD MAPPING FOR tokentypename[, ...] WITH 
dictname1[, ...];
ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING FOR tokentypename[, ...] WITH 
dictname1[, ...];

ALTER FULLTEXT CONFIGURATION cfgname ALTER MAPPING [FOR tokentypename[, ...]]
 REPLACE olddictname TO newdictname;
ALTER FULLTEXT CONFIGURATION cfgname DROP MAPPING [IF EXISTS]  FOR 
tokentypename;
Is it looking reasonable?

> TSConfiguration objects are a different story, since they have only
> type-safe dependencies on parsers, locales, and dictionaries.  But they
> still need some more thought about permissions, because AFAICS mucking
> with a configuration can invalidate some other user's data.Do we want
> to allow runtime changes in a configuration that existing tsvector
> columns already depend on?  How can we even recognize whether there is
> stored data that will be affected by a configuration change?  (AFAICS

Very complex task: experienced users could use several configuration 
simultaneously. For example: indexing use configuration which doesn't reject 
stop-words, but for default searching use configuration which rejects 
stop-words. BTW, the same effects may be produced by dictionary's change.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Bruce Momjian


I an attempt to communicate what full text search does, and what
features we are thinking of adding/removing, I have put up the
introduction in HTML:

http://momjian.us/expire/fulltext/HTML/fulltext-intro.html

The links to the other sections don't work yet.

---

Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > Well my argument has always been the "core" feature argument. Perhaps I 
> > am missing some info here, but when I read what you wrote, I read that 
> > Tsearch will now be "harder" to work with. Not easier. :(
> 
> Then you misread it.  What I was proposing was essentially that there
> won't be any need for pg_dump support because everything's built-in
> (at least as far as parsers/dictionaries go).
> 
> As for the permissions issues, that's just formalizing something that's
> true today with the contrib module: if you change a configuration, it's
> *your* problem whether that invalidates any table entries, the system
> won't take care of it for you.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Joshua D. Drake


Tom Lane wrote:

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
Well my argument has always been the "core" feature argument. Perhaps I 
am missing some info here, but when I read what you wrote, I read that 
Tsearch will now be "harder" to work with. Not easier. :(


Then you misread it.  What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).

As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.


O.k. :)

Joshua D. Drake




regards, tom lane




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Well my argument has always been the "core" feature argument. Perhaps I 
> am missing some info here, but when I read what you wrote, I read that 
> Tsearch will now be "harder" to work with. Not easier. :(

Then you misread it.  What I was proposing was essentially that there
won't be any need for pg_dump support because everything's built-in
(at least as far as parsers/dictionaries go).

As for the permissions issues, that's just formalizing something that's
true today with the contrib module: if you change a configuration, it's
*your* problem whether that invalidates any table entries, the system
won't take care of it for you.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Joshua D. Drake


Tom Lane wrote:

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
O.k. I am not trying to throw any cold water on this, but with the 
limitations we are suggesting, does the patch gain us anything over just 
leaving tsearch in contrib?


Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint.  The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously.  And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)


Well my argument has always been the "core" feature argument. Perhaps I 
am missing some info here, but when I read what you wrote, I read that 
Tsearch will now be "harder" to work with. Not easier. :(


Removal of pg_dump support kind of hurts us, as we already have problems 
with pg_dump support and tsearch2. Adding work to have to re-assign 
permissions to vector columns because we make changes...


I would grant that having the SQL extensions would certainly be nice.

Anyway, I am not trying to stop the progress. I would like to see 
Tsearch2 in core but I also don't want to add complexity. You did say here:


And we reserve the right to whack around the API for the functions 
referenced by the catalog entries.


Which kind of gets us back to upgrade problems doesn't it?


Sincerely,

Joshua D. Drake




regards, tom lane




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Bruce Momjian

Joshua D. Drake wrote:
> Tom Lane wrote:
> > Gregory Stark <[EMAIL PROTECTED]> writes:
> >> Well assuming we have any SQL-level support at all I think we should
> >> strive to avoid these functions taking INTERNAL arguments.
> 
> > That gets us down to just needing to worry about whether we like the
> > SQL representation of configurations.  Which is still a nontrivial
> > issue, but at least it seems manageable on a timescale that's
> > reasonable for 8.3.
> 
> O.k. I am not trying to throw any cold water on this, but with the 
> limitations we are suggesting, does the patch gain us anything over just 
> leaving tsearch in contrib?

The idea is that common operations like searching and mapping dictionaries
will be easier to do, but the more complex stuff will require catalog
manipulations.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> O.k. I am not trying to throw any cold water on this, but with the 
> limitations we are suggesting, does the patch gain us anything over just 
> leaving tsearch in contrib?

Well, if you want to take a hard-nosed approach, no form of the patch
would gain us anything over leaving it in contrib, at least not from a
functionality standpoint.  The argument in favor has always been about
perception, really: if it's a "core" feature not an "add-on", then
people will take it more seriously.  And there's a rather weak
ease-of-use argument that you don't have to install a contrib module.
(The idea that it's targeted at people who can't or won't install a
contrib module is another reason why I think we can skip user-defined
parsers and dictionaries ...)

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Joshua D. Drake


Tom Lane wrote:

Gregory Stark <[EMAIL PROTECTED]> writes:

Well assuming we have any SQL-level support at all I think we should
strive to avoid these functions taking INTERNAL arguments.



That gets us down to just needing to worry about whether we like the
SQL representation of configurations.  Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.


O.k. I am not trying to throw any cold water on this, but with the 
limitations we are suggesting, does the patch gain us anything over just 
leaving tsearch in contrib?


Sincerely,

Joshua D. Drake




regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Tom Lane

Gregory Stark <[EMAIL PROTECTED]> writes:
> Well assuming we have any SQL-level support at all I think we should
> strive to avoid these functions taking INTERNAL arguments.

I don't think I want to get into redesigning the patch at that level of
detail, at least not for 8.3.  It seems like something possibly worth
thinking about for 8.4 though.  The idea that we might want to change
the API for parser and dictionary support routines seems like another
good argument for not exposing user-level facilities for creating them
right now.

What I'm realizing as I look at it is that this is an enormous patch,
and it's not as close to being ready to apply as I had supposed.  If we
don't scale it back, then either it doesn't get into 8.3 or 8.3 gets
delayed a whole lot longer.  So we need to look at what we can trim or
postpone for a later release.

So all these factors seem to me to point in the same direction: at least
for the time being, we should treat TS parsers and dictionaries the way
we treat index access methods.  There'll be a catalog, which the
adventurous can insert new entries into, but no SQL-level support for
doing it, hence no pg_dump support.  And we reserve the right to whack
around the API for the functions referenced by the catalog entries.

That still leaves us with the question of SQL-level support for TS
configurations, which are built on top of parsers and dictionaries.
We definitely need some level of capability for that.  For the
permissions and dependencies issues, the minimalistic approach is to
say "only superusers can create or alter TS configurations, and if you
alter one it's your responsibility to fix up any dependent tsvector
columns or indexes."  We currently handle index operator classes that
way, so it's not completely ridiculous.  Sure it would be nice to do
better, but maybe that's a post-8.3 project.

That gets us down to just needing to worry about whether we like the
SQL representation of configurations.  Which is still a nontrivial
issue, but at least it seems manageable on a timescale that's
reasonable for 8.3.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-14 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> You could remove the immediate source of this objection if you could
> redesign the APIs for the underlying support functions to be more
> type-safe.  I'm not sure how feasible or useful that would be though.
> The bottom-line question here is whether developing a new parser or
> dictionary implementation is really something that ordinary users might
> do.  If not, then having all this SQL-level support for setting up
> catalog entries seems like wasted effort.

Well assuming we have any SQL-level support at all I think we should strive to
avoid these functions taking INTERNAL arguments.

I feel like having them in the GIST interface has been a major impediment to
more people defining GIST indexes for more datatypes. Because you need to
write C code dealing with internal data structures to handle page splits the
bar to implement GIST index operator classes is too high for most users. So
instead of a simple SQL command we end up with contrib modules implementing
each type of GIST index.

A while back I proposed that we implement the same page-split algorithm that
most (or all?) of those contrib modules copy-paste between them as a default
implementation. That would allow defining a GIST index in terms of a handful
of operators like "distance" which could be defined with a type-safe api. This
would be less flexible than the existing generic solution but it would allow
defining new GIST indexes without writing C code.

> But they still need some more thought about permissions, because AFAICS
> mucking with a configuration can invalidate some other user's data.

ouch. could mucking with a configuration create a corrupt index?

This sounds sort of analogous to the issues collation bring up.

> It seems to me that the single easiest and most useful part of a
> configuration to change is the stop word list; but this setup guarantees
> that no one but a DBA can do that, and what's more that pg_dump won't record
> your changes.

I would second that, in the past I was expected to provide an administrative
web interface to adjust the list of stop words.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-13 Thread Bruce Momjian


You bring up a very good point.  There are fifteen new commands being
added for full text indexing:

alter-fulltext-config.sgml  alter-fulltext-owner.sgml
create-fulltext-dict.sgml   drop-fulltext-dict.sgml
alter-fulltext-dict.sgmlalter-fulltext-parser.sgml
create-fulltext-map.sgmldrop-fulltext-map.sgml
alter-fulltext-dictset.sgml comment-fulltext.sgml
create-fulltext-parser.sgml drop-fulltext-parser.sgml
alter-fulltext-map.sgml create-fulltext-config.sgml
drop-fulltext-config.sgml

I think encoding is a good example to follow.  We allow users to create
new conversions (CREATE CONVERSION), but we don't allow them to create
new encodings --- those are hard-coded in the backend.  Which of the
following full text objects:

config
dict
map
dictset
parser

can we hard-code into the backend, and just update for every major
release like we do for encodings?

---

Tom Lane wrote:
> I've been looking at the tsearch patch a bit, and I think there needs to
> be more thought given to the permissions required to mess around with
> tsearch configuration objects.
> 
> The TSParser objects reference functions declared to take and return
> INTERNAL arguments.  This means that the underlying functions must be
> coded in C and can only be installed by a superuser, which in turn means
> that there is no scenario where it is really useful for a non-superuser
> to execute CREATE PARSER.  What's more, allowing a non-superuser to do
> it creates security holes: if you can find an unrelated function taking
> the right number of INTERNAL arguments, you can install it as a TSParser
> support function.  That trivially allows crashing the backend, and it
> could allow worse security holes than that.
> 
> TSDictionary objects have exactly the same issues since they also depend
> on functions with INTERNAL arguments.
> 
> At minimum this means that we should restrict CREATE/DROP/ALTER commands
> for these objects to superusers.  (Which in turn means there's no point
> in tracking an ownership column for them; every superuser is the same as
> every other one, permissions-wise.)  I'm wondering though whether this
> doesn't mean that we don't need manipulation commands for them at all.
> Is it likely that people will be adding parser or dictionary support to
> an installation on the fly?  Maybe we can just create 'em all at initdb
> time and be done, similar to the way index access methods are treated.
> This doesn't say that it's not possible to add more; you can add an
> index access method on the fly too, if you want, by inserting stuff into
> pg_am by hand.  I'm just wondering whether all that SQL-statement
> support and pg_dump support for custom parsers and dictionaries is
> really worth the code space and future maintenance effort it'll eat up.
> 
> You could remove the immediate source of this objection if you could
> redesign the APIs for the underlying support functions to be more
> type-safe.  I'm not sure how feasible or useful that would be though.
> The bottom-line question here is whether developing a new parser or
> dictionary implementation is really something that ordinary users might
> do.  If not, then having all this SQL-level support for setting up
> catalog entries seems like wasted effort.
> 
> TSConfiguration objects are a different story, since they have only
> type-safe dependencies on parsers, locales, and dictionaries.  But they
> still need some more thought about permissions, because AFAICS mucking
> with a configuration can invalidate some other user's data.  Do we want
> to allow runtime changes in a configuration that existing tsvector
> columns already depend on?  How can we even recognize whether there is
> stored data that will be affected by a configuration change?  (AFAICS
> the patch doesn't put anything into the pg_depend machinery that could
> deal with this.)  And who gets to decide which configuration is default,
> anyway?
> 
> I'm also a bit disturbed that you've made searches for TSConfiguration
> objects be search-path-sensitive.  That is likely to create problems
> similar to those we've recently recognized for function lookup, eg,
> an insertion into a full-text-indexed column gets treated differently
> depending on the caller's search path.  It's particularly bad to have
> the default object be search-path-dependent.  We learned the hard way
> not to do that for default index operator classes; let's not make the
> same mistake again for tsearch configurations.
> 
> Next, it took me a while to understand how Mapping objects fit into
> the scheme at all, and now that (I think) I understand, I'm wondering
> why treat them as an independent concept.  Seems like the mapping from
> token types to dictionaries is really a property of a configuration,
> and we ought to be handling it t

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

Re: [HACKERS] tsearch_core patch: permissions and security issues

32 matches

Site Navigation

Mail list logo

Footer information