Re: [HACKERS] text search vs schemas
On 8/18/07, Tom Lane <[EMAIL PROTECTED]> wrote: > As my copy of the patch currently stands, there are two built-in trigger > functions, tsvector_update_trigger and tsvector_update_trigger_column. > The first expects trigger arguments > name of tsvector col, name of tsconfig to use, name(s) of text col(s) > and the second > name of tsvector col, name of tsconfig col, name(s) of text col(s) > that is, the tsconfig name is stored in a text column. We could fix > the second form by changing it to expect the tsconfig column to be of > type regconfig. The first form is a bit more problematic. I can see > two approaches: either specify both the schema and the tsconfig name, > as two separate arguments, or keep it one argument but insist that > the content of the argument be an explicitly-qualified name. The > second way seems a bit klugier when considered in isolation, but I think > I like it better, because there would be a natural migration path to > treating the argument as being of type regconfig when and if we get > around to having real types for trigger arguments. (Which I think is > a good idea, btw, just not for 8.3.) I like the second approach too. It may be slightly awkward for now, but IMHO it does the right thing. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] text search vs schemas
"Trevor Talbot" <[EMAIL PROTECTED]> writes: > Currently you can schema-qualify objects where you need to, to avoid > issues with search_path subversion. If it's impossible to > schema-qualify tsearch configs now, when schema support is later added > it suddenly exposes everyone to risks that didn't exist before, and > requires manual changes to fix. True. I thought of another counter-argument as well: we use schemas not only so that pg_dump can tell user from system objects, but for permissions purposes. If TS configs don't live in schemas then there is no structure for controlling who may create one. A fairly standard requirement is to be able to prevent someone from creating any non-temp objects (or not even those), and right now you do it by revoking create rights in the public schema, and/or permissions to create new schemas. We'd need some other kluge for non-schema-ified TS objects. So let's go back to the "regconfig" idea. If we invent such a type, and make the 2-parameter forms of to_tsvector et al take that instead of just text for the config name, then I think we have fixed things for the expression index case. (There are also other benefits, eg a command applying one of these functions on many rows wouldn't have to do a name-based config lookup each time.) The problem is with the trigger approach. As my copy of the patch currently stands, there are two built-in trigger functions, tsvector_update_trigger and tsvector_update_trigger_column. The first expects trigger arguments name of tsvector col, name of tsconfig to use, name(s) of text col(s) and the second name of tsvector col, name of tsconfig col, name(s) of text col(s) that is, the tsconfig name is stored in a text column. We could fix the second form by changing it to expect the tsconfig column to be of type regconfig. The first form is a bit more problematic. I can see two approaches: either specify both the schema and the tsconfig name, as two separate arguments, or keep it one argument but insist that the content of the argument be an explicitly-qualified name. The second way seems a bit klugier when considered in isolation, but I think I like it better, because there would be a natural migration path to treating the argument as being of type regconfig when and if we get around to having real types for trigger arguments. (Which I think is a good idea, btw, just not for 8.3.) This looks fairly do-able --- a regconfig data type is no problem, should be able to whip it up with an hour or two of cutting and pasting code from one of the existing OID alias types. And the other changes seem minor. (Note: offhand I don't see a need for "reg" types for parsers, dictionaries, or templates, since none of those are referenced directly in queries.) Thoughts? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] text search vs schemas
On 8/17/07, Tom Lane <[EMAIL PROTECTED]> wrote: > At the moment I feel our thoughts have to revolve not around adding > complexity to tsearch, but taking stuff out. If we ship it with no > schema support for TS objects in 8.3, we can always add that later, > if there proves to be real demand for that (and I note that the contrib > version has gotten along fine without it). But we cannot go in the > other direction. Currently you can schema-qualify objects where you need to, to avoid issues with search_path subversion. If it's impossible to schema-qualify tsearch configs now, when schema support is later added it suddenly exposes everyone to risks that didn't exist before, and requires manual changes to fix. I'm for removing complexity, but per-schema support seems like a design decision that needs to be made up front, whichever way it goes. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] text search vs schemas
Tom Lane <[EMAIL PROTECTED]> wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Tom Lane" <[EMAIL PROTECTED]> writes: > >> Uh, no. Function names for example are subject to search-path > >> confusion. > > > Wait, are they? They are in PL languages but only because most > > languages store their source code as text just as is happening here. > > Hmmm ... if you look at the current solution for default expressions > for serial columns, ie nextval() on a regclass constant, it's pretty > schema-safe. So we could imagine inventing a regconfig datatype that > is the same sort of wrapper-over-OID. Then make the 2-parameter form > of to_tsvector take that type instead of text. Right, that's what I was getting at. I was confused about the trigger issues, sorry about that. > That seems like it'd fix the problem for expression indexes on > to_tsvector calls, but I don't see how it fixes the problem for > triggers. We don't have any clear path for making trigger arguments > be anything but a list of strings. Okay, trying to catch up here... For the simple case of handling a single column, we've got expression indexes as above. For handling two or more columns, expression indexes don't work that well, so that leaves triggers. There happens to be one utility function provided for that purpose, tsvector_update_trigger(). This trigger function needs its configuration as a (string) argument, and is also the only one with this problem. Is that correct? If so, then it seems the question should really be: is this situation of wanting to index multiple columns together, without even using different ranks for them, so common that this trigger function belongs in core? Maybe it shouldn't be there at all; instead have the docs walk through creating a specialized trigger function. It doesn't get rid of the schema-qualified names issue, but when you're writing PL functions you need to deal with that anyway, tsearch or not. And there's still contrib of course. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] text search vs schemas
Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > On Fri, Aug 17, 2007 at 10:42:09AM -0400, Tom Lane wrote: >> If you insist on a solution that involves attaching type information >> to trigger arguments, then we can forget about getting tsearch into 8.3. > Hmm, maybe I didn't explain clearly enough. I meant that if the > argument is a regclass for example, to pass it in the TG_ARGV list as > the OID in *string form*. Are you expecting the *user* to deal with that? If not, how is the system supposed to know which trigger arguments to do it to? What about dump and reload? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] text search vs schemas
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Am Freitag, 17. August 2007 05:15 schrieb Tom Lane: >> Actually ... I'm suddenly not happy about the choice to put text search >> configurations etc. into schemas at all. > But that isn't different from any other part of the system. A proper fix > would be a mechanism to alleviate the confusion in all places, not simply to > remove features that cause such confusion in some places (but not all, > thereby causing inconsistencies). Well, we are already inconsistent about this. PL languages and index access methods, for example, don't have schema-ified names. >> It does not seem likely that a typical installation will have so many >> text search configs that subdividing them into schemas will really be >> useful. > But schemas are not only used to organize objects because there are so many. > Altering the search path to get at a different implementation without having > to alter the names in every single place is also a useful application. This is isomorphic to the argument about whether default_text_search_config is a good idea; indeed, I think that default_text_search_config pretty much solves this problem already, for the places where it's sane to have the configuration-in-use depend upon context. The problem with using schemas for TS configs is that we can't prevent the search result from changing in contexts where it mustn't change. At least, not short of requiring fully-qualified config names in those places, which doesn't sound like an advance in usability. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] text search vs schemas
On Fri, Aug 17, 2007 at 10:42:09AM -0400, Tom Lane wrote: > Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > > I'm unsure how it works now, but it seems reasonable that when a > > regclass/regtype/regetc is passed to a trigger, pass it in OID form. > > If you insist on a solution that involves attaching type information > to trigger arguments, then we can forget about getting tsearch into 8.3. > That's a nontrivial amount of new design and code that hasn't even been > on the radar screen before. Hmm, maybe I didn't explain clearly enough. I meant that if the argument is a regclass for example, to pass it in the TG_ARGV list as the OID in *string form*. That way trigger arguments stay a list of strings, yet the whole thing is schema safe because when trigger body casts the string back to a regclass, it gets exactly what was passed. Hope this makes more sense, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] text search vs schemas
Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > On Fri, Aug 17, 2007 at 01:16:22AM -0400, Tom Lane wrote: >> That seems like it'd fix the problem for expression indexes on >> to_tsvector calls, but I don't see how it fixes the problem for >> triggers. We don't have any clear path for making trigger arguments >> be anything but a list of strings. > I'm unsure how it works now, but it seems reasonable that when a > regclass/regtype/regetc is passed to a trigger, pass it in OID form. If you insist on a solution that involves attaching type information to trigger arguments, then we can forget about getting tsearch into 8.3. That's a nontrivial amount of new design and code that hasn't even been on the radar screen before. At the moment I feel our thoughts have to revolve not around adding complexity to tsearch, but taking stuff out. If we ship it with no schema support for TS objects in 8.3, we can always add that later, if there proves to be real demand for that (and I note that the contrib version has gotten along fine without it). But we cannot go in the other direction. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] text search vs schemas
Am Freitag, 17. August 2007 05:15 schrieb Tom Lane: > Actually ... I'm suddenly not happy about the choice to put text search > configurations etc. into schemas at all. We've been sitting here and > assuming that to_tsvector('english', my_text_col) has a well defined > meaning --- but as the patch stands, *it does not*. The interpretation > of the config name could easily change depending on search_path. But that isn't different from any other part of the system. A proper fix would be a mechanism to alleviate the confusion in all places, not simply to remove features that cause such confusion in some places (but not all, thereby causing inconsistencies). > It does not seem likely that a typical installation will have so many > text search configs that subdividing them into schemas will really be > useful. But schemas are not only used to organize objects because there are so many. Altering the search path to get at a different implementation without having to alter the names in every single place is also a useful application. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] text search vs schemas
On Fri, Aug 17, 2007 at 01:16:22AM -0400, Tom Lane wrote: > That seems like it'd fix the problem for expression indexes on > to_tsvector calls, but I don't see how it fixes the problem for > triggers. We don't have any clear path for making trigger arguments > be anything but a list of strings. I'm unsure how it works now, but it seems reasonable that when a regclass/regtype/regetc is passed to a trigger, pass it in OID form. This can be cast back safely inside the trigger itself. Seems a little hacky though... Having it as a type would also help with tracking dependancies. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] text search vs schemas
On Thu, 16 Aug 2007, Tom Lane wrote: I wrote: We can't put tsvector_update_trigger() into core in anything like its current form. As is, it will take an unqualified function name, look it up, and call it. The prospects for subversion by search path manipulation are obvious, and even if you aren't concerned about malicious attacks, the effects of the trigger are context-dependent Actually ... I'm suddenly not happy about the choice to put text search configurations etc. into schemas at all. We've been sitting here and assuming that to_tsvector('english', my_text_col) has a well defined meaning --- but as the patch stands, *it does not*. The interpretation of the config name could easily change depending on search_path. what's wrong with schema-qualified name ? It does not seem likely that a typical installation will have so many text search configs that subdividing them into schemas will really be useful. If I recall correctly, Teodor did that on my recommendation it's useful. that it'd be the cleanest way to distinguish built-in from non-built-in objects for dump purposes. That is, pg_dump would ignore TS objects I think you're wrong here. Schema often used to save connections and it's natural to have different searches in different schemes. that are in pg_catalog and dump everything else. But I'm having severe second thoughts about that. What seems the most attractive alternative at the moment is to have a flat namespace for TS objects (no schemas) and introduce something like a "bool is_built_in" column for pg_dump to consult in deciding whether to dump 'em. Comments? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] text search vs schemas
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> Uh, no. Function names for example are subject to search-path >> confusion. > Wait, are they? They are in PL languages but only because most > languages store their source code as text just as is happening here. Hmmm ... if you look at the current solution for default expressions for serial columns, ie nextval() on a regclass constant, it's pretty schema-safe. So we could imagine inventing a regconfig datatype that is the same sort of wrapper-over-OID. Then make the 2-parameter form of to_tsvector take that type instead of text. That seems like it'd fix the problem for expression indexes on to_tsvector calls, but I don't see how it fixes the problem for triggers. We don't have any clear path for making trigger arguments be anything but a list of strings. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] text search vs schemas
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Trevor Talbot" <[EMAIL PROTECTED]> writes: >> Wouldn't treating them as actual objects remove this whole issue? > > Uh, no. Function names for example are subject to search-path > confusion. Wait, are they? They are in PL languages but only because most languages store their source code as text just as is happening here. But they're not in views or other native SQL uses of functions because they store the reference to the specific function's OID. In dumps they output the schema along with the name. As in: postgres=# \d foo.testv View "foo.testv" Column | Type | Modifiers +-+--- i | integer | a | integer | View definition: SELECT test.i, foo.a(test.i) AS a FROM foo.test; -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] text search vs schemas
"Trevor Talbot" <[EMAIL PROTECTED]> writes: > Wouldn't treating them as actual objects remove this whole issue? Uh, no. Function names for example are subject to search-path confusion. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] text search vs schemas
On 8/16/07, Tom Lane <[EMAIL PROTECTED]> wrote: > Actually ... I'm suddenly not happy about the choice to put text search > configurations etc. into schemas at all. We've been sitting here and > assuming that to_tsvector('english', my_text_col) has a well defined > meaning --- but as the patch stands, *it does not*. The interpretation > of the config name could easily change depending on search_path. > > It does not seem likely that a typical installation will have so many > text search configs that subdividing them into schemas will really be > useful. If I recall correctly, Teodor did that on my recommendation > that it'd be the cleanest way to distinguish built-in from non-built-in > objects for dump purposes. That is, pg_dump would ignore TS objects > that are in pg_catalog and dump everything else. But I'm having severe > second thoughts about that. > > What seems the most attractive alternative at the moment is to have a > flat namespace for TS objects (no schemas) and introduce something like > a "bool is_built_in" column for pg_dump to consult in deciding whether > to dump 'em. That assumes a database-oriented search config, instead of a case of multiple users confined to invidual schemas doing their own thing. Is the latter possible now, and do you want to remove that ability? Something else that occurs to me though: the problem seems to be that parts of tsearch take object names as strings. I thought one advantage of having it in core is that they are now real database objects, with owners etc. How many other database objects are passed around as string labels? Wouldn't treating them as actual objects remove this whole issue? What happens now if you try to drop a configuration that's still used in a trigger somewhere? (I'm new to both tsearch2 and this list, so please excuse any mistakes. Mostly keeping an eye on this for future use in my own projects.) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] text search vs schemas
I wrote: > We can't put tsvector_update_trigger() into core in anything like its > current form. As is, it will take an unqualified function name, look > it up, and call it. The prospects for subversion by search path > manipulation are obvious, and even if you aren't concerned about > malicious attacks, the effects of the trigger are context-dependent Actually ... I'm suddenly not happy about the choice to put text search configurations etc. into schemas at all. We've been sitting here and assuming that to_tsvector('english', my_text_col) has a well defined meaning --- but as the patch stands, *it does not*. The interpretation of the config name could easily change depending on search_path. It does not seem likely that a typical installation will have so many text search configs that subdividing them into schemas will really be useful. If I recall correctly, Teodor did that on my recommendation that it'd be the cleanest way to distinguish built-in from non-built-in objects for dump purposes. That is, pg_dump would ignore TS objects that are in pg_catalog and dump everything else. But I'm having severe second thoughts about that. What seems the most attractive alternative at the moment is to have a flat namespace for TS objects (no schemas) and introduce something like a "bool is_built_in" column for pg_dump to consult in deciding whether to dump 'em. Comments? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly