subject:"\[HACKERS\] tsearch2 in PostgreSQL 8.3\?"

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Trevor Talbot

Digging through the simple vs advanced user discussion, I don't think
expression indexes are really the right idea.  It seems a bit fragile,
you need a certain amount of knowledge about the optimizer to figure
out if your queries can even use the index, and it's just plain ugly.
It also seems like the choice is between either simple one-column
stuff, or triggers.

There are already several CREATE FULLTEXT items, so what if you take
it a bit farther:

CREATE TABLE posts (title text, body text);
CREATE FULLTEXT INDEX posts_fti ON posts (title WEIGHT A, body) CONFIG
english USING GIN;

..with searches looking something like..

... WHERE plainto_tsquery('...') @@ posts_fti ...

Okay, maybe that's not quite the right search abstraction (is it an
index or a column?), but you get the idea.

The point is that it would be fairly straightforward to do the common
things, and it works for people whose needs can be met with a full
text index rather than a multidimensional search for lexemes (or
whatever tsvector + index really is).  The configuration is clearly
defined and stable, but queries can still use a GUC default.
Meanwhile all the current functions, types and operators are there for
use with triggers etc for advanced setups.

There's obviously a lot of detail missing, but if something like this
is the goal, then there doesn't need to be as much concern about
simple interfaces for 8.3, as long as the framework is ok.  In
particular, expression indexes don't necessarily need special work
now.

It's a thought.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?f

2007-08-18 Thread Bruce Momjian

Josh Berkus wrote:
 Bruce,
 
  Oh, so you want the config inside each tsvector value. ?Interesting
  idea.
 
 Yeah, hasn't anyone suggested this before?  It seems like the obvious 
 solution.  A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable.  
 Sticking the config name at the beginning of the field would allow for the 
 use of single-parameter functions, and default_config would only be used 
 for SELECT queries.  Backup/restore issues should go away completely ...
 
 EXCEPT this would introduce issues if the config is changed or deleted 
 after being used.  However, I'd imagine that we have those anyway -- 
 certainly we would at restore time.

The other problem with encoding the configuration name in the tsvector
value is that pg_dump would somehow have to embed the configuration name
in the tsvector output, and I can't see that working for an expression
index because index contents aren't dumped.  Having a separate
configuration column is clearly simpler.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Bruce Momjian

Trevor Talbot wrote:
 Digging through the simple vs advanced user discussion, I don't think
 expression indexes are really the right idea.  It seems a bit fragile,
 you need a certain amount of knowledge about the optimizer to figure
 out if your queries can even use the index, and it's just plain ugly.
 It also seems like the choice is between either simple one-column
 stuff, or triggers.
 
 There are already several CREATE FULLTEXT items, so what if you take
 it a bit farther:
 
 CREATE TABLE posts (title text, body text);
 CREATE FULLTEXT INDEX posts_fti ON posts (title WEIGHT A, body) CONFIG
 english USING GIN;
 
 ..with searches looking something like..
 
 ... WHERE plainto_tsquery('...') @@ posts_fti ...
 
 Okay, maybe that's not quite the right search abstraction (is it an
 index or a column?), but you get the idea.
 
 The point is that it would be fairly straightforward to do the common
 things, and it works for people whose needs can be met with a full
 text index rather than a multidimensional search for lexemes (or
 whatever tsvector + index really is).  The configuration is clearly
 defined and stable, but queries can still use a GUC default.
 Meanwhile all the current functions, types and operators are there for
 use with triggers etc for advanced setups.
 
 There's obviously a lot of detail missing, but if something like this
 is the goal, then there doesn't need to be as much concern about
 simple interfaces for 8.3, as long as the framework is ok.  In
 particular, expression indexes don't necessarily need special work
 now.

Remember an expression index can be a user-created function so you can
embed whatever you want in your function and just index it's output,
just like you would with a trigger creating a separate column.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Trevor Talbot

On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:

 Remember an expression index can be a user-created function so you can
 embed whatever you want in your function and just index it's output,
 just like you would with a trigger creating a separate column.

Well, you could create a function that returns a tsvector, but how do
you get that to work with queries?  I've been under the impression the
expressions need to match (in the normal case, be the same function
with the same arguments) in order to use the index.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Tom Lane

Oleg Bartunov [EMAIL PROTECTED] writes:
 Tom and Bruce, what version of patch you're using ?
 Bruce complained about using OID in arguments of functions, but 
 AFAIR, it was removed in 0.58 version of patch.

I'm working from the 0.58 version --- that was the latest last I looked.

There are still OID-based versions of some of the functions, but if
we go with the regconfig idea as I'm now thinking we should, those
could be taken out --- they'd have no performance advantage over
regconfig.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Bruce Momjian

Trevor Talbot wrote:
 On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
  Remember an expression index can be a user-created function so you can
  embed whatever you want in your function and just index it's output,
  just like you would with a trigger creating a separate column.
 
 Well, you could create a function that returns a tsvector, but how do
 you get that to work with queries?  I've been under the impression the
 expressions need to match (in the normal case, be the same function
 with the same arguments) in order to use the index.

Yes, so you create a function called complex_ts and create the index:

CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))

and in your WHERE clause you do:

WHERE 'a  b' @@ complex_ts(col1, col2)

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Joshua D. Drake

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Bruce Momjian wrote:
 Trevor Talbot wrote:
 On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:

 Remember an expression index can be a user-created function so you can
 embed whatever you want in your function and just index it's output,
 just like you would with a trigger creating a separate column.
 Well, you could create a function that returns a tsvector, but how do
 you get that to work with queries?  I've been under the impression the
 expressions need to match (in the normal case, be the same function
 with the same arguments) in order to use the index.
 
 Yes, so you create a function called complex_ts and create the index:
 
   CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))

GIN?

 
 and in your WHERE clause you do:
 
   WHERE 'a  b' @@ complex_ts(col1, col2)
 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxzHtATb/zqfZUUQRAnpFAJ0SXj1hvjxM03l35R3oWpm6TVwKEACeOwup
cfyDu6zdxf1fKcf9AE+5VSs=
=d81X
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Trevor Talbot

On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 Trevor Talbot wrote:

  Well, you could create a function that returns a tsvector, but how do
  you get that to work with queries?  I've been under the impression the
  expressions need to match (in the normal case, be the same function
  with the same arguments) in order to use the index.

 Yes, so you create a function called complex_ts and create the index:

 CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))

 and in your WHERE clause you do:

 WHERE 'a  b' @@ complex_ts(col1, col2)

Oh, duh, of course.  I kept thinking of the index as something
abstract instead of reusing the expression, even when the examples
were right in front of me...


On 8/18/07, Joshua D. Drake [EMAIL PROTECTED] wrote:
 Bruce Momjian wrote:

CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))

 GIN?

Freudian slip, that's what he thinks of me :D

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Bruce Momjian

Joshua D. Drake wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Bruce Momjian wrote:
  Trevor Talbot wrote:
  On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
  Remember an expression index can be a user-created function so you can
  embed whatever you want in your function and just index it's output,
  just like you would with a trigger creating a separate column.
  Well, you could create a function that returns a tsvector, but how do
  you get that to work with queries?  I've been under the impression the
  expressions need to match (in the normal case, be the same function
  with the same arguments) in order to use the index.
  
  Yes, so you create a function called complex_ts and create the index:
  
  CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))
 
 GIN?

Yes, sorry, GIN.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-18 Thread Bruce Momjian

Trevor Talbot wrote:
 On 8/18/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  Trevor Talbot wrote:
 
   Well, you could create a function that returns a tsvector, but how do
   you get that to work with queries?  I've been under the impression the
   expressions need to match (in the normal case, be the same function
   with the same arguments) in order to use the index.
 
  Yes, so you create a function called complex_ts and create the index:
 
  CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))
 
  and in your WHERE clause you do:
 
  WHERE 'a  b' @@ complex_ts(col1, col2)
 
 Oh, duh, of course.  I kept thinking of the index as something
 abstract instead of reusing the expression, even when the examples
 were right in front of me...
 
 
 On 8/18/07, Joshua D. Drake [EMAIL PROTECTED] wrote:
  Bruce Momjian wrote:
 
 CREATE INDEX ii on x USING GIT(complex_ts(col1, col2))
 
  GIN?
 
 Freudian slip, that's what he thinks of me :D

Group-Index-Tuples, the patch that was held for 8.4.  :-(

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Robert Treat

On Thursday 16 August 2007 15:58, Bruce Momjian wrote:
 Josh Berkus wrote:
  All,
 
  First off, I'll assert that backup/restore is a serious issue and while
  the folks who want Tsearch in core now are dismissing it, we'll be
  fielding the complaints later.  Any solution which involves setting a GUC
  at restore time *which could vary per table or even column* isn't
  acceptable.  We used to do the \SET thing for table ownership with
  backup/restore, and you *know* how many restore failures that caused.

 Agreed.  Let me summarize where we are now.  I talked to Tom on the
 phone yesterday so we have come up with the following plan:

   o  default_text_search_config stays, not super-user-only, not set
  in pg_dump output
   o  tsearch functions that don't have a configuration name will be
  marked so they can't be specified in expression indexes
   o  auto-casts  and :: to tsearch data types will also not work in
  expression indexes (we already do this for timestamp without
  timezone)
   o  GIN on an text column will not promote to tsvector
   o  No rewrite magic for function calls without configuration names in
  WHERE clauses to use indexes that do specify configurations (risky)

 The current documentation explains all this:

   http://momjian.us/expire/textsearch/HTML/textsearch-tables.html

 So, we have disabled the ability to create expression indexes that are
 affected by default_text_search_config, and we have documented other
 possible problems.   tsvector_update_trigger() has to be modified to
 take a configuration name (and frankly I am not excited about the
 filter_name capability either, but that is a separate issue).

 The only remaining problem I see is that the rest of the documentation
 relies heavily on default_text_search_config when in fact the most
 common usage with tables and indexes can't use it.  tsquery can use the
 default easily, but I am betting that tsvector usually cannot.

What exactly does default_text_search_config buy us?  I think it is supposed 
to simplify things, but it sounds like it adds a bunch of corner cases, 
special siutations, if's and but's (and candies and nuts), that I fear will 
lead to more confusion for end users, and make development more difficult in 
the future as we forced to try and live with backwards compatability.  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian

Robert Treat wrote:
  The only remaining problem I see is that the rest of the documentation
  relies heavily on default_text_search_config when in fact the most
  common usage with tables and indexes can't use it.  tsquery can use the
  default easily, but I am betting that tsvector usually cannot.
 
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 special siutations, if's and but's (and candies and nuts), that I fear will 
 lead to more confusion for end users, and make development more difficult in 
 the future as we forced to try and live with backwards compatability.  

Agreed.  That was my conclusion long ago but few agreed so I gave up.

In fairness the goal was for default_text_search_config to make text
search easier for clusters that use a single configuration.  If you are
using triggers on a separate tsvector column, only the trigger author
needs to deal with the configuration name (not queries), but expression
indexes require the configuration name to always be used for the
tsvector queries, while the tsquery can use the
default_text_search_config value.  Anyway, again, it is all
special-casing this and that, as you said.  And, if you are specifying
the configuration name for the tsvector but not the tsquery you are more
likely to have a configuration mismatch.  (Of course you might want
different configurations for tsvector and tsquery, but that is for
experts.)

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane

Oleg Bartunov [EMAIL PROTECTED] writes:
 On Thu, 16 Aug 2007, Josh Berkus wrote:
 First off, I'll assert that backup/restore is a serious issue and while the
 folks who want Tsearch in core now are dismissing it, we'll be fielding the
 complaints later.  Any solution which involves setting a GUC at restore time
 *which could vary per table or even column* isn't acceptable.

 Josh, all my respects to you, but text searching is not about index at all.
 Text searching is about tsvector and tsquery data type

What's your point?  The problem is just as bad for an auto-update
trigger that computes a stored tsvector column.  If the trigger's
behavior depends on the GUC settings of the person doing an insert,
things will soon be a mess --- do you really want the tsvector contents
to change after an update of an unrelated field?  After awhile you
won't have any idea what's really in the column, because you won't
have any good way to know which rows' tsvectors were generated with
which configurations.

Even if that state of affairs is really what you want, reproducing
it after a dump/reload will be tricky.  I think that a regular
schema-and-data dump would work, because pg_dump doesn't install
triggers until after it's loaded the data ... but a data-only dump
would *not* work, because the trigger would fire while loading rows.

Basically I see no use for a setup in which the configuration used
for a particular tsvector value is not fully determined by the table
definition.  Whether the value is in an index or in the table proper
is irrelevant to this argument.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Luke Lonergan

All - we have customers who very much want tsearch2 and will benefit from its 
inclusion in core.

We are also struggling with the update trigger approach for various reasons.

Is there a good alternative?  Can we embed tsvector updates into the core code 
efficiently?

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Tom Lane [mailto:[EMAIL PROTECTED]
Sent:   Friday, August 17, 2007 11:28 AM Eastern Standard Time
To: Oleg Bartunov
Cc: Josh Berkus; pgsql-hackers@postgresql.org
Subject:Re: [HACKERS] tsearch2 in PostgreSQL 8.3? 

Oleg Bartunov [EMAIL PROTECTED] writes:
 On Thu, 16 Aug 2007, Josh Berkus wrote:
 First off, I'll assert that backup/restore is a serious issue and while the
 folks who want Tsearch in core now are dismissing it, we'll be fielding the
 complaints later.  Any solution which involves setting a GUC at restore time
 *which could vary per table or even column* isn't acceptable.

 Josh, all my respects to you, but text searching is not about index at all.
 Text searching is about tsvector and tsquery data type

What's your point?  The problem is just as bad for an auto-update
trigger that computes a stored tsvector column.  If the trigger's
behavior depends on the GUC settings of the person doing an insert,
things will soon be a mess --- do you really want the tsvector contents
to change after an update of an unrelated field?  After awhile you
won't have any idea what's really in the column, because you won't
have any good way to know which rows' tsvectors were generated with
which configurations.

Even if that state of affairs is really what you want, reproducing
it after a dump/reload will be tricky.  I think that a regular
schema-and-data dump would work, because pg_dump doesn't install
triggers until after it's loaded the data ... but a data-only dump
would *not* work, because the trigger would fire while loading rows.

Basically I see no use for a setup in which the configuration used
for a particular tsvector value is not fully determined by the table
definition.  Whether the value is in an index or in the table proper
is irrelevant to this argument.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane

Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 

Well, the main thing we'd lose if we remove it is all trace of upward
compatibility from the contrib version of tsearch.  People are
accustomed to using query functions that rely on a default configuration
setting.  Even though I want to prohibit use of a default in the
definition of an index or auto-update trigger, I don't see a good reason
to forbid it in queries.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian

Luke Lonergan wrote:
 All - we have customers who very much want tsearch2 and will benefit from its 
 inclusion in core.
 
 We are also struggling with the update trigger approach for various reasons.
 
 Is there a good alternative?  Can we embed tsvector updates into the core 
 code efficiently?

No, doing it automatically adds too much complexity for little benefit. 
If you want more concrete suggestions, you will have to provide more
information about the problems you are having.

---


 
 - Luke
 
 Msg is shrt cuz m on ma treo
 
  -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Friday, August 17, 2007 11:28 AM Eastern Standard Time
 To:   Oleg Bartunov
 Cc:   Josh Berkus; pgsql-hackers@postgresql.org
 Subject:  Re: [HACKERS] tsearch2 in PostgreSQL 8.3? 
 
 Oleg Bartunov [EMAIL PROTECTED] writes:
  On Thu, 16 Aug 2007, Josh Berkus wrote:
  First off, I'll assert that backup/restore is a serious issue and while the
  folks who want Tsearch in core now are dismissing it, we'll be fielding the
  complaints later.  Any solution which involves setting a GUC at restore 
  time
  *which could vary per table or even column* isn't acceptable.
 
  Josh, all my respects to you, but text searching is not about index at all.
  Text searching is about tsvector and tsquery data type
 
 What's your point?  The problem is just as bad for an auto-update
 trigger that computes a stored tsvector column.  If the trigger's
 behavior depends on the GUC settings of the person doing an insert,
 things will soon be a mess --- do you really want the tsvector contents
 to change after an update of an unrelated field?  After awhile you
 won't have any idea what's really in the column, because you won't
 have any good way to know which rows' tsvectors were generated with
 which configurations.
 
 Even if that state of affairs is really what you want, reproducing
 it after a dump/reload will be tricky.  I think that a regular
 schema-and-data dump would work, because pg_dump doesn't install
 triggers until after it's loaded the data ... but a data-only dump
 would *not* work, because the trigger would fire while loading rows.
 
 Basically I see no use for a setup in which the configuration used
 for a particular tsvector value is not fully determined by the table
 definition.  Whether the value is in an index or in the table proper
 is irrelevant to this argument.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane

Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.

 I don't think this is all that big of a deal. In fact I would expect it
 going from contrib to core and never had any illusion to the effect that
 I would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

I would hope that what we do with contrib/tsearch2 is rewrite it as a
compatibility wrapper.  This at least will provide an answer to anyone
who complains that we renamed the functions.  But if there are
fundamental things missing in the core implementation, and we try to
make the wrapper supply them, then we haven't really eliminated the
problem ... just moved it over a little.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian

Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
  Here's something not to forget in this whole business: the present TSearch2
  implementation permits you to have a different tsvector configuration for 
  each *row*, not just each column.  That is, applications can be built with
  per-cell configs.
 
 Certainly.  That's actually the easiest case to deal with, because you're
 going to put the tsvector config identity into another column of the
 table, and the trigger or index just references it there.  It hasn't
 been part of the discussion because it's not a problem.

I added an example of that in the documentation (second query):


http://momjian.us/expire/textsearch/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus

Folks,

Here's something not to forget in this whole business: the present TSearch2 
implementation permits you to have a different tsvector configuration for 
each *row*, not just each column.  That is, applications can be built with 
per-cell configs.

I know of at least one out there: Ubuntu's Rosetta.  I'm sure there are 
others.

Therefore there are two cases we're trying to solve:

(1) The simple case: someone wants to build a database with text search 
entirely in one UTF8 language.  All vectors are in that language, and so are 
all queries.  The user wants the simplest syntax possible.

(2) The Rosetta case: different configs are used for each cell and all 
searches have to be language-qualified.

In both cases, the databases need to backup and restore cleanly.

From this, I'd first of all say that I don't see the point of a Superuser 
default_tsvector_search_config.  There are too many failure conditions with 
the default once you get away from the simplest case, so I don't see how 
setting it to Superuser-only protects anything.  Might as well make it a 
userset and then it will be more useful.

Unfortunately, the way I see it the only permanent solution for this is to 
alter the TSvector structure to include a config OID at the beginning of it.  
That doesn't sound like it's doable in time for 8.3, though; is there a way 
we could work around that until 8.4?

And why does this sound exactly like the issues we've had with per-column 
encodings and the currency type?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Joshua D. Drake

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.

I don't think this is all that big of a deal. In fact I would expect it
going from contrib to core and never had any illusion to the effect that
I would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

Sincerely,

Joshua D. Drake



- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxdYTATb/zqfZUUQRAo6gAJ9JDNGdTvYopOdw0Dp7rknffEZqewCaAkR9
d4EmQLv6iMpZ/iWR8Ksy1Ek=
=aEft
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 Here's something not to forget in this whole business: the present TSearch2
 implementation permits you to have a different tsvector configuration for 
 each *row*, not just each column.  That is, applications can be built with
 per-cell configs.

Certainly.  That's actually the easiest case to deal with, because you're
going to put the tsvector config identity into another column of the
table, and the trigger or index just references it there.  It hasn't
been part of the discussion because it's not a problem.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian

Josh Berkus wrote:
 Folks,
 
 Here's something not to forget in this whole business: the present TSearch2 
 implementation permits you to have a different tsvector configuration for 
 each *row*, not just each column.  That is, applications can be built with 
 per-cell configs.
 
 I know of at least one out there: Ubuntu's Rosetta.  I'm sure there are 
 others.
 
 Therefore there are two cases we're trying to solve:
 
 (1) The simple case: someone wants to build a database with text search 
 entirely in one UTF8 language.  All vectors are in that language, and so are 
 all queries.  The user wants the simplest syntax possible.
 
 (2) The Rosetta case: different configs are used for each cell and all 
 searches have to be language-qualified.
 
 In both cases, the databases need to backup and restore cleanly.
 
 From this, I'd first of all say that I don't see the point of a Superuser 
 default_tsvector_search_config.  There are too many failure conditions with 
 the default once you get away from the simplest case, so I don't see how 
 setting it to Superuser-only protects anything.  Might as well make it a 
 userset and then it will be more useful.

Per my email yesterday, default_tsvector_search_config is _not_
super-user-only:

  o  default_text_search_config stays, not super-user-only, not set
 in pg_dump output

 Unfortunately, the way I see it the only permanent solution for this is to 
 alter the TSvector structure to include a config OID at the beginning of it.  
 That doesn't sound like it's doable in time for 8.3, though; is there a way 
 we could work around that until 8.4?

Oh, so you want the config inside each tsvector value.  Interesting
idea.

 And why does this sound exactly like the issues we've had with per-column 
 encodings and the currency type?

Yes, this is a very similar issue except we are trying to allow multiple
encodings.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Ron Mayer

Joshua D. Drake wrote:
 Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is 
 supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.
 
 I don't think this is all that big of a deal. In fact I would expect it
 going from contrib to core and never had any illusion to the effect that
 I would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

FWIW, I also would _not_ have expected compatibility between contrib
and core.   In fact, I would have expected contrib tsearch to be a
place where experimental APIs existed and that the single
biggest difference between contrib vs core was that the
core APIs removed any cruft that might have been in contrib.

If default_text_search_config makes things more confusing or more
fragile, I'd rather see it gone than kept around for
backward-compatibility-to-pre-core reasons.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus

Bruce,

 Oh, so you want the config inside each tsvector value.  Interesting
 idea.

Yeah, hasn't anyone suggested this before?  It seems like the obvious 
solution.  A TSvector constructed with en_US is NOT the same as a vector 
constructed with fr_FR and it's silly to pretend that they are comparable.  
Sticking the config name at the beginning of the field would allow for the 
use of single-parameter functions, and default_config would only be used 
for SELECT queries.  Backup/restore issues should go away completely ...

EXCEPT this would introduce issues if the config is changed or deleted 
after being used.  However, I'd imagine that we have those anyway -- 
certainly we would at restore time.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus

Tom,

 It might be an obvious solution, but to some other problem than the one
 we have.  The problem we are trying to address is how to know which
 config to use to construct a *new* tsvector.

Oh, right.  Back to the circular arguments then ...

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 Oh, so you want the config inside each tsvector value. Interesting
 idea.

 Yeah, hasn't anyone suggested this before?  It seems like the obvious
 solution.

It might be an obvious solution, but to some other problem than the one
we have.  The problem we are trying to address is how to know which
config to use to construct a *new* tsvector.

 A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable.

Um, actually I think Oleg and Teodor believe that they *are* comparable.
If we try to force them not to be then we'll break multi-language
situations.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread tomas

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
 Bruce,
 
  Oh, so you want the config inside each tsvector value.  Interesting
  idea.
 
 Yeah, hasn't anyone suggested this before?  It seems like the obvious 
 solution.  A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable.  

Except that (as I understand Oleg) it even seems to make sense sometimes
to compare a tsvectors constructed with different configs -- so it might
be important not to prevent this use case eihter. Oleg?

Otherwise your proposal makes the most sense...

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGxm+DBcgs9XrR2kYRAn7RAJ4u508XQB/W6fMTmTchizlsvKEkEwCfTtTK
R0DMLqNil2VQolFBWE69ZU0=
=Tvh/
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Mike Rylander

On 8/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
  Bruce,
 
   Oh, so you want the config inside each tsvector value. Interesting
   idea.
 
  Yeah, hasn't anyone suggested this before?  It seems like the obvious
  solution.  A TSvector constructed with en_US is NOT the same as a vector
  constructed with fr_FR and it's silly to pretend that they are comparable.

 Except that (as I understand Oleg) it even seems to make sense sometimes
 to compare a tsvectors constructed with different configs -- so it might
 be important not to prevent this use case eihter. Oleg?

Configs are not simply about languages, they are also about stopword
lists and stemmers and parsers, and there's no reason to think that
one would be using only one configuration to create a single tsvector.

Different fields from within one document may require different
treatment.  Take for instance title, with stopwords included, and
body, with them removed.  Those two initial tsvectors can then be
concatenated together with different weights to provide a very rich,
and simple (relatively speaking) search infrastructure.

--miker

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov


On Sat, 18 Aug 2007, [EMAIL PROTECTED] wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
 Bruce,
 
  Oh, so you want the config inside each tsvector value. б═Interesting

  idea.
 
 Yeah, hasn't anyone suggested this before?  It seems like the obvious 
 solution.  A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable. 


Except that (as I understand Oleg) it even seems to make sense sometimes
to compare a tsvectors constructed with different configs -- so it might
be important not to prevent this use case eihter. Oleg?


yes, for example, you have tsvectors obtained from different sources, which
require different processing.



Otherwise your proposal makes the most sense...

Regards
- -- tomц║s
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGxm+DBcgs9XrR2kYRAn7RAJ4u508XQB/W6fMTmTchizlsvKEkEwCfTtTK
R0DMLqNil2VQolFBWE69ZU0=
=Tvh/
-END PGP SIGNATURE-



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov


Tom and Bruce, what version of patch you're using ?
Bruce complained about using OID in arguments of functions, but 
AFAIR, it was removed in 0.58 version of patch.


I and Teodor are very busy and just can't follow all discussions, so
we have to rely on people's wisdom. If we have so many problem with 
integration, that probably we could just integrate support of data types

(tsquery, tsvector), index support for them and set of support functions
like to_tsquery, to_tsvector and leave everything remaining in 
contrib/tsearch2 as an example of text search engine design. 
Then, after fixing design problem as well as some backend's issues we could

come with much better conclusions.

Oleg
On Fri, 17 Aug 2007, Tom Lane wrote:


Josh Berkus [EMAIL PROTECTED] writes:

Oh, so you want the config inside each tsvector value. Interesting
idea.



Yeah, hasn't anyone suggested this before?  It seems like the obvious
solution.


It might be an obvious solution, but to some other problem than the one
we have.  The problem we are trying to address is how to know which
config to use to construct a *new* tsvector.


A TSvector constructed with en_US is NOT the same as a vector
constructed with fr_FR and it's silly to pretend that they are comparable.


Um, actually I think Oleg and Teodor believe that they *are* comparable.
If we try to force them not to be then we'll break multi-language
situations.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov


On Sat, 18 Aug 2007, Mike Rylander wrote:


On 8/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:

Bruce,


Oh, so you want the config inside each tsvector value. Interesting
idea.


Yeah, hasn't anyone suggested this before?  It seems like the obvious
solution.  A TSvector constructed with en_US is NOT the same as a vector
constructed with fr_FR and it's silly to pretend that they are comparable.


Except that (as I understand Oleg) it even seems to make sense sometimes
to compare a tsvectors constructed with different configs -- so it might
be important not to prevent this use case eihter. Oleg?


Configs are not simply about languages, they are also about stopword
lists and stemmers and parsers, and there's no reason to think that
one would be using only one configuration to create a single tsvector.

Different fields from within one document may require different
treatment.  Take for instance title, with stopwords included, and
body, with them removed.  Those two initial tsvectors can then be
concatenated together with different weights to provide a very rich,
and simple (relatively speaking) search infrastructure.


I can't say better, Mike !

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Michael Paesold


Bruce Momjian wrote:

Uh, how are we going to prevent the auto-casting to tsvector from using
the default GUC config, e.g. CREATE INDEX i ON x USING GIN(col)?

This is where I started to see the need for education and error-prone
nature of the default GUC just wasn't worth having it, though I know
others disagree.


It can be removed quite easily. AFAIR, this feature was added on 
suggestion of Tom Lane. It was certainly only added in this 
tsearch-to-core release cycle, see here:


http://archives.postgresql.org/pgsql-hackers/2007-03/msg01384.php

Teodor Sigaev wrote:
 2) added operator class for text and varchar
CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

So just remove the operator class or don't specify it as default 
operator class for GIN, and the thing is gone. Perhaps there is a better 
way to do this, though.


[...digging...] The idea was born in the thread starting here (involving 
Tom Lane, Joshua Drake, and Teodor Sigaev):

http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
with the conclusion here:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Best Regards
Michael Paesold


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Tom Lane

Michael Paesold [EMAIL PROTECTED] writes:
 Teodor Sigaev wrote:
 2) added operator class for text and varchar
 CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

 So just remove the operator class or don't specify it as default 
 operator class for GIN, and the thing is gone. Perhaps there is a better 
 way to do this, though.

 [...digging...] The idea was born in the thread starting here (involving 
 Tom Lane, Joshua Drake, and Teodor Sigaev):
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
 with the conclusion here:
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Yeah, unfortunately we overlooked the implications of the conversion to
tsvector being environment-dependent.  Those opclasses will have to go
away again.  AFAICS the only safe way to build an index directly on a
text column is

CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

ie, you hardwire the configuration name directly into the index
definition.  Similarly, if you're using a trigger to build a
materialized tsvector column, you need to hardwire the config
name into the trigger definition.

An alternative in both cases is to take the config name from
another field of the table row.  This is what you'd need to do
for the advanced cases where you use different configs for
different entries in the same table.

We can still have a GUC default_text_search_config, but we have
to design the system around the assumption that that should only
be referenced during *queries*, not during updates.  That's the
only safe way to let it be changeable.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Gregory Stark


Tom Lane [EMAIL PROTECTED] writes:

 Yeah, unfortunately we overlooked the implications of the conversion to
 tsvector being environment-dependent.  Those opclasses will have to go
 away again.  AFAICS the only safe way to build an index directly on a
 text column is

 CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

Is there a null configuration which could be the default for the casts? So the
syntax would still work and would generate an index which worked well but has
no stop words, no stemming, etc?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Tom Lane

Gregory Stark [EMAIL PROTECTED] writes:
 Is there a null configuration which could be the default for the casts?

If we did that then there would be no point in having a GUC variable,
because it wouldn't control anything.

While that is certainly an alternative solution, I think it's riskier
than having the GUC variable and just preventing (or at least
discouraging) people from relying on it in their index definitions.
The problem is that people will resort to ALTER CONFIGURATION default
to tune their setups, and anytime you make a nontrivial change that way,
you run the risk of breaking your existing indexes.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Josh Berkus

All,

First off, I'll assert that backup/restore is a serious issue and while the 
folks who want Tsearch in core now are dismissing it, we'll be fielding the 
complaints later.  Any solution which involves setting a GUC at restore time 
*which could vary per table or even column* isn't acceptable.  We used to do 
the \SET thing for table ownership with backup/restore, and you *know* how 
many restore failures that caused.

Basically, restore happens at two times: (1) when the server fails and you 
need to recover, and (2) when you're upgrading, already a painful process.  
Glitches which occur at these times cause panic, angry user e-mails and 
people switching away from PostgreSQL.  It's just not acceptable for us to 
put new potential booby-traps in the way of restore.

Second, as attractive as the idea is, I can't see how a typemod would work.  
It's not like we have a fixed list of dictionaries; people can create their 
own.  If we wanted to clean up the syntax I suppose we could have a form of 
to_tsvector which took a two-column composite value as if it were a 
multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )

 hmmm, that wouldn't work as syntax, would it?  We can't accept a constant 
as a column in a multi-column index, can we?  Another reason why we can't do 
mods for 8.3.

This means, from my perspective, that the only reasonable course for 8.3 is to 
require the 2-parameter form of to_tsvector for indexes.  I'll say that in 
the applications I've developed with TSearch2 I use the 2-parameter form of 
to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch 
to be highly unreliable if I don't specify.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Oleg Bartunov


On Thu, 16 Aug 2007, Josh Berkus wrote:


All,

First off, I'll assert that backup/restore is a serious issue and while the
folks who want Tsearch in core now are dismissing it, we'll be fielding the
complaints later.  Any solution which involves setting a GUC at restore time
*which could vary per table or even column* isn't acceptable.  We used to do
the \SET thing for table ownership with backup/restore, and you *know* how
many restore failures that caused.

Basically, restore happens at two times: (1) when the server fails and you
need to recover, and (2) when you're upgrading, already a painful process.
Glitches which occur at these times cause panic, angry user e-mails and
people switching away from PostgreSQL.  It's just not acceptable for us to
put new potential booby-traps in the way of restore.

Second, as attractive as the idea is, I can't see how a typemod would work.
It's not like we have a fixed list of dictionaries; people can create their
own.  If we wanted to clean up the syntax I suppose we could have a form of
to_tsvector which took a two-column composite value as if it were a
multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )


Josh, all my respects to you, but text searching is not about index at all.
Text searching is about tsvector and tsquery data type



 hmmm, that wouldn't work as syntax, would it?  We can't accept a constant
as a column in a multi-column index, can we?  Another reason why we can't do
mods for 8.3.

This means, from my perspective, that the only reasonable course for 8.3 is to
require the 2-parameter form of to_tsvector for indexes.  I'll say that in
the applications I've developed with TSearch2 I use the 2-parameter form of
to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch
to be highly unreliable if I don't specify.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Bruce Momjian

Josh Berkus wrote:
 All,
 
 First off, I'll assert that backup/restore is a serious issue and while the 
 folks who want Tsearch in core now are dismissing it, we'll be fielding the 
 complaints later.  Any solution which involves setting a GUC at restore time 
 *which could vary per table or even column* isn't acceptable.  We used to do 
 the \SET thing for table ownership with backup/restore, and you *know* how 
 many restore failures that caused.

Agreed.  Let me summarize where we are now.  I talked to Tom on the
phone yesterday so we have come up with the following plan:

  o  default_text_search_config stays, not super-user-only, not set
 in pg_dump output
  o  tsearch functions that don't have a configuration name will be
 marked so they can't be specified in expression indexes
  o  auto-casts  and :: to tsearch data types will also not work in 
 expression indexes (we already do this for timestamp without 
 timezone)
  o  GIN on an text column will not promote to tsvector
  o  No rewrite magic for function calls without configuration names in 
 WHERE clauses to use indexes that do specify configurations (risky)

The current documentation explains all this:

http://momjian.us/expire/textsearch/HTML/textsearch-tables.html

So, we have disabled the ability to create expression indexes that are
affected by default_text_search_config, and we have documented other
possible problems.   tsvector_update_trigger() has to be modified to
take a configuration name (and frankly I am not excited about the
filter_name capability either, but that is a separate issue).

The only remaining problem I see is that the rest of the documentation
relies heavily on default_text_search_config when in fact the most
common usage with tables and indexes can't use it.  tsquery can use the
default easily, but I am betting that tsvector usually cannot.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Magnus Hagander

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Since I don't think that a datatype solution is the way to go,
 I don't feel that we are as far away from an agreement as Bruce
 is worried about.
 
 Well, from where I sit, there is one person saying give me the foot gun,
 and Heikki saying he wants a bullet-proof type system, and you and I are
 in the middle, so the big problem is I don't see a concensus forming,
 and we have been discussing this for a while.
 
 The people who actually use tsearch2 seem to all have the same opinion ...
 so I think we can't go too far in the bullet-proofing direction.
 
 But I would like a design that is bulletproof in dump/reload scenarios,
 and I think it's fair to question that aspect of the tsearch2 design
 because we've seen many reports of people having trouble updating
 databases that use tsearch2.

dump/reload is *the* biggest problem I've had with tsearch2 so far. But
it hasn't been with the actual data - it's been the functions, and only
when migrating between versions. But solving dump/reload reliably is one
of the main things I'm hoping for in 8.3 ;-)

As for a nother use-pointer, I use different configurations in the same
database - but only one per table. I explicitly use the to_tsvector that
specifies a configuration always - to avoid surprising myself.

I don't use the functional index part, but for new users I can see how
that's certainly a *lot* easier. Requiring the specification of the
configuration explicitly when creating this index I don't see as a big
problem at all - compared to the work needed to set up triggers. But
it's nice not to have to do it when querying. But wouldn't that be
solved by having to_tsvector() require the configuration, but
to_tsquery() and plainto_tsquery() not require it?

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Oleg Bartunov


On Wed, 15 Aug 2007, Magnus Hagander wrote:


Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

Tom Lane wrote:

Since I don't think that a datatype solution is the way to go,
I don't feel that we are as far away from an agreement as Bruce
is worried about.



Well, from where I sit, there is one person saying give me the foot gun,
and Heikki saying he wants a bullet-proof type system, and you and I are
in the middle, so the big problem is I don't see a concensus forming,
and we have been discussing this for a while.


The people who actually use tsearch2 seem to all have the same opinion ...
so I think we can't go too far in the bullet-proofing direction.

But I would like a design that is bulletproof in dump/reload scenarios,
and I think it's fair to question that aspect of the tsearch2 design
because we've seen many reports of people having trouble updating
databases that use tsearch2.


dump/reload is *the* biggest problem I've had with tsearch2 so far. But
it hasn't been with the actual data - it's been the functions, and only
when migrating between versions. But solving dump/reload reliably is one
of the main things I'm hoping for in 8.3 ;-)


The dump/reload problem should be gone once tsearch2 became a part of core.
the problem is an  inability to say what is a correct configuration in case 
of expressional index when restoring. In any other case there are many

use cases when tsvector could be intentionally obtained using different
configurations.



As for a nother use-pointer, I use different configurations in the same
database - but only one per table. I explicitly use the to_tsvector that
specifies a configuration always - to avoid surprising myself.

I don't use the functional index part, but for new users I can see how
that's certainly a *lot* easier. Requiring the specification of the
configuration explicitly when creating this index I don't see as a big
problem at all - compared to the work needed to set up triggers. But
it's nice not to have to do it when querying. But wouldn't that be
solved by having to_tsvector() require the configuration, but
to_tsquery() and plainto_tsquery() not require it?


or better to introduce novice-level interface with configuration name
required and insist on using it with expressional index (don't know
if there is a machinery to do so).


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Simon Riggs

On Tue, 2007-08-14 at 17:41 -0400, Tom Lane wrote:

 I've just finished re-reading the prior thread, and here are what seem
 to me to be the salient points:
 
 * Oleg, Teodor, and all of the old-line users of tsearch2 are
 comfortable with setting up a trigger to maintain a materialized
 tsvector column for a table.  They argue that recomputing the tsvector
 (possibly more than once) during a query is more expensive than fetching
 it from disk.  My suspicion is that the latter argument gets weaker
 every year --- CPUs are getting faster lots faster than disks are.
 
 * Bruce (and I ... not sure about anyone else) want to support usage of
 text search via a functional index.  This is argued to be easier to set
 up (no fooling with triggers) and possibly faster depending on CPU vs
 I/O speeds.  I don't think there is any desire here to eliminate the
 trigger approach, just to provide an alternative.

ISTM that the functional index would be considerably smaller than the
additional column approach, since tsvectors can be quite long. That
seems like a very desirable thing with larger textbases. However,
without an additional column certain queries would not be possible, such
as IndexScans on a non-text search index with an additional filter on
text search. So each way would be desirable in different situations.

Would it be wrong to allow both approaches? If there is strong
disagreement then it usually means both people are right.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Simon Riggs

On Wed, 2007-08-15 at 08:10 +0200, Magnus Hagander wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Since I don't think that a datatype solution is the way to go,
  I don't feel that we are as far away from an agreement as Bruce
  is worried about.
  
  Well, from where I sit, there is one person saying give me the foot gun,
  and Heikki saying he wants a bullet-proof type system, and you and I are
  in the middle, so the big problem is I don't see a concensus forming,
  and we have been discussing this for a while.
  
  The people who actually use tsearch2 seem to all have the same opinion ...
  so I think we can't go too far in the bullet-proofing direction.
  
  But I would like a design that is bulletproof in dump/reload scenarios,
  and I think it's fair to question that aspect of the tsearch2 design
  because we've seen many reports of people having trouble updating
  databases that use tsearch2.
 
 dump/reload is *the* biggest problem I've had with tsearch2 so far. But
 it hasn't been with the actual data - it's been the functions, and only
 when migrating between versions. But solving dump/reload reliably is one
 of the main things I'm hoping for in 8.3 ;-)

I can see the problem, but I'm sure there are more solutions than have
been listed so far.

If dump/restore is a problem we can:

1. force pg_dump to output a SET command for the GUC, so it is correctly
set at restore time. That seems like a straightforward addition to
pg_dump. Maybe this can be done in a generalised manner to support other
dump/restore configuration difficulties that might occur in the future.

2. put the desired value in a table and make sure the text_search_config
table is dumped ahead of other objects. When we restore we build the
index based on the config option set in the table, so it all just
works. 

3... probably other options too. 

Maybe we should consider that the user may be dumping and reloading
*because* they want the configuration to change. Just a thought.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Bruce Momjian

Magnus Hagander wrote:
  But I would like a design that is bulletproof in dump/reload scenarios,
  and I think it's fair to question that aspect of the tsearch2 design
  because we've seen many reports of people having trouble updating
  databases that use tsearch2.
 
 dump/reload is *the* biggest problem I've had with tsearch2 so far. But
 it hasn't been with the actual data - it's been the functions, and only
 when migrating between versions. But solving dump/reload reliably is one
 of the main things I'm hoping for in 8.3 ;-)
 
 As for a nother use-pointer, I use different configurations in the same
 database - but only one per table. I explicitly use the to_tsvector that
 specifies a configuration always - to avoid surprising myself.
 
 I don't use the functional index part, but for new users I can see how
 that's certainly a *lot* easier. Requiring the specification of the
 configuration explicitly when creating this index I don't see as a big
 problem at all - compared to the work needed to set up triggers. But
 it's nice not to have to do it when querying. But wouldn't that be
 solved by having to_tsvector() require the configuration, but
 to_tsquery() and plainto_tsquery() not require it?

Yea, I have thought about splitting up the behavior so tsvector always
needs the configuration but tsquery does not.  However, for a query, you
are probably still creating a tsvector so it didn't see to help much in
clarity.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Magnus Hagander

On Wed, Aug 15, 2007 at 10:23:00AM -0400, Bruce Momjian wrote:
 Magnus Hagander wrote:
   But I would like a design that is bulletproof in dump/reload scenarios,
   and I think it's fair to question that aspect of the tsearch2 design
   because we've seen many reports of people having trouble updating
   databases that use tsearch2.
  
  dump/reload is *the* biggest problem I've had with tsearch2 so far. But
  it hasn't been with the actual data - it's been the functions, and only
  when migrating between versions. But solving dump/reload reliably is one
  of the main things I'm hoping for in 8.3 ;-)
  
  As for a nother use-pointer, I use different configurations in the same
  database - but only one per table. I explicitly use the to_tsvector that
  specifies a configuration always - to avoid surprising myself.
  
  I don't use the functional index part, but for new users I can see how
  that's certainly a *lot* easier. Requiring the specification of the
  configuration explicitly when creating this index I don't see as a big
  problem at all - compared to the work needed to set up triggers. But
  it's nice not to have to do it when querying. But wouldn't that be
  solved by having to_tsvector() require the configuration, but
  to_tsquery() and plainto_tsquery() not require it?
 
 Yea, I have thought about splitting up the behavior so tsvector always
 needs the configuration but tsquery does not.  However, for a query, you
 are probably still creating a tsvector so it didn't see to help much in
 clarity.

Agh, I got stuck thinking the trigger case aagin - when you don't need to
create a vector at all.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 ISTM that the functional index would be considerably smaller than the
 additional column approach, since tsvectors can be quite long. That
 seems like a very desirable thing with larger textbases. However,
 without an additional column certain queries would not be possible, such
 as IndexScans on a non-text search index with an additional filter on
 text search. So each way would be desirable in different situations.

Huh?  Of course you can do the searching without an additional column;
you just have to compute the tsvector on-the-fly at each row.  This is
a straight trade of more CPU cycles for less I/O.

 Would it be wrong to allow both approaches?

Nobody has suggested disallowing the trigger approach (indeed it's hard
to see how we could).  The argument is mostly about how to make a
functional index approach work conveniently; and secondarily about
what's needed to make dump/restore reliably reproduce the current
database state, whichever approach you choose.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-15 Thread Ron Mayer

Magnus Hagander wrote:
 I don't use the functional index part, but for new users I can see how
 that's certainly a *lot* easier. 

Can someone with modern hardware test to see if it's
still quite a bit slower than the extra column.  I had
tried it too years ago; and found the functional index
to be quite a bit slower:
http://archives.postgresql.org/pgsql-general/2005-10/msg00475.php
but it'd be interesting to see if faster CPUs changed this.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

A lot of work has been done to try to get /contrib/tsearch2 into the
core backend for 8.3, but we have hit a roadblock in how to handle
multiple text search configurations.  (FYI, the documentation is at
http://momjian.us/expire/textsearch/HTML/textsearch.html.)

There are three options for controlling text search configurations:

1) have a GUC variable which specifies the default configuration
2) require the configuration to be always specified
3) use the type system to automatically use the right configuration

The problem with #1 is that is it error-prone (easy to mismatch
configurations).  One idea was to have the GUC be super-user-only but
then restoring a dump as non-super-user is a problem.

The problem with #2 is that it makes implicit and explicit casting
impossible (there is no place to specify the configuration).

#3 requires more code and is probably not something we want to do at
this stage in 8.3 development.  It requires passing typmod values
between functions and operators (not something we have done easily in
the past).

Given this, should we decide to not include full text search in 8.3?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:
 A lot of work has been done to try to get /contrib/tsearch2 into the
 core backend for 8.3, but we have hit a roadblock in how to handle
 multiple text search configurations.  (FYI, the documentation is at
 http://momjian.us/expire/textsearch/HTML/textsearch.html.)
 
 There are three options for controlling text search configurations:
 
   1) have a GUC variable which specifies the default configuration
   2) require the configuration to be always specified
   3) use the type system to automatically use the right configuration
 
 The problem with #1 is that is it error-prone (easy to mismatch
 configurations).  One idea was to have the GUC be super-user-only but
 then restoring a dump as non-super-user is a problem.

What is the worst consequence of mismatching configuration?  Does it
cause a system crash?  A backend hang?  A corrupted index?  Lost data?
Or does it, as I assume, just fail to return the exact result set that
would be returned if the correct configuration was supplied?

If the answer is the latter, I think this is not so huge a problem that
FULL TEXT should be rejected for 8.3 on these grounds.  We just tell
people to use the correct query and be done with it.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
La vida es para el que se aventura

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Heikki Linnakangas

Bruce Momjian wrote:
 #3 requires more code and is probably not something we want to do at
 this stage in 8.3 development.  It requires passing typmod values
 between functions and operators (not something we have done easily in
 the past).

It does? I was thinking of implicitly creating a new type, with no
typmod, when you create a new configuration. Similar to enums, I think.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Heikki Linnakangas

Alvaro Herrera wrote:
 What is the worst consequence of mismatching configuration?  Does it
 cause a system crash?  A backend hang?  A corrupted index?  Lost data?
 Or does it, as I assume, just fail to return the exact result set that
 would be returned if the correct configuration was supplied?

Your assumption is correct.

You can mismatch configurations not just by querying in a wrong
configuration, but also by accidentally storing tsvectors generated with
different configurations in the same column (with no additional column
like Mike Rylander had to tell them apart), but it's still going to look
 OK from PostgreSQL's point of view.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Kenneth Marshall

On Tue, Aug 14, 2007 at 03:15:44PM -0400, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  A lot of work has been done to try to get /contrib/tsearch2 into the
  core backend for 8.3, but we have hit a roadblock in how to handle
  multiple text search configurations.  (FYI, the documentation is at
  http://momjian.us/expire/textsearch/HTML/textsearch.html.)
  
  There are three options for controlling text search configurations:
  
  1) have a GUC variable which specifies the default configuration
  2) require the configuration to be always specified
  3) use the type system to automatically use the right configuration
  
  The problem with #1 is that is it error-prone (easy to mismatch
  configurations).  One idea was to have the GUC be super-user-only but
  then restoring a dump as non-super-user is a problem.
 
 What is the worst consequence of mismatching configuration?  Does it
 cause a system crash?  A backend hang?  A corrupted index?  Lost data?
 Or does it, as I assume, just fail to return the exact result set that
 would be returned if the correct configuration was supplied?
 
 If the answer is the latter, I think this is not so huge a problem that
 FULL TEXT should be rejected for 8.3 on these grounds.  We just tell
 people to use the correct query and be done with it.
 
I think that users of the full-text option would need to read the
documentation and we could include any needed caveats. This would
certainly do for the 8.3 release and would give us time to simplify
the management and use in the 8.4 release. I, for one, have been
waiting a long time for it to be integrated into the database. 

Ken

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  #3 requires more code and is probably not something we want to do at
  this stage in 8.3 development.  It requires passing typmod values
  between functions and operators (not something we have done easily in
  the past).
 
 It does? I was thinking of implicitly creating a new type, with no
 typmod, when you create a new configuration. Similar to enums, I think.

So each new configuration is a new data type?  How do the tsearch
functions handle these new data types?  I also question if this can be
completed soon.  I have seen no specification yet, let alone someone
coding it.

Basically, the default GUC doesn't work because of:

error prone
if super-user only, non-super-user doesn't work on restore
if non-super-user, can cause mismatch (perhaps this is the best
  option), and restore still a problem (no storage of config in
  indexes or tables)

No one seems to like the always-specify the configuration (loses cast
ability).  And I don't see the code for new type appearing anytime soon.

Bottom line --- we better figure out something quick or it isn't going
to be in 8.3, and at this point, I am starting to doubt a solution will
magically appear.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:

 Basically, the default GUC doesn't work because of:
 
   error prone
   if super-user only, non-super-user doesn't work on restore
   if non-super-user, can cause mismatch (perhaps this is the best
 option), and restore still a problem (no storage of config in
 indexes or tables)

I haven't really seen anyone else arguing about this.  I wonder whether
you are being overly zealous about it.

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem.  (Tom Lane)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  Basically, the default GUC doesn't work because of:
  
  error prone
  if super-user only, non-super-user doesn't work on restore
  if non-super-user, can cause mismatch (perhaps this is the best
option), and restore still a problem (no storage of config in
indexes or tables)
 
 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

Uh, OK, but no one has told me how a database restore without a
configuration name would work, so I am all ears.

CREATE INDEX ii on x(to_tsvector(col))

There is nothing that says what configuration that index should use
except the default setting, and if that is different in the restore
database, you have problems.  Same for a trigger that calls to_tsvector
and is restored into a different database.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Mike Rylander

On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Bruce Momjian wrote:

  Basically, the default GUC doesn't work because of:
 
error prone
if super-user only, non-super-user doesn't work on restore
if non-super-user, can cause mismatch (perhaps this is the best
  option), and restore still a problem (no storage of config in
  indexes or tables)

 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

I hate to just pile on Bruce, but as a production user I tend to
agree... let me shoot myself in the foot. :)

--miker

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

 Uh, OK, but no one has told me how a database restore without a
 configuration name would work, so I am all ears.

It's the dump/restore problem that really is the kicker here.  I don't
mind so much a GUC that only controls the interpretation of queries, but
if it determines how an index or a tsvector column gets rebuilt, we've
got problems.

I've just finished re-reading the prior thread, and here are what seem
to me to be the salient points:

* Oleg, Teodor, and all of the old-line users of tsearch2 are
comfortable with setting up a trigger to maintain a materialized
tsvector column for a table.  They argue that recomputing the tsvector
(possibly more than once) during a query is more expensive than fetching
it from disk.  My suspicion is that the latter argument gets weaker
every year --- CPUs are getting faster lots faster than disks are.

* Bruce (and I ... not sure about anyone else) want to support usage of
text search via a functional index.  This is argued to be easier to set
up (no fooling with triggers) and possibly faster depending on CPU vs
I/O speeds.  I don't think there is any desire here to eliminate the
trigger approach, just to provide an alternative.

* For *either* the trigger or functional-index approach, I think it is
unacceptable to rely on a GUC variable to determine how the tsvector is
derived from the raw-document fields for storage or indexing.  It's just
too error-prone, particularly when you consider dump-and-reload cases.
What I think we should say is that the ts parsing configuration name
can be either hardwired or taken from another field of the table.
In the trigger case this would mean providing a couple of standard
triggers, one taking the config name as a trigger parameter, and the
other accepting a trigger parameter that's the name of the config name
column.  In the index case this would mean that the index expression
has to be either to_tsvector('constant', ...) or to_tsvector(field, ...).
Note that all four cases boil down to saying that stored or indexed
tsvectors have to be derived from the two-parameter form of to_tsvector.

* For queries, there is not anything very wrong with having a default
configuration, but the trick is how to get the planner to match that up
with an index that's written with the two-parameter form of to_tsvector.
One hackish possibility is to define the single-parameter form of
to_tsvector like this:

  create function to_tsvector(text) returns tsvector as $$
select to_tsvector(get_default_text_search_config(), $1)
  $$ language sql strict stable;

where get_default_text_search_config() is essentially just
current_setting('default_text_search_config') except it is
misleadingly marked immutable.  Then, a query with

WHERE to_tsvector(document_col) @@ tsquery(...)

will have the SQL function inlined, and the
get_default_text_search_config() call const-folded, and suddenly
it looks like

WHERE to_tsvector('english', document_col) @@ tsquery(...)

and can be matched to a functional index that's declared using the
explicit 'english' configuration name.  This is pretty grotty though
... can anyone think of a better way?  (The main objection I can see
to it is that someone could shoot himself in the foot by using this
function instead of two-parameter to_tsvector in a custom trigger
function.  But hopefully, anyone writing a custom trigger function
will have read the manual's warning not to do that.  Note that I
suggest marking the function stable so that it can't be misused
that way in a functional index.  Another possible objection is that
get_default_text_search_config() is a foot-gun all by itself, since
it could be used in a functional index.  Aside from not documenting
it I'm not sure there's much to be done about that.)

* I'm not enamored of Heikki's idea about a datatype-based solution,
because I don't think that it will allow columns containing tsvectors
derived with different configurations.  It's perfectly clear that
advanced users want to be able to do that, and it's also clear that
as long as the config name is coming from a stored column (or can be
reconstructed somehow from the stored data) that it's perfectly
well-defined.

Since I don't think that a datatype solution is the way to go,
I don't feel that we are as far away from an agreement as Bruce
is worried about.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Mike Rylander wrote:
 On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  Bruce Momjian wrote:
 
   Basically, the default GUC doesn't work because of:
  
 error prone
 if super-user only, non-super-user doesn't work on restore
 if non-super-user, can cause mismatch (perhaps this is the best
   option), and restore still a problem (no storage of config in
   indexes or tables)
 
  I haven't really seen anyone else arguing about this.  I wonder whether
  you are being overly zealous about it.
 
 I hate to just pile on Bruce, but as a production user I tend to
 agree... let me shoot myself in the foot. :)
 
 --miker

Sure, we can document hazards, but the larger problem is related to the
fact that the default controls what gets stored in the database.  This is
a similar problem to when we had an autocommit GUC which caused
problems.

Technically, this is like how the server encoding affects what is stored
in the database.  If we allowed users to change the server encoding in a
database that already had data in it, there would be no way to identify
which data was using the old encoding and which was using the new one. 
Now, the application might be able to identify them just fine, but a
database restore would be unable to recreate the data the same way.

If we want to keep the default GUC we would have to allow non-super-user
changes so we can use it in pg_dump for restore, but even then if the
default is different there is going to be a mix of old/new after the
restore because table changes after the restore is going to use the new
default config.

Perhaps the best we could do is to tell people who change the default GUC
that they are on their own in restoring the database, or they have to be
very carful like with triggers to assign the configuration properly.  I
can imagine how complex that part of the documentation will be, but it
is doable.

The other point is that we should have a good idea of the API because
if it gets into 8.3 it will be harder to change.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 The other point is that we should have a good idea of the API because
 if it gets into 8.3 it will be harder to change.

Yeah, once it's in core we have a pretty strong backwards-compatibility
restriction to deal with.  Someone upthread claimed we can always
simplify it later but that's exactly backward --- we can add features
later, but we can't subtract.

Maybe we should be looking to implement just the minimum set of features
for 8.3 and leave some of the more controversial stuff for 8.4.  I hate
to admit it, but if we take that point of view then triggers are in
and functional-index support is out.  We have to support the trigger
approach because it's what is in tsearch2 now, and the existing users
will expect to continue to have that option.

However, allowing the standard triggers to pay attention to a
configuration GUC variable is simply broken; that bit has to go away.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 * For queries, there is not anything very wrong with having a default
 configuration, but the trick is how to get the planner to match that up
 with an index that's written with the two-parameter form of to_tsvector.
 One hackish possibility is to define the single-parameter form of
 to_tsvector like this:
 
   create function to_tsvector(text) returns tsvector as $$
 select to_tsvector(get_default_text_search_config(), $1)
   $$ language sql strict stable;
 
 where get_default_text_search_config() is essentially just
 current_setting('default_text_search_config') except it is
 misleadingly marked immutable.  Then, a query with
 
   WHERE to_tsvector(document_col) @@ tsquery(...)
 
 will have the SQL function inlined, and the
 get_default_text_search_config() call const-folded, and suddenly
 it looks like
 
   WHERE to_tsvector('english', document_col) @@ tsquery(...)

This is an interesting idea that would allow queries without a
configuration to match an expression index.  The only trick is to train
users not to use such tricks on triggers or expression index, but only
in queries.  The idea of 'stable' helps to fix that for expression
indexes, but not for trigger use, I assume.

 * I'm not enamored of Heikki's idea about a datatype-based solution,
 because I don't think that it will allow columns containing tsvectors
 derived with different configurations.  It's perfectly clear that
 advanced users want to be able to do that, and it's also clear that
 as long as the config name is coming from a stored column (or can be
 reconstructed somehow from the stored data) that it's perfectly
 well-defined.
 
 Since I don't think that a datatype solution is the way to go,
 I don't feel that we are as far away from an agreement as Bruce
 is worried about.

Well, from where I sit, there is one person saying give me the foot gun,
and Heikki saying he wants a bullet-proof type system, and you and I are
in the middle, so the big problem is I don't see a concensus forming,
and we have been discussing this for a while.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Since I don't think that a datatype solution is the way to go,
 I don't feel that we are as far away from an agreement as Bruce
 is worried about.

 Well, from where I sit, there is one person saying give me the foot gun,
 and Heikki saying he wants a bullet-proof type system, and you and I are
 in the middle, so the big problem is I don't see a concensus forming,
 and we have been discussing this for a while.

The people who actually use tsearch2 seem to all have the same opinion ...
so I think we can't go too far in the bullet-proofing direction.

But I would like a design that is bulletproof in dump/reload scenarios,
and I think it's fair to question that aspect of the tsearch2 design
because we've seen many reports of people having trouble updating
databases that use tsearch2.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  The other point is that we should have a good idea of the API because
  if it gets into 8.3 it will be harder to change.
 
 Yeah, once it's in core we have a pretty strong backwards-compatibility
 restriction to deal with.  Someone upthread claimed we can always
 simplify it later but that's exactly backward --- we can add features
 later, but we can't subtract.
 
 Maybe we should be looking to implement just the minimum set of features
 for 8.3 and leave some of the more controversial stuff for 8.4.  I hate
 to admit it, but if we take that point of view then triggers are in
 and functional-index support is out.  We have to support the trigger
 approach because it's what is in tsearch2 now, and the existing users
 will expect to continue to have that option.

Triggers and expression indexes were both in the documentation Oleg
supplied, so I am sure both are being used.  I bet some users don't even
know they are using expression indexes because creating a GIN index on a
column automatically casts to tsvector.  (But GIST does not.)  I had to
ask Oleg to find out this out.

 However, allowing the standard triggers to pay attention to a
 configuration GUC variable is simply broken; that bit has to go away.

The only trigger example supplied by Oleg and Teodor is
tsvector_update_trigger(), and that doesn't take a configuration name,
meaning it uses the default GUC configuration.

Uh, how are we going to prevent the auto-casting to tsvector from using
the default GUC config, e.g. CREATE INDEX i ON x USING GIN(col)?

This is where I started to see the need for education and error-prone
nature of the default GUC just wasn't worth having it, though I know
others disagree.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Since I don't think that a datatype solution is the way to go,
  I don't feel that we are as far away from an agreement as Bruce
  is worried about.
 
  Well, from where I sit, there is one person saying give me the foot gun,
  and Heikki saying he wants a bullet-proof type system, and you and I are
  in the middle, so the big problem is I don't see a concensus forming,
  and we have been discussing this for a while.
 
 The people who actually use tsearch2 seem to all have the same opinion ...
 so I think we can't go too far in the bullet-proofing direction.
 
 But I would like a design that is bulletproof in dump/reload scenarios,
 and I think it's fair to question that aspect of the tsearch2 design
 because we've seen many reports of people having trouble updating
 databases that use tsearch2.

Yea, look at the trouble we are having trying to underestand it all.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Andrew Dunstan




Bruce Momjian wrote:


The people who actually use tsearch2 seem to all have the same opinion ...
so I think we can't go too far in the bullet-proofing direction.



Yeah.

But I would like a design that is bulletproof in dump/reload scenarios,
and I think it's fair to question that aspect of the tsearch2 design
because we've seen many reports of people having trouble updating
databases that use tsearch2.



Yea, look at the trouble we are having trying to underestand it all.


True. But I wasn't too concerned about the forecast difficulties with 
data only dumps. Those fail in plenty of circumstances. It is important 
that there is *some* reliable dump/restore/upgrade path, though.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

63 matches

Mail list logo