Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-30 Thread Josh Berkus

Ishii-san,


Ok, probably we need to copy the English stemming rule to the one for
Japanese.

Pardon my ignorance here, but is the concept of stemming even relevant
to Japanese/Chinese/Korean?  What little I know about ideographic
languages suggests it wouldn't work well.  And surely the specific rules
in the Snowball project's English stemmer wouldn't work.


Your understanding is correct. The English stemmer would not work for
the non-English part of Japanese text.


That reminds me, don't you guys have your own full text search for 
Japanese?  Planning on merging it with the core code anytime soon?


--Josh

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-30 Thread Tatsuo Ishii
 Ishii-san,
 
  Ok, probably we need to copy the English stemming rule to the one for
  Japanese.
  Pardon my ignorance here, but is the concept of stemming even relevant
  to Japanese/Chinese/Korean?  What little I know about ideographic
  languages suggests it wouldn't work well.  And surely the specific rules
  in the Snowball project's English stemmer wouldn't work.
  
  Your understanding is correct. The English stemmer would not work for
  the non-English part of Japanese text.
 
 That reminds me, don't you guys have your own full text search for 
 Japanese?  Planning on merging it with the core code anytime soon?

No. Actually, Japanese (the non-English part) does not need stemming at
all. However, since Japanese is an agglutinative language, we have to
break a continuous Japanese string into space-separated words. For
example, we need to break:

todayisfine

into:

today is fine

(of course the English words are only there so that non-Japanese speakers
can follow; the real text is Japanese).

For this we need a good dictionary and software. Fortunately there are
several open source packages for this purpose. I once wrote a
PostgreSQL C function that invokes one of these packages to do
the work, and it works great with tsearch2.
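
The word-breaking step described above can be sketched as a greedy longest-match lookup. This is only an illustration: the dictionary is a toy stand-in for a real Japanese lexicon, the English placeholder words follow the example above, and `segment` is a hypothetical name, not the C function mentioned in this message.

```python
# Toy dictionary (an assumption for illustration, not a real lexicon).
TOY_DICTIONARY = {"today", "is", "fine", "to", "day"}


def segment(text, dictionary=TOY_DICTIONARY):
    """Split unspaced text into words by greedy longest match.

    Characters that start no dictionary word are emitted one at a time.
    """
    longest = max(map(len, dictionary))
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in dictionary:
                words.append(candidate)
                pos += size
                break
        else:
            # No dictionary word starts here: emit a single character.
            words.append(text[pos])
            pos += 1
    return words


print(" ".join(segment("todayisfine")))
```

Greedy matching is the simplest possible strategy; real morphological analyzers resolve ambiguous splits with much richer dictionaries and scoring.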
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] tsearch in core patch

2007-06-27 Thread Teodor Sigaev

But why do you need them to be different at all?  Just make it
russian Russian_Russia
russian ru_RU

Does that not work for some reason?
I'd like configuration names to be unique. Then, if a user sets the GUC variable
or calls a function with a configuration name, postgres has no choice to make
--- it must use exactly the configuration that was named.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander

On 6/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Well, it's not hard at all to find chunks of English text that have
embedded bits of French, Spanish, or what-have-you, but that's not an
argument for trying to intermix the stemmers.  I doubt that such simple
bits of program could tell the language difference well enough to
determine which stemming rules to apply.



While I imagine that is probably true of many, if not most, my project
in particular would greatly benefit from the ability to mix stemmers.
I work with complex bibliographic data, which has language information
embedded within records.  This is not limited to the record level
either.  Individual fields within each bibliographic record can be in
different languages.

Especially in countries where making software multi-lingual (such as
Canada (en_CA/fr_CA)) is a requirement for use in public institutions,
the ability to choose a stemmer and stop-word list at will for any
particular record will actually provide the exact behavior needed.
The obvious generalization from Canada would be to support any mix of
languages supported by tsearch2.

I can certainly understand the benefit of making the default
configuration a simple locale to language map, but there are
definitely uses for searching using different stemmers/stop-lists even
within the same corpus/index.  So, as a datapoint for the discussion,
I would ask that the option of multiple languages per DB locale not be
removed if it can be at all avoided.
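
As a rough sketch of the per-record (or per-field) choice described here, each field can carry its own language tag that selects a stop-word list. The configuration table, its stop words, and `index_terms` are all hypothetical and deliberately tiny; stemming is left out, and this is not the tsearch2 API.

```python
# Hypothetical per-language configurations (stop words only, no stemmer).
CONFIGS = {
    "english": {"stopwords": {"the", "of", "and"}},
    "french": {"stopwords": {"le", "la", "de", "et"}},
}


def index_terms(text, language):
    """Lower-case, split on whitespace, and drop that language's stop words."""
    stopwords = CONFIGS[language]["stopwords"]
    return [word for word in text.lower().split() if word not in stopwords]


# One bibliographic record whose fields are in different languages.
record = [
    ("title", "english", "The History of Canada"),
    ("alternate_title", "french", "Histoire de la Nouvelle-France"),
]

for field, language, text in record:
    print(field, index_terms(text, language))
```

The point of the sketch is only that the language is an input to indexing, chosen per field, rather than a single property of the database.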

Thanks for listening (and for all the great work on getting tsearch
into core! :) ...

--
Mike Rylander



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Tom Lane
Mike Rylander [EMAIL PROTECTED] writes:
 I can certainly understand the benefit of making the default
 configuration a simple locale to language map, but there are
 definitely uses for searching using different stemmers/stop-lists even
 within the same corpus/index.  So, as a datapoint for the discussion,
 I would ask that the option of multiple languages per DB locale not be
 removed if it can be at all avoided.

Nobody is proposing that --- the issue here is just how we set up the
default configuration.

regards, tom lane



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-25 Thread Mike Rylander

On 6/25/07, Tom Lane [EMAIL PROTECTED] wrote:

Mike Rylander [EMAIL PROTECTED] writes:
 I can certainly understand the benefit of making the default
 configuration a simple locale to language map, but there are
 definitely uses for searching using different stemmers/stop-lists even
 within the same corpus/index.  So, as a datapoint for the discussion,
 I would ask that the option of multiple languages per DB locale not be
 removed if it can be at all avoided.

Nobody is proposing that --- the issue here is just how we set up the
default configuration.



Then I misunderstood.  Sorry for the noise, folks.

--
Mike Rylander



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tatsuo Ishii
 Tatsuo Ishii wrote:
 
  japanese '{ja_JP, C}'
  
  How would we know C -> japanese?
  
 You can't do that. You can't have different languages (not locales)
 mapping to the same 'tsearch language' because the stemmer doesn't know
 that a specific word is in english or japanese. So you have two options:
 (a) disable stemming (b) leave the language set to 'japanese' and see if
 it plays well.

Ok, probably we need to copy the English stemming rule to the one for
Japanese. I think the same thing (commonly used English mixed with the local
language) applies to Chinese and Korean.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] tsearch in core patch

2007-06-24 Thread Tatsuo Ishii
 I would be surprised if C locale defaulted to anything except English.

Don't be surprised. The collation mechanism is too simple for
Japanese Kanji, and locale is not useful for Japanese anyway. That's
why Japanese installations of PostgreSQL tend to use the C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

 I suppose it would be sensible to add a switch to allow people to select
 a different language.  In any case, the only thing initdb would be doing
 would be setting up an initial value of a table entry or GUC variable,
 so you could always change it yourself later; it may not be worth
 sweating too much about this.
 
   regards, tom lane



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Ok, probably we need to copy the English stemming rule to the one for
 Japanese.

Pardon my ignorance here, but is the concept of stemming even relevant
to Japanese/Chinese/Korean?  What little I know about ideographic
languages suggests it wouldn't work well.  And surely the specific rules
in the Snowball project's English stemmer wouldn't work.

 I think same thing (commonly used English with local
 language) can be applied to Chinese and Korean.

Well, it's not hard at all to find chunks of English text that have
embedded bits of French, Spanish, or what-have-you, but that's not an
argument for trying to intermix the stemmers.  I doubt that such simple
bits of program could tell the language difference well enough to
determine which stemming rules to apply.

regards, tom lane



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-24 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Ok, probably we need to copy the English stemming rule to the one for
  Japanese.
 
 Pardon my ignorance here, but is the concept of stemming even relevant
 to Japanese/Chinese/Korean?  What little I know about ideographic
 languages suggests it wouldn't work well.  And surely the specific rules
 in the Snowball project's English stemmer wouldn't work.

Your understanding is correct. The English stemmer would not work for
the non-English part of Japanese text.

What I meant was the chunks of English text in Japanese.

  I think same thing (commonly used English with local
  language) can be applied to Chinese and Korean.
 
 Well, it's not hard at all to find chunks of English text that have
 embedded bits of French, Spanish, or what-have-you, but that's not an
 argument for trying to intermix the stemmers.  I doubt that such simple
 bits of program could tell the language difference well enough to
 determine which stemming rules to apply.

For Japanese, it will be fairly simple: words in the 7-bit ASCII range must be
English. (Note that the most commonly used Japanese encodings, such as EUC,
cannot be mixed with ISO 8859.)
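
A minimal sketch of that rule, assuming the text has already been split into words: a word made only of 7-bit ASCII characters is routed to the English rules, and anything else is treated as Japanese. The function name and the sample words are illustrative only.

```python
def choose_rules(word):
    """Return which rule set to apply to a single word.

    Pure 7-bit ASCII words are treated as English; every other word is
    treated as Japanese, which needs no stemming.
    """
    return "english" if word.isascii() else "japanese"


words = ["PostgreSQL", "検索", "index", "データベース"]
for word in words:
    print(word, choose_rules(word))
```

This works only because of the assumption stated in the message: non-ASCII Latin text (French, Spanish and so on) is not expected in the same document.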
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] tsearch in core patch

2007-06-23 Thread Euler Taveira de Oliveira
Alvaro Herrera wrote:

 What I was really suggesting was having a table mapping locale names
 into tsearch languages.  Then the configuration could be made based on
 the language, not on the locale name.  So the stopword list is for
 russian, regardless of whether the locale is Russian_Russia or ru_RU.
 
Agreed. But I'm afraid we couldn't map all of the locale names in a
right way. Man, it's a large list. ;)

 Is this only for the stopword list, or does it also affect selecting a
 stemmer?
 
Both.

 Note: it's possible that the stopword list is different for brazilian
 portuguese than portuguese portuguese, which is why I was suggesting
 using a language portuguese_brazil and not just portuguese.  Whereas
 you need a single stopword list for all the countries speaking spanish,
 which is why you need only one language called spanish.
 
Indeed it's possible for portuguese, because we have some words that are
written in different ways, e.g.,
pt_BR   pt_PT   english
Mônica  Mónica  Monica
ação    acção   action
Irã     Irão    Iran
.
.
.

Will it be possible to disable stemming or stopwords removal? I'm asking
this 'cause sometimes stemming doesn't lead to good results and/or
stopwords are relevant. Maybe these could be GUC variables
('enable_stemming' and 'enable_stopwords').


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/




Re: [HACKERS] tsearch in core patch

2007-06-23 Thread Oleg Bartunov

On Sat, 23 Jun 2007, Euler Taveira de Oliveira wrote:


Will it be possible to disable stemming or stopwords removal? I'm asking
this 'cause sometimes stemming doesn't lead to good results and/or
stopwords are relevant. Maybe these could be GUC variables
('enable_stemming' and 'enable_stopwords').


Just use another configuration.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-23 Thread Euler Taveira de Oliveira
Tatsuo Ishii wrote:

 japanese '{ja_JP, C}'
 
 How would we know C -> japanese?
 
You can't do that. You can't have different languages (not locales)
mapping to the same 'tsearch language' because the stemmer doesn't know
that a specific word is in english or japanese. So you have two options:
(a) disable stemming (b) leave the language set to 'japanese' and see if
it plays well.


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev

3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
done

Why not rename ALTER FULLTEXT CONFIGURATION -> ALTER TEXT SEARCH
CONFIGURATION here too?


It's renamed too.


most languages can be written using UNICODE charset and UTF-8 encoding,
so neither charset nor encoding can be used to determine language.

yes


 --- how do many languages use ISO8859-1 locale?. 

 ISO8859-1 is encoding, not locale.

I meant that if we use the encoding name (for example PG_LATIN1) we can't
distinguish the languages that use that encoding (for example Italian and Finnish,
among others), but with locale names it is possible: it_IT.ISO8859-1,
fi_FI.ISO8859-1


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev

The recommendation I was making was to use the language name, not the
encoding name, in the user-visible configuration.

How does it determine language of db automatically?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian
Teodor Sigaev wrote:
  The recommendation I was making was to use the language name, not the
  encoding name, in the user-visible configuration.

 How does it determine language of db automatically?

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.


Are you suggesting that we remove a long-lived feature of tsearch? In that case we don't
need the cfglocale (or cfglanguage, as Tom suggested) and cfgdefault columns in
pg_ts_cfg at all. Just set tsearch_conf_name.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I very much doubt that the different spanishes are any different in the
 stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
 but in the case of portuguese I'm not so sure.  Maybe there are other
 examples (like chinese, but I'm not sure how useful is tsearch for
 chinese).

 And the .ISO8859-1 part you don't need at all if you accept that the
 files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names.  AFAIK, just about
everybody agrees on es_ES, ru_RU, etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to.  The trick is to not look at any more of the
locale name than that; and if we standardize on stopword files are
UTF8 then I don't think we need to.

regards, tom lane



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 I don't think we are going to do language selection automatically ---
 the user is going to have to set tsearch_conf_name.

 Are you suggesting that we remove a long-lived feature of tsearch? In that
 case we don't need the cfglocale (or cfglanguage, as Tom suggested) and
 cfgdefault columns in pg_ts_cfg at all. Just set tsearch_conf_name.

Is the point here for initdb to be able to establish a sane default
initially?  Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).
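
A sketch of what such a guess might look like, under the assumption that only the part of the locale name before the first underscore is examined and that any encoding or modifier suffix is ignored. The lookup table and the function name are hypothetical; the thread does not specify what initdb would actually contain.

```python
# Hypothetical table from ISO language code to tsearch language name.
LANGUAGE_BY_CODE = {"ru": "russian", "es": "spanish", "en": "english"}


def guess_language(locale_name, default=None):
    """Guess a language from the first component of a locale name."""
    # Drop the encoding and modifier: "ru_RU.KOI8-R" -> "ru_RU".
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    # Keep only the part before the territory: "ru_RU" -> "ru".
    code = base.split("_", 1)[0].lower()
    return LANGUAGE_BY_CODE.get(code, default)


for name in ["ru_RU", "es_CL.ISO8859-1", "C", "Russian_Russia.1251"]:
    print(name, guess_language(name))
```

With this table the last two names yield no guess at all, which is where the C locale and the Windows-style names raised elsewhere in the thread need separate handling.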

regards, tom lane



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera
Teodor Sigaev wrote:

  --- how do many languages use ISO8859-1 locale?. 
  ISO8859-1 is encoding, not locale.
 
 I meant, if we'll use encoding name (for example PG_LATIN1) we couldn't 
 distinguish languages which use that encoding (for example italian and 
 finnish and some more), but using locale names it's possible: 
 it_IT.ISO8859-1, fi_FI.ISO8859-1

I don't understand.  Why use it_IT.ISO8859-1?  You just need to know
the language, so it is enough.  The _IT part specifies that it's the
italian spoken in Italy.  This may be irrelevant in most cases, but
consider that pt_PT and pt_BR are AFAIK somewhat different languages.

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure.  Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
Nadie esta tan esclavizado como el que se cree libre no siendolo (Goethe)



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I very much doubt that the different spanishes are any different in the
  stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
  but in the case of portuguese I'm not so sure.  Maybe there are other
  examples (like chinese, but I'm not sure how useful is tsearch for
  chinese).
 
  And the .ISO8859-1 part you don't need at all if you accept that the
  files are UTF8 by design, as Tom proposed.
 
 Also, the problem we're dealing with here is mainly lack of
 standardization of the encoding part of locale names.  AFAIK, just about
 everybody agrees on es_ES, ru_RU, etc; it's the part that comes
 after that (if any) that is not too consistent across platforms.
 So I see no problem in distinguishing between pt_PT and pt_BR if it
 turns out we have to.  The trick is to not look at any more of the
 locale name than that; and if we standardize on stopword files are
 UTF8 then I don't think we need to.

OK, and the open question is when do we do this default setting.  If we
do it in initdb then we can isolate all the detection there.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Oleg Bartunov

On Fri, 22 Jun 2007, Bruce Momjian wrote:


Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure.  Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).



And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.


Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names.  AFAIK, just about
everybody agrees on es_ES, ru_RU, etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to.  The trick is to not look at any more of the
locale name than that; and if we standardize on stopword files are
UTF8 then I don't think we need to.


OK, and the open question is when do we do this default setting.  If we
do it in initdb then we can isolate all the detection there.


We can do that at initdb time, but we still have to decide how to map
human-readable language name and lang part of locale name. Are we going
to hardcode it ?

It's not friendly for hosting solution, when people often have no access
to the postgresql.conf, so they need to remember setting tsearch_conf_name.
It could be solved using 'alter user ... set tsearch_conf_name' command though.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Magnus Hagander
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 I very much doubt that the different spanishes are any different in the
 stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
 but in the case of portuguese I'm not so sure.  Maybe there are other
 examples (like chinese, but I'm not sure how useful is tsearch for
 chinese).
 
 And the .ISO8859-1 part you don't need at all if you accept that the
 files are UTF8 by design, as Tom proposed.
 
 Also, the problem we're dealing with here is mainly lack of
 standardization of the encoding part of locale names.  AFAIK, just about
 everybody agrees on es_ES, ru_RU, etc; it's the part that comes
 after that (if any) that is not too consistent across platforms.

That may have been true until we started supporting Windows...
Swedish_Sweden.1252 is what I get on my machine, for example. Principle
is the same, but values certainly aren't.

//Magnus




Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera
Magnus Hagander wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
  I very much doubt that the different spanishes are any different in the
  stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
  but in the case of portuguese I'm not so sure.  Maybe there are other
  examples (like chinese, but I'm not sure how useful is tsearch for
  chinese).
  
  And the .ISO8859-1 part you don't need at all if you accept that the
  files are UTF8 by design, as Tom proposed.
  
  Also, the problem we're dealing with here is mainly lack of
  standardization of the encoding part of locale names.  AFAIK, just about
  everybody agrees on es_ES, ru_RU, etc; it's the part that comes
  after that (if any) that is not too consistent across platforms.
 
 That may have been true until we started supporting Windows...
 Swedish_Sweden.1252 is what I get on my machine, for example. Principle
 is the same, but values certainly aren't.

Well, at least the name is not itself translated, so a mapping table is
not right out of the question.  If they had put a name like
Español_Chile instead of Spanish_Chile we would be in serious
trouble.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread teodor

 That may have been true until we started supporting Windows...
 Swedish_Sweden.1252 is what I get on my machine, for example. Principle
 is the same, but values certainly aren't.

 Well, at least the name is not itself translated, so a mapping table is
 not right out of the question.  If they had put a name like
 Español_Chile instead of Spanish_Chile we would be in serious
 trouble.
I don't think so; otherwise you couldn't type or display the name in order
to change the locale :).

So, final proposal:
rename cfglocale to cfglanguages and store in it an array of language names,
produced from the first part of locale names:
russian   '{ru_RU, Russian_Russia}'
spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments?

Are there any obstacles to using GIN indexes in pg_catalog?




Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian
Michael Glaesemann wrote:
 
 On Jun 22, 2007, at 9:28 , Tom Lane wrote:
 
  Is the point here for initdb to be able to establish a sane default
  initially?  Seems to me it can guess the language from the first
  component of the locale (ru_RU -> russian).
 
 How would this work for initdb with locale C?

Yea, that's a problem.  I am thinking we should just avoid the entire
issue and require it to be set by the user, and throw an error if the
configuration is not set.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:

 So, final proposal:
 rename cfglocale to cfglanguages and store in it an array of language names,
 produced from the first part of locale names:
 russian   '{ru_RU, Russian_Russia}'
 spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'
 
 Comments?

Why not do it the other way around?
es_ES   spanish
Spanish_Spain   spanish
ru_RU   russian
pt_BR   portuguese_brazil

That way you don't need any funny index.  Or do you need the list of
locales for each language? (but even if you do, you can easily obtain it
by indexing both columns separately using btrees anyway)
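
The two layouts under discussion carry the same information, which the following sketch tries to make concrete by turning the array form (language to list of locales) into the row form (locale to language). The data is the two example rows quoted above; `invert` is a hypothetical helper, not part of the patch.

```python
# Language -> locale names, using the two rows proposed in this thread.
LOCALES_BY_LANGUAGE = {
    "russian": ["ru_RU", "Russian_Russia"],
    "spanish": ["es_ES", "es_CL", "Spanish_Spain", "Spanish_Chile"],
}


def invert(locales_by_language):
    """Turn language -> [locales] into locale -> language.

    Raises ValueError if one locale is claimed by two languages, since a
    locale-keyed table can hold only one language per locale.
    """
    language_by_locale = {}
    for language, locales in locales_by_language.items():
        for locale_name in locales:
            previous = language_by_locale.setdefault(locale_name, language)
            if previous != language:
                raise ValueError(
                    f"{locale_name} maps to both {previous} and {language}"
                )
    return language_by_locale


for locale_name, language in sorted(invert(LOCALES_BY_LANGUAGE).items()):
    print(locale_name, language)
```

In the row form a plain lookup on the locale column is enough, and a locale listed under two languages (as would happen if C were listed for both english and japanese) is rejected rather than silently resolved.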

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
I can see support will not be a problem.  10 out of 10.(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tatsuo Ishii
 On Jun 22, 2007, at 9:28 , Tom Lane wrote:
 
  Is the point here for initdb to be able to establish a sane default
  initially?  Seems to me it can guess the language from the first
  component of the locale (ru_RU -> russian).
 
 How would this work for initdb with locale C?

I'm worrying about that too.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Michael Glaesemann


On Jun 22, 2007, at 9:28 , Tom Lane wrote:


Is the point here for initdb to be able to establish a sane default
initially?  Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).


How would this work for initdb with locale C?

Michael Glaesemann
grzm seespotcode net





Re: [HACKERS] tsearch in core patch

2007-06-22 Thread teodor
 Why not do it the other way around?
 es_ES spanish
 Spanish_Spain spanish
 ru_RU russian
 pt_BR portuguese_brazil

 That way you don't need any funny index.  Or do you need the list of
 locales for each language? (but even if you do, you can easily obtain it
 by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:
russian_win   Russian_Russia
russian_unix  ru_RU

They don't differ except in the locale name.




Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread Tatsuo Ishii
  How would this work for initdb with locale C?
 
  I'm worrying about that too.
 
 english '{en_GB, en_US, C}'
 
 I suppose that a locale name always has a dot separator, except the C locale ---
 which is a well-known exception

So we would have to?:

japanese '{ja_JP, C}'

How would we know C -> japanese?

Also I'm wondering how we could handle texts including Japanese and
English. It's very common in Japan.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 On Jun 22, 2007, at 9:28 , Tom Lane wrote:
 Is the point here for initdb to be able to establish a sane default
 initially?  Seems to me it can guess the language from the first
 component of the locale (ru_RU -> russian).
 
 How would this work for initdb with locale C?

 I'm worrying about that too.

I would be surprised if C locale defaulted to anything except English.
I suppose it would be sensible to add a switch to allow people to select
a different language.  In any case, the only thing initdb would be doing
would be setting up an initial value of a table entry or GUC variable,
so you could always change it yourself later; it may not be worth
sweating too much about this.

regards, tom lane



[Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread teodor

 How would this work for initdb with locale C?

 I'm worrying about that too.

english '{en_GB, en_US, C}'

I suppose that a locale name always has a dot separator, except the C locale ---
which is a well-known exception






Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:
  Why not do it the other way around?
  es_ES   spanish
  Spanish_Spain   spanish
  ru_RU   russian
  pt_BR   portuguese_brazil
 
  That way you don't need any funny index.  Or do you need the list of
  locales for each language? (but even if you do, you can easily obtain it
  by indexing both columns separately using btrees anyway)
 
 Yes, that's possible, but that increases the number of identical configurations:
 russian_win  Russian_Russia
 russian_unix ru_RU
 
 They don't differ except in the locale name.

But why do you need them to be different at all?  Just make it

russian Russian_Russia
russian ru_RU

Does that not work for some reason?

What I was really suggesting was having a table mapping locale names
into tsearch languages.  Then the configuration could be made based on
the language, not on the locale name.  So the stopword list is for
russian, regardless of whether the locale is Russian_Russia or ru_RU.

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Note: it's possible that the stopword list for Brazilian Portuguese differs
from the one for European Portuguese, which is why I was suggesting
a language portuguese_brazil and not just portuguese.  Whereas
you need a single stopword list for all the Spanish-speaking countries,
which is why you need only one language called spanish.
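
As a rough sketch of the idea, assuming a plain lookup table: the names
LOCALE_TO_LANGUAGE, STOPWORD_FILE and stopword_file_for, and the file names,
are placeholders for illustration and not anything in the patch.

```python
# Hypothetical locale-name -> language table; both Unix-style and
# Windows-style names map onto the same language.
LOCALE_TO_LANGUAGE = {
    "es_ES": "spanish",
    "Spanish_Spain": "spanish",
    "ru_RU": "russian",
    "Russian_Russia": "russian",
    "pt_BR": "portuguese_brazil",
}

# Per-language settings are stored once; the file names are placeholders.
STOPWORD_FILE = {
    "spanish": "spanish.stop",
    "russian": "russian.stop",
    "portuguese_brazil": "portuguese_brazil.stop",
}


def stopword_file_for(locale):
    """Resolve a locale name to its language, then to a stopword file."""
    language = LOCALE_TO_LANGUAGE[locale]   # KeyError for unknown locales
    return STOPWORD_FILE[language]


print(stopword_file_for("ru_RU"), stopword_file_for("Russian_Russia"))
```

Both Russian locale names resolve to the same stopword file, so no
russian_win/russian_unix duplication is needed.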

-- 
Alvaro Herrera                http://www.advogato.org/person/alvherre
A time will come when diligent and prolonged research will bring
to light things that are hidden today (Seneca, 1st century)



[HACKERS] tsearch in core patch

2007-06-21 Thread Teodor Sigaev

http://www.sigaev.ru/misc/tsearch_core-0.52.gz

Plan was:

1) rename FULLTEXT to TEXT SEARCH in SQL command
done

2) rework Snowball stemmers as Tom suggested
done

3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
done

4) remove support for a default configuration per schema. There will be
   only one default configuration per locale.
done

5) single encoded files. That will touch snowball, ispell, synonym, thesaurus
   and simple dictionaries
done

6) use encoding names instead of locale names in configuration
Ugh. I missed that knowing the encoding doesn't allow determining the exact 
language --- how many languages use the ISO8859-1 locale? So it's not done. Tom 
pointed out that locale names aren't portable, but there aren't many names for 
the same locale (ru_RU.UTF-8 and ru_RU.UTF8, for example). So it's possible to 
use an array of locales instead of a single name.


I didn't see any comments about the security hole pointed out by Tom, so I repeat:

About security holes in PARSER/DICTIONARY. I see the following ways to resolve
them now:
1) Allow only superusers to do CREATE/ALTER/DROP PARSER/DICTIONARY
   Disadvantage: hosting users will not be able to change dictionaries
2) Remove CREATE/ALTER/DROP PARSER, split pg_ts_dict into pg_ts_dict_template
   and pg_ts_dict, and change CREATE/ALTER/DROP DICTIONARY accordingly
   Disadvantage: parsers and dictionary templates will not be dumped/restored;
   they must be restored manually (just an INSERT into
   pg_ts_parser/pg_ts_dict_template)
3) Similar to the previous point, but:
   * CREATE/ALTER/DROP PARSER - superuser only
   * CREATE/ALTER/DROP DICTIONARY TEMPLATE - superuser only
   * CREATE/ALTER/DROP DICTIONARY - allowed to non-superusers
   Disadvantage: new command CREATE/ALTER/DROP DICTIONARY TEMPLATE
Which way do we choose? Or did I miss some variant?

I would like to go with option 3). Comments?
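
The privilege split in option 3) can be written down as a small check. This
is only a sketch of the rule as listed above; may_modify and the two sets
are hypothetical names, not code from the patch.

```python
# Sketch of option 3: which object kinds require superuser rights for
# CREATE/ALTER/DROP. All names here are hypothetical.
SUPERUSER_ONLY = {"PARSER", "DICTIONARY TEMPLATE"}
KNOWN_OBJECTS = SUPERUSER_ONLY | {"DICTIONARY"}


def may_modify(object_kind, is_superuser):
    """Return True if the role may CREATE/ALTER/DROP the object kind."""
    if object_kind not in KNOWN_OBJECTS:
        raise ValueError(f"unknown object kind: {object_kind}")
    return is_superuser or object_kind not in SUPERUSER_ONLY


print(may_modify("DICTIONARY", is_superuser=False))
print(may_modify("PARSER", is_superuser=False))
```

Hosting users keep the ability to create dictionaries from existing
templates, while anything that names a C function stays superuser-only.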

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch

2007-06-21 Thread Hannu Krosing
One fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:
 http://www.sigaev.ru/misc/tsearch_core-0.52.gz
 
 Plan was:
 
 1) rename FULLTEXT to TEXT SEARCH in SQL command
 done
 
 2) rework Snowball stemmers as Tom suggested
 done
 
 3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
 done

Why not rename ALTER FULLTEXT CONFIGURATION to ALTER TEXT SEARCH
CONFIGURATION here too?

 4) remove support for a default configuration per schema. There will be
 only one default configuration per locale.
 done
 
 5) single encoded files. That will touch snowball, ispell, synonym, thesaurus
 and simple dictionaries
 done
 
 6) use encoding names instead of locale names in configuration
 Ugh. I missed that knowing the encoding doesn't allow determining the exact 
 language

Most languages can be written using the Unicode charset and UTF-8 encoding,
so neither charset nor encoding can be used to determine the language.

  --- how many languages use the ISO8859-1 locale?

ISO8859-1 is an encoding, not a locale.

 So it's not done. Tom pointed out that locale names aren't portable, but there
 aren't many names for the same locale (ru_RU.UTF-8 and ru_RU.UTF8, for
 example). So it's possible to use an array of locales instead of a single name.
 
 I didn't see any comments about the security hole pointed out by Tom, so I repeat:
 
 About security holes in PARSER/DICTIONARY. I see the following ways to resolve
 them now:
 1) Allow only superusers to do CREATE/ALTER/DROP PARSER/DICTIONARY
 Disadvantage: hosting users will not be able to change dictionaries
 2) Remove CREATE/ALTER/DROP PARSER, split pg_ts_dict into pg_ts_dict_template
 and pg_ts_dict, and change CREATE/ALTER/DROP DICTIONARY accordingly
 Disadvantage: parsers and dictionary templates will not be dumped/restored;
   they must be restored manually (just an INSERT into
   pg_ts_parser/pg_ts_dict_template)
 3) Similar to the previous point, but:
 * CREATE/ALTER/DROP PARSER - superuser only
 * CREATE/ALTER/DROP DICTIONARY TEMPLATE - superuser only
 * CREATE/ALTER/DROP DICTIONARY - allowed to non-superusers
 Disadvantage: new command CREATE/ALTER/DROP DICTIONARY TEMPLATE
 Which way do we choose? Or did I miss some variant?
 
 I would like to go with option 3). Comments?
 




Re: [HACKERS] tsearch in core patch

2007-06-21 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 One fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:
 6) use encoding names instead of locale names in configuration
 Ugh. I missed that knowing the encoding doesn't allow determining the exact 
 language

 Most languages can be written using the Unicode charset and UTF-8 encoding,
 so neither charset nor encoding can be used to determine the language.

The recommendation I was making was to use the language name, not the
encoding name, in the user-visible configuration.

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-23 Thread Peter Eisentraut
On Thursday, 22 February 2007 18:07, Markus Schiltknecht wrote:
  I agree that extending the parser with non-standard constructs isn't good.

 Generally? Wow! This would mean PostgreSQL would always lag behind
 other RDBMSes regarding ease of use. Please don't do that!

You are confusing making a full-text index and configuring the full-text 
engine.  Tsearch already gives you a standard CREATE INDEX variant to do the 
former.  The discussion here is about the latter, and notably Oracle uses 
functions there.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-23 Thread Peter Eisentraut
On Thursday, 22 February 2007 14:33, Teodor Sigaev wrote:
 \df shows only the types of the arguments, not their meaning.

Only if you don't provide argument names.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Markus Schiltknecht

Hi,

Peter Eisentraut wrote:

Oleg Bartunov wrote:

It's not so big addition to the gram.y, see a list of commands
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.


As we are still discussing the syntax: is there a proposal for what a 
function-based syntax would look like?


CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;

just seems so much more SQL-like than:

SELECT add_fulltext_config('myfts', 'template_cfg', True);

I admit, that's a very simple and not thought through example. But as 
long as those who prefer not to extend the grammar don't come up with a 
better alternative syntax, one easily gets the impression that extending 
the grammar in general is evil.


In that proposed syntax, I would drop all "=", ",", "(", and ")".  They 
don't seem necessary and they are untypical for SQL commands.  I'd 
compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that 
do similar things.


Yup, I'd second that.

Regards

Markus




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Teodor Sigaev
In that proposed syntax, I would drop all "=", ",", "(", and ")".  They 
don't seem necessary and they are untypical for SQL commands.  I'd 
compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that 
do similar things.


I was looking at CREATE TYPE mostly. Removing "=", ",", "(", and ")" from 
CREATE/ALTER FULLTEXT requires adding several items to the unreserved_keyword 
list, and growing gram.y with new rules similar to OptRoleList instead of

simple opt_deflist:
'(' def_list ')'   { $$ = $2; }
  | /*EMPTY*/  { $$ = NIL; }
  ;

Is it acceptable?
List of new keywords is: LOCALE, LEXIZE, INIT, OPT, GETTOKEN, LEXTYPES, HEADLINE

So, syntax will be
CREATE FULLTEXT DICTIONARY dictname
LEXIZE  lexize_function
[ INIT init_function ]
[ OPT opt_text ];

CREATE FULLTEXT DICTIONARY dictname
   [ { LEXIZE  lexize_function | INIT init_function | OPT opt_text } [...] ]
LIKE template_dictname;

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Teodor Sigaev

CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;
SELECT add_fulltext_config('myfts', 'template_cfg', True);

That's simple, but what about
CREATE FULLTEXT MAPPING ON cfgname FOR lexemetypename[, ...] WITH dictname1[, 
...];
?

SELECT create_fulltext_mapping(cfgname, '{lexemetypename[, ...]}'::text[],
'{dictname1[, ...]}'::text[]);

Seems rather ugly to me...

And a function interface does not provide autocompletion or online help in psql. 
\df shows only the types of the arguments, not their meaning.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Andrew Dunstan

Teodor Sigaev wrote:
In that proposed syntax, I would drop all "=", ",", "(", and ")".  
They don't seem necessary and they are untypical for SQL commands.  
I'd compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands 
that do similar things.


I was looking at CREATE TYPE mostly. Removing "=", ",", "(", and ")" 
from CREATE/ALTER FULLTEXT requires adding several items to the 
unreserved_keyword list, and growing gram.y with new rules 
similar to OptRoleList instead of

simple opt_deflist:
'(' def_list ')'   { $$ = $2; }
  | /*EMPTY*/  { $$ = NIL; }
  ;

Is it acceptable?
List of new keywords is: LOCALE, LEXIZE, INIT, OPT, GETTOKEN, 
LEXTYPES, HEADLINE


So, syntax will be
CREATE FULLTEXT DICTIONARY dictname
LEXIZE  lexize_function
[ INIT init_function ]
[ OPT opt_text ];

CREATE FULLTEXT DICTIONARY dictname
   [ { LEXIZE  lexize_function | INIT init_function | OPT opt_text } 
[...] ]

LIKE template_dictname;




If we are worried about the size of the transition table and keeping it 
in cache (see remarks from Tom upthread) then adding more keywords seems 
a bad idea, as it will surely expand the table.  OTOH, I'd hate to make 
that a design criterion. My main worry has been that the grammar would 
be stable.


Just to quantify all this, I did a quick check on the grammar using 
bison -v - we appear to have 473 terminal symbols, and 420 non-terminal 
symbols in 1749 rules, generating 3142 states. The biggest tables 
generated are yytable and yycheck, each about 90kb on my machine.


cheers

andrew



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Pavel Stehule

   CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;
   SELECT add_fulltext_config('myfts', 'template_cfg', True);



That's simple, but what about
CREATE FULLTEXT MAPPING ON cfgname FOR lexemetypename[, ...] WITH 
dictname1[, ...];

?

SELECT create_fulltext_mapping(cfgname, '{lexemetypename[, ...]}'::text[],
'{dictname1[, ...]}'::text[]);

Seems rather ugly to me...


Functions may not seem as effective, but users don't have to learn new syntax.

SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'], ARRAY['...']) 
is readable.


I agree that extending the parser with non-standard constructs isn't good.

And a function interface does not provide autocompletion or online help in 
psql. \df shows only the types of the arguments, not their meaning.


Yes, I miss better support for functions in psql too. But that's a different 
topic. I don't see a reason why \h cannot support functions better.

Have a nice day
Pavel Stehule



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Markus Schiltknecht

Hi,

Andrew Dunstan wrote:
If we are worried about the size of the transition table and keeping it 
in cache (see remarks from Tom upthread) then adding more keywords seems 
a bad idea, as it will surely expand the table.  OTOH, I'd hate to make 
that a design criterion. 


Yeah, me too. Especially because it's an implementation issue against 
ease of use. (Or can somebody convince me that functions would provide a 
simple interface?)


My main worry has been that the grammar would 
be stable.


You mean stability of the grammar for the new additions or for all the 
grammar? Why are you worried about that?


Just to quantify all this, I did a quick check on the grammar using 
bison -v - we appear to have 473 terminal symbols, and 420 non-terminal 
symbols in 1749 rules, generating 3142 states. The biggest tables 
generated are yytable and yycheck, each about 90kb on my machine.


That already sounds somewhat better than Tom's 300 kb. And considering 
that these caches most probably grow faster than our grammar...


Regards

Markus



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Markus Schiltknecht

Hi,

Pavel Stehule wrote:

Functions may not seem as effective, but users don't have to learn new syntax.


Are you serious? That argument speaks exactly *for* extending the 
grammar. From other databases, users are used to:


CREATE TABLE ... (SQL)
CREATE INDEX ... (SQL)
CREATE FULLTEXT INDEX ... (Transact-SQL)
CREATE TABLE (... FULLTEXT ...) (MySQL)
CREATE INDEX ... INDEXTYPE IS ctxsys.context PARAMETERS ... (Oracle Text)

And users are constantly complaining that PostgreSQL doesn't have 
fulltext indexing capabilities (if they don't know about tsearch2) or 
about how hard it is to use tsearch2.


SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'], 
ARRAY['...']) is readable.


Hardly. Because it's not like SQL:

 - it's counter-intuitive to have to SELECT, when you want to CREATE 
something.

 - it's confusing to have two actions (select create)
 - why do I have to write ARRAYs to list parameters?
 - it's not obvious what you're selecting (return value?)
 - you have to keep track of the brackets, which can easily get messed 
up with two levels of them. Especially if the command gets multiple 
lines long.



I agree that extending the parser with non-standard constructs isn't good.


Generally? Wow! This would mean PostgreSQL would always lag behind 
other RDBMSes regarding ease of use. Please don't do that!


Regards

Markus




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Joshua D. Drake
 
 And users are constantly complaining that PostgreSQL doesn't have
 fulltext indexing capabilities (if they don't know about tsearch2) or
 about how hard it is to use tsearch2.
 
 SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
 ARRAY['...']) is readable.
 
 Hardly. Because it's not like SQL:

I have to agree here.

SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
ARRAY['...']) is readable.

Is a total no op. We might as well just leave it in contrib.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Pavel Stehule

 And users are constantly complaining that PostgreSQL doesn't have
 fulltext indexing capabilities (if they don't know about tsearch2) or
 about how hard it is to use tsearch2.

 SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
 ARRAY['...']) is readable.

 Hardly. Because it's not like SQL:

I have to agree here.

SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
ARRAY['...']) is readable.

Is a total no op. We might as well just leave it in contrib.



I am for integrating tsearch into core, why not. But I don't see a reason for 
special syntax. Stored procedures are exactly the right tool for it.


Full text is standardized in SQL/MM, SQL Multimedia and Application Packages, 
Part 2: Full-Text.


Why implement an extensive proprietary solution? If our solution is 
proprietary, then it should at least be simple and cheap and not complicate 
future conformance with ANSI SQL.


Regards
Pavel Stehule



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Joshua D. Drake
Pavel Stehule wrote:
  And users are constantly complaining that PostgreSQL doesn't have
  fulltext indexing capabilities (if they don't know about tsearch2) or
  about how hard it is to use tsearch2.
 
  SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
  ARRAY['...']) is readable.
 
  Hardly. Because it's not like SQL:

 I have to agree here.

 SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
 ARRAY['...']) is readable.

 Is a total no op. We might as well just leave it in contrib.

 
 I am for integrating tsearch into core, why not. But I don't see a reason
 for special syntax. Stored procedures are exactly the right tool for it.

I am not talking about stored procedures. I am talking about a very
ugly, counter intuitive syntax above.

Initializing full text should be as simple as:

CREATE INDEX foo USING FULLTEXT(bar);

(or something similar)

Or:

CREATE TABLE foo (id serial, names text FULLTEXT);

Anything more complicated is a waste of cycles.

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Pavel Stehule

I am not talking about stored procedures. I am talking about a very
ugly, counter intuitive syntax above.

Initializing full text should be as simple as:

CREATE INDEX foo USING FULLTEXT(bar);

(or something similar)

Or:

CREATE TABLE foo (id serial, names text FULLTEXT);

Anything more complicated is a waste of cycles.

Joshua D. Drake


I agree. Question: what about multilanguage full text?

CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary, 
english_dictionary) ] );


Everything else we can do via stored procedures.

Pavel Stehule



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Joshua D. Drake

 CREATE TABLE foo (id serial, names text FULLTEXT);

 Anything more complicated is a waste of cycles.

 Joshua D. Drake
 
 I agree. Question: what about multilanguage full text?
 
 CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
 CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary,
 english_dictionary) ] );
 
 Everything else we can do via stored procedures.

That works for me with perhaps a default mapping to locales? For example
if our locale is en_us.UTF8 we are pretty assured that we are using english.

Joshua D. Drake



 
 Pavel Stehule
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Pavel Stehule


 CREATE TABLE foo (id serial, names text FULLTEXT);

 Anything more complicated is a waste of cycles.

 Joshua D. Drake

 I agree. Question: what about multilanguage full text?

 CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
 CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary,
 english_dictionary) ] );

 Everything else we can do via stored procedures.

That works for me with perhaps a default mapping to locales? For example
if our locale is en_us.UTF8 we are pretty assured that we are using 
english.




90% yes, 10% no. A typical task in Czech: find a word regardless of accents, or 
find German, English, or Czech stemmed words in multilanguage documents (or 
different languages depending on topology). A lot of databases are at least 
bilingual (in the Czech Republic, German and Czech).


Pavel

P.S. Missing collations are a big minus for PostgreSQL in the EU (we have some 
workarounds).




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-22 Thread Robert Treat
On Thursday 25 January 2007 12:51, Oleg Bartunov wrote:
 On Thu, 25 Jan 2007, Nikolay Samokhvalov wrote:
  On 1/25/07, Teodor Sigaev [EMAIL PROTECTED] wrote:
  It should be clear enough for now - dump data from old db and load into
  new one.
  But dump should be without any contrib/tsearch2 related functions.
 
  Upgrading from 8.1.x to 8.2.x was not trivial because of a very trivial
  change in API (actually not really API but the content of pg_ts_*
  tables): russian snowball stemming function was forked to 2 different
  ones, for koi8 and utf8 encodings. So, as I  dumped my pg_ts_* tables
  data (to keep my tsearch2 settings), I saw errors during restoration
  (btw, why didn't you keep old russian stemmer function name as a
  synonym to koi8 variant?) -- so, I had to change my dump file
  manually, because I didn't manage to follow tsearch2 best practices

 sed and grep did the trick.

  (to use some kind of bootstrap script that creates tsearch2
  configuration you need from default one -- using several INSERTs and
  UPDATEs). And there were no upgrade notes for tsearch2.

 This is unfair: you promised to write upgrade notes, we discussed the
 problem with the name change before release, and I relied on you. It was my
 fault, of course.


I got bit by this today and, afaict the best solution for the status quo would 
be to change the install schema to something like tsearch2, which would then 
allow for much easier dump and restore handling. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Markus Schiltknecht

Hi,

Tom Lane wrote:

You mean four different object types.  I'm not totally clear on bison's
scaling behavior relative to the number of productions


You really want to trade parser performance (which is *very* 
implementation specific) for ease of use?


Bison generates an LALR [1] parser, which depends quite a bit on the 
number of productions. But AFAIK the dependency is mostly on memory 
consumption for the internal symbol sets, not so much on runtime 
complexity. I didn't find hard facts about runtime complexity of LALR, 
though (pointers are very welcome).


Are there any ongoing efforts to rewrite the parser (i.e. using another 
algorithm, like a recursive descent parser)?


Regards

Markus

[1]: Wikipedia on the LALR parsing algorithm:
http://en.wikipedia.org/wiki/LALR_parser



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Florian G. Pflug

Markus Schiltknecht wrote:

Hi,

Tom Lane wrote:

You mean four different object types.  I'm not totally clear on bison's
scaling behavior relative to the number of productions


You really want to trade parser performance (which is *very* 
implementation specific) for ease of use?


Bison generates an LALR [1] parser, which depends quite a bit on the 
number of productions. But AFAIK the dependency is mostly on memory 
consumption for the internal symbol sets, not so much on runtime 
complexity. I didn't find hard facts about runtime complexity of LALR, 
though (pointers are very welcome).


According to http://en.wikipedia.org/wiki/LR_parser processing one
token in any LR(1) parser in the worst case needs to
 a) Do a lookup in the action table with the current (state, token) pair
 b) Do a lookup in the goto table with a (state, rule) pair.
 c) Push one state onto the stack, and pop n states with
n being the number of symbols (tokens or other rules) on the right
hand side of a rule.

a) and b) should be O(1). Processing one token pushes at most one state
onto the stack, so overall no more than N states can be popped off again,
making the whole algorithm O(N) with N being the number of tokens of the
input stream.

AFAIK the only difference between SLR, LALR and LR(1) lies in the
generation of the goto and action tables.
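
To make the counting argument concrete, here is a toy table-driven parser.
It is only a sketch: the grammar, the hand-built ACTION/GOTO tables and the
parse function are assumptions made up for illustration and have nothing to
do with the tables in gram.c. Note one refinement to the argument above: a
reduce also pushes one state (the goto target), so pushes = shifts +
reductions. For this grammar both terms grow linearly with the input.

```python
# Toy table-driven LR parser for the grammar
#   rule 1: E -> E '+' 'n'
#   rule 2: E -> 'n'
# The tables were built by hand for this sketch.
ACTION = {
    (0, "n"): ("shift", 1),
    (1, "+"): ("reduce", 2), (1, "$"): ("reduce", 2),
    (2, "+"): ("shift", 3),  (2, "$"): ("accept", None),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 2}
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule -> (left side, right-side length)


def parse(tokens):
    """Parse tokens; return (pushes, pops) performed on the state stack."""
    stack, pushes, pops, pos = [0], 0, 0, 0
    stream = list(tokens) + ["$"]      # "$" marks end of input
    while True:
        kind, arg = ACTION.get((stack[-1], stream[pos]), ("error", None))
        if kind == "shift":            # a) action lookup, c) push one state
            stack.append(arg)
            pushes += 1
            pos += 1
        elif kind == "reduce":         # c) pop n states, b) goto lookup
            lhs, length = RULES[arg]
            del stack[-length:]
            pops += length
            stack.append(GOTO[(stack[-1], lhs)])
            pushes += 1
        elif kind == "accept":
            return pushes, pops
        else:
            raise SyntaxError(f"unexpected token {stream[pos]!r}")


for k in (1, 10, 100):
    tokens = ["n"] + ["+", "n"] * (k - 1)
    print(len(tokens), parse(tokens))
```

For an input with k operands the sketch performs 3k - 1 pushes and 3k - 2
pops, and each step is a constant number of dictionary lookups, which is
the O(N) behaviour described above.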

Are there any ongoing efforts to rewrite the parser (i.e. using another 
algorithm, like a recursive descent parser)?

Why would you want to do that?

greetings, Florian Pflug



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Oleg Bartunov wrote:
  It's not so big addition to the gram.y, see a list of commands
  http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
 
  I looked at the diff file and the major change in gram.y is the creation
  of a new object type FULLTEXT,
 
 You mean four different object types.  I'm not totally clear on bison's
 scaling behavior relative to the number of productions, but I think
 there's no question that this patch will impose a measurable distributed
 penalty on every single query issued to Postgres by any application,
 whether it's heard of tsearch or not.  The percentage overhead would
 be a lot lower if the patch were introducing a similar number of entries
 into pg_proc.

My point is that the grammar splits off all the tsearch2 objects by
prefixing them with CREATE FULLTEXT object, where there are four object
types supported.

But as others have pointed out, the performance of the grammar is
probably not an issue in this case.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Markus Schiltknecht

Hi,

Florian G. Pflug wrote:

According to http://en.wikipedia.org/wiki/LR_parser processing one
token in any LR(1) parser in the worst case needs to
 a) Do a lookup in the action table with the current (state, token) pair
 b) Do a lookup in the goto table with a (state, rule) pair.
 c) Push one state onto the stack, and pop n states with
n being the number of symbols (tokens or other rules) on the right
hand side of a rule.

a) and b) should be O(1). Processing one token pushes at most one state
onto the stack, so overall no more than N states can be popped off again,
making the whole algorithm O(N) with N being the number of tokens of the
input stream.


Looks correct, thanks. What exactly is Tom worried about, then?

Are there any ongoing efforts to rewrite the parser (i.e. using 
another algorithm, like a recursive descent parser)?

Why would you want to do that?


I recall having read something about rewriting the parser. Together with 
Tom being worried about parser performance and knowing GCC has switched 
to a hand written parser some time ago, I suspected bison to be slow. 
That's why I've asked.


Regards

Markus




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Markus Schiltknecht wrote:
 I didn't find hard facts about runtime complexity of LALR, 
 though (pointers are very welcome).

 a) and b) should be O(1). Processing one token pushes at most one state
 onto the stack, so overall no more than N states can be popped off again,
 making the whole algorithm O(N) with N being the number of tokens of the
 input stream.

Yeah.  I was concerned about the costs involved in trying to pack the
state tables, but it appears that that cost is all paid when the grammar
is compiled --- looking into gram.c, it appears the inner loop contains
just simple array lookups.  Still, bloating of the state tables is
something we ought to pay attention to, because there's a distributed
cost once they no longer fit in a processor's L1 cache.  On my machine
size gram.o is over 360K already ...

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Brian Hurt

Markus Schiltknecht wrote:


Hi,

I recall having read something about rewriting the parser. Together 
with Tom being worried about parser performance and knowing GCC has 
switched to a hand written parser some time ago, I suspected bison to 
be slow. That's why I've asked.


This has little to do with performance and everything to do with the 
insanity which is C++:

http://gnu.teleglobe.net/software/gcc/gcc-3.4/changes.html


* A hand-written recursive-descent C++ parser has replaced the
  YACC-derived C++ parser from previous GCC releases. The new
  parser contains much improved infrastructure needed for better
  parsing of C++ source codes, handling of extensions, and clean
  separation (where possible) between proper semantics analysis
  and parsing. The new parser fixes many bugs that were found in
  the old parser.



Short form: C++ is basically not LALR(1) parseable.

Brian



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Florian G. Pflug

Markus Schiltknecht wrote:
Are there any ongoing efforts to rewrite the parser (i.e. using 
another algorithm, like a recursive descent parser)?

Why would you want to do that?


I recall having read something about rewriting the parser. Together with 
Tom being worried about parser performance and knowing GCC has switched 
to a hand written parser some time ago, I suspected bison to be slow. 
That's why I've asked.


I think the case is different for C and C++. The grammars of C and C++
appear to be much more parser-friendly than SQL's, which makes handcrafting
a parser easier, I'd think. And I believe that one of the reasons gcc
wasn't happy with bison was that it limited the quality of their error
reporting - which isn't that much of a problem for SQL, since SQL
statements are rather short compared to your typical C/C++ source file.


Last, but not least, the C and C++ syntax is basically set in stone - at
least now that g++ supports nearly all (or all? don't know) of the C++
standard. So it doesn't really matter if changes to the parser are a bit
more work, because they rarely happen. Postgres seems to add new features
that change the grammar with every release (which is a good thing!).


greetings, Florian Pflug



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Markus Schiltknecht wrote:
 Are there any ongoing efforts to rewrite the parser (i.e. using 
 another algorithm, like a recursive descent parser)?

 Why would you want to do that?

 Last, but not least, the C and C++ syntax is basically set in stone - at
 least now that g++ supports nearly all (or all? don't know) of the C++
 standard. So it doesn't really matter if changes to the parser are a bit
 more work, because they rarely happen. Postgres seems to add new features
 that change the grammar with every release (which is a good thing!).

Yeah.  I think it would be a pretty bad idea for us to go over to a
handwritten parser: not only greater implementation effort for grammar
changes, but greater risk of introducing bugs.  Bison tells you about it
when you've written something ambiguous ...

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Peter Eisentraut
Oleg Bartunov wrote:
 It's not such a big addition to gram.y; see the list of commands
 http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

In that proposed syntax, I would drop all "=", ",", "(", and ")".  They
don't seem necessary and they are atypical for SQL commands.  I'd
compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that
do similar things.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Peter Eisentraut
Joshua D. Drake wrote:
 This is like the third time we have been around this problem. The
 syntax is clear and reasonable imo.

But others have differing opinions.

 Can we stop arguing about it and just include it? If there are specific
 issues beyond syntax, that is one thing, but at this point it seems we
 are arguing for the sake of arguing.

How is that worse than wanting to abort the discussion for the sake of 
aborting the discussion?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Oleg Bartunov

On Thu, 22 Feb 2007, Peter Eisentraut wrote:


Oleg Bartunov wrote:

It's not such a big addition to gram.y; see the list of commands
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.


In that proposed syntax, I would drop all "=", ",", "(", and ")".  They
don't seem necessary and they are atypical for SQL commands.  I'd
compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that
do similar things.


that looks reasonable.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Teodor Sigaev wrote:
 We (Oleg and I) are glad to present the tsearch-in-core patch for pgsql.
 Basically, the layout, functions, methods, types, etc. are the same as in
 the current tsearch2, with a lot of improvements:
 
   - pg_ts_* tables are now in pg_catalog
   - parsers, dictionaries and configurations now have an owner and a
     namespace, similar to other pgsql objects such as tables, operator
     classes, etc.
   - the current tsearch configuration is managed with the help of the GUC
     variable tsearch_conf_name
   - the tsearch cfg can be chosen by locale for each schema separately
   - the tsearch configuration is managed with SQL commands, not with
     insert/update/delete statements. This allows dependency tracking and
     correct dumping and dropping.
   - psql support via the \dF* commands
   - all available Snowball stemmers and corresponding configurations are
     added
   - memory is freed correctly by every dictionary
 
 Work is sponsored by EnterpriseDB's PostgreSQL Development Fund.
 
 patch: http://www.sigaev.ru/misc/tsearch_core-0.33.gz
 docs: http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ (not yet completed and
 not yet a patch, just the SGML source)
 
 Implementation details:
 - directory layout
   src/backend/utils/adt/tsearch - all I/O functions and simple operations
   src/backend/utils/tsearch - complex processing functions, including
   language processing and dictionaries
 - most of the Snowball dictionaries are placed in a separate .so library
   and plug into the database in a way similar to the character conversion
   libraries.
 
 If there are no objections, we plan to commit the patch tomorrow or the day
 after. Before committing, I'll change the OIDs from 5000+ to lower values to
 avoid holes in the OID range. After that, I'll remove the tsearch2 contrib
 module.
 
 -- 
 Teodor Sigaev   E-mail: [EMAIL PROTECTED]
 WWW: http://www.sigaev.ru/
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Bruce Momjian

FYI, I added this to the patches queue because I think we decided
full-text indexing should be in the core.  If I am wrong, please let me
know.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Alvaro Herrera
Bruce Momjian wrote:
 
 FYI, I added this to the patches queue because I think we decided
 full-text indexing should be in the core.  If I am wrong, please let me
 know.

One of the objections I remember to this particular implementation was
that configuration should be done using functions rather than new syntax
in gram.y.  This seems a good idea because it avoids bloating the
grammar, while still allowing dependency tracking, pg_dump support,
syscache support etc.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Oleg Bartunov

On Tue, 20 Feb 2007, Alvaro Herrera wrote:


Bruce Momjian wrote:


FYI, I added this to the patches queue because I think we decided
full-text indexing should be in the core.  If I am wrong, please let me
know.


One of the objections I remember to this particular implementation was
that configuration should be done using functions rather than new syntax
in gram.y.  This seems a good idea because it avoids bloating the
grammar, while still allowing dependency tracking, pg_dump support,
syscache support etc.


It's not such a big addition to gram.y; see the list of commands at
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
SQL commands make the FTS syntax clear and follow the tradition for managing
system objects. From the user's side, I'd be very unhappy to configure
FTS, which can be very complex, using functions.  All we want is to
provide users with a clear syntax.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Bruce Momjian
Oleg Bartunov wrote:
 On Tue, 20 Feb 2007, Alvaro Herrera wrote:
 
  Bruce Momjian wrote:
 
  FYI, I added this to the patches queue because I think we decided
  full-text indexing should be in the core.  If I am wrong, please let me
  know.
 
  One of the objections I remember to this particular implementation was
  that configuration should be done using functions rather than new syntax
  in gram.y.  This seems a good idea because it avoids bloating the
  grammar, while still allowing dependency tracking, pg_dump support,
  syscache support etc.
 
 It's not such a big addition to gram.y; see the list of commands at
 http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
 SQL commands make the FTS syntax clear and follow the tradition for managing
 system objects. From the user's side, I'd be very unhappy to configure
 FTS, which can be very complex, using functions.  All we want is to
 provide users with a clear syntax.

I looked at the diff file and the major change in gram.y is the creation
of a new object type FULLTEXT, so you can CREATE, ALTER and DROP
FULLTEXT.

I don't know fulltext administration well enough, so if Oleg says a
function API would be too complex, I am OK with his new parser syntax.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Joshua D. Drake




It's not such a big addition to gram.y; see the list of commands at
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
SQL commands make the FTS syntax clear and follow the tradition for managing
system objects. From the user's side, I'd be very unhappy to configure
FTS, which can be very complex, using functions.  All we want is to
provide users with a clear syntax.

This is like the third time we have been around this problem. The syntax
is clear and reasonable imo.
Can we stop arguing about it and just include it? If there are specific
issues beyond syntax, that is one thing, but at this point it seems we are
arguing for the sake of arguing.

Joshua D. Drake




Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-20 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Oleg Bartunov wrote:
 It's not such a big addition to gram.y; see the list of commands
 http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

 I looked at the diff file and the major change in gram.y is the creation
 of a new object type FULLTEXT,

You mean four different object types.  I'm not totally clear on bison's
scaling behavior relative to the number of productions, but I think
there's no question that this patch will impose a measurable distributed
penalty on every single query issued to Postgres by any application,
whether it's heard of tsearch or not.  The percentage overhead would
be a lot lower if the patch were introducing a similar number of entries
into pg_proc.

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-26 Thread Naz Gassiep

Andrew Dunstan wrote:

I am constantly running into this:

Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*

Where on the website can I see what plugins are included with 
PostgreSQL?


Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?

With TSearch2 in core will that fix the many upgrade problems associated
with using TSearch2?


  


contrib is a horrible misnomer. Can we maybe bite the bullet and call 
it something else?
After years of PG use, I am still afraid to use contrib modules because
it just *feels* like voodoo. I have spent much time reading this mailing
list and on IRC with PG users, and I know that contrib modules are on
the whole tested and safe, but the lack of web documentation, and of any
indication of what they do other than the notes that come with
the source, makes me feel like they are a "use and cross fingers"
type of thing.


I don't know how hard it would be to implement, but perhaps contrib 
modules could be compiled in a similar way to Apache modules. E.g., 
./configure --with-modulename   with the onus for packaging them 
appropriately falling onto the shoulders of the module authors. I feel 
that even a basic module management system like this would greatly 
increase awareness of and confidence in the contrib modules. Oh, and

+1 on renaming contrib
+1 on the need for a comprehensive list of them
+1 on the need for more doc on the website about each of them, onus 
falling on module authors, perhaps require at least a basic doc patch as 
a requirement for /contrib inclusion.


- Naz



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Bernd Helmle



On Wed, 24 Jan 2007 22:27:10 +0100, Peter Eisentraut [EMAIL PROTECTED] wrote:
 I wrote:
 The closest I could find is Oracle Text, the full-text search for
 Oracle.
 
 Oh, and note that Oracle Text is an extension and not included in the
 Oracle database product proper.
 

Same with DB2 NSE, IBM's fulltext search engine for their UDB. However, they
employ external admin tools like db2text to create, configure and alter
fulltext indexes (like slonik, for example). Text search could be done with
functions (contains()) in SQL.

Bernd



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Dawid Kuroczko

On 1/24/07, Andrew Dunstan [EMAIL PROTECTED] wrote:

Peter Eisentraut wrote:
 contrib is a horrible misnomer. Can we maybe bite the bullet and call
 it something else?
 plugins?


How about 'modules' or 'extras' or 'extensions'? :)


standard-plugins might be more informative. I think of them as being
like perl's standard modules, things that are part of the standard perl
distribution as opposed to all the other stuff on CPAN.


Personally, I don't quite like 'plugins'.  It may be that when I think of
plugins, I think of 'GIMP plugins'. ;)  And I think hosting providers
would exclude plugins almost as often as they do contrib.
They are not 'core', so it's safe to exclude them.

Same with 'extras' or 'extensions' -- they seem to imply that you
can do without them.

This is the reason I like 'modules' best.  It makes one think that it is
something maybe part of core, maybe not, but that has been isolated
into a separate entity for maintenance reasons.

My EUR 0.02

  Regards,
 Dawid



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Peter Eisentraut
Dawid Kuroczko wrote:
 This is the reason I like 'modules' best.  It makes one think that it
 is something maybe part of core, maybe not, but that has been isolated
 into a separate entity for maintenance reasons.

On etymological grounds, modules would also be my favorite, but the 
term module is already used in the SQL standard for something 
different.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Teodor Sigaev

This is a fairly large patch and I would like the chance to review it
before it goes in --- we'll commit tomorrow is not exactly a decent
review window.

Not a problem.


One possible argument for this over the contrib version is a saner
approach to dumping and restoring configurations.  However, as against
that:

1) what's the upgrade path for getting an existing tsearch2
configuration into this implementation?


It should be clear enough for now: dump the data from the old db and load it
into the new one. But the dump should be made without any
contrib/tsearch2-related functions.




2) once we put this in core we are going to be stuck with supporting its
SQL API forever.  Are we convinced that this API is the one we want?
I don't recall even having seen any proposal or discussion.  It was OK
for tsearch2's API to change every release while it was in contrib, but
the expectation of stability is a whole lot higher for core features.


The basic tsearch2 SQL API hasn't changed since its first release; it has only
been extended. As far as I can see, there is no standard for full-text search
in SQL: DB2, MS SQL, Oracle and MySQL all use different SQL APIs, and I don't
know which is better. I remember only one suggestion: 'CREATE FULLTEXT INDEX
...'. So, I believe, the existing SQL API satisfies users. But it is possible
to emulate a subset of the MySQL syntax at the grammar level:

SQL commands
CREATE FULLTEXT INDEX idxname ON tbl [ USING {GIN|GIST} ] ( field1[, [...]] );
SELECT .. FROM table WHERE MATCH( field1[, [...]] ) AGAINST ( txt );

will be translated to
CREATE INDEX idxname ON tbl [ USING {GIN|GIST} ] ( to_tsvector(field1)[ || [...]] );
SELECT .. FROM table WHERE ( to_tsvector(field1)[ || [...]] ) @@ plainto_tsquery( txt );


 Notes
  1 this is fully equivalent to MySQL's MATCH() AGAINST (txt IN BOOLEAN MODE)
  2 it requires the keywords MATCH and AGAINST, which then cannot be used as
    function names without quoting.
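
To make the proposed rewrite concrete, here is a minimal sketch of the string-level translation. It is not how the patch or the grammar would do it: the two helper functions are hypothetical names invented for this illustration, and the sketch assumes the indexed expression is built from to_tsvector(...), matching the tsearch2 query form to_tsvector(...) @@ plainto_tsquery(...). Identifiers are pasted in unquoted, so this shows the shape of the rewrite only and is not a safe way to build SQL.

```python
def translate_fulltext_index(idxname, table, fields, method=None):
    """Rewrite a MySQL-style CREATE FULLTEXT INDEX into a plain CREATE INDEX.

    Hypothetical helper for illustration; `method` is "GIN", "GIST" or None.
    """
    using = f" USING {method}" if method else ""
    expr = " || ".join(f"to_tsvector({field})" for field in fields)
    return f"CREATE INDEX {idxname} ON {table}{using} ( {expr} );"


def translate_match_against(fields, txt):
    """Rewrite MATCH(fields) AGAINST(txt) into a tsvector @@ tsquery predicate.

    Hypothetical helper for illustration; `txt` is an already-quoted literal
    or a parameter placeholder.
    """
    expr = " || ".join(f"to_tsvector({field})" for field in fields)
    return f"( {expr} ) @@ plainto_tsquery( {txt} )"


print(translate_fulltext_index("idx_docs", "docs", ["title", "body"], "GIN"))
print("SELECT * FROM docs WHERE " + translate_match_against(["title", "body"], "'fat cat'"))
```

Because both statements are built from the same expression, the predicate in the SELECT matches the indexed expression, which is what would allow an expression index to be used.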

The internal API has changed sometimes (not every release), but I don't see a
problem here: all the other internal APIs in postgres change often.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Teodor Sigaev

the patch. I'm personally not sold on the need for modifications to the
SQL grammar, for example, as opposed to just using a set of SQL-callable
functions and some new system catalogs.


The SQL grammar isn't changed significantly; the patch just adds variants of
the CREATE/DROP/ALTER/COMMENT commands. Also, functions have no autocompletion
or built-in quick help: if you don't remember the exact kind/type of a
function's argument(s), you have to read the docs.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Joshua D. Drake
Teodor Sigaev wrote:
 the patch. I'm personally not sold on the need for modifications to the
 SQL grammar, for example, as opposed to just using a set of SQL-callable
 functions and some new system catalogs.
 
 The SQL grammar isn't changed significantly; the patch just adds variants
 of the CREATE/DROP/ALTER/COMMENT commands. Also, functions have no
 autocompletion or built-in quick help: if you don't remember the exact
 kind/type of a function's argument(s), you have to read the docs.

I didn't read the patch but I did skim the docs for this and if the docs
are current I see things like this:

CREATE FULLTEXT DICTIONARY en_ispell
( OPT = 'DictFile=ispell/english.dict,
 AffFile=ispell/english.aff,
 StopFile=english.stop'
) LIKE ispell_template;



ALTER FULLTEXT DICTIONARY en_stem SET OPT='english.stop';


Which to me is perfectly reasonable and intuitive. It is unfortunate
though that we still have the more odd grammar of actually using Tsearch
to query. Although I don't really have a better suggestion without
adding some ungodly obscure operator.


Sincerely,

Joshua D. Drake









-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Nikolay Samokhvalov

On 1/25/07, Teodor Sigaev [EMAIL PROTECTED] wrote:

It should be clear enough for now: dump the data from the old db and load it
into the new one. But the dump should be made without any
contrib/tsearch2-related functions.


Upgrading from 8.1.x to 8.2.x was not trivial because of a very small
change in the API (actually not really the API but the content of the pg_ts_*
tables): the Russian Snowball stemming function was forked into 2 different
ones, for the koi8 and utf8 encodings. So, as I had dumped my pg_ts_* tables'
data (to keep my tsearch2 settings), I saw errors during restoration
(btw, why didn't you keep the old Russian stemmer function name as a
synonym for the koi8 variant?) -- so I had to change my dump file
manually, because I hadn't managed to follow the tsearch2 best practice
(to use some kind of bootstrap script that creates the tsearch2
configuration you need from the default one, using several INSERTs and
UPDATEs). And there were no upgrade notes for tsearch2.

So, I consider the upgrade process for tsearch2 to have been a little bit
tricky so far. I assume it will be improved with 8.3...

--
Best regards,
Nikolay



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Teodor Sigaev

though that we still have the more odd grammar of actually using Tsearch
to query. Although I don't really have a better suggestion without
adding some ungodly obscure operator.


IMHO, the best possible solution is 'WHERE table.text_field @ text'.
The @ operator internally performs the equivalent of
'to_tsvector(table.text_field) @@ plainto_tsquery(text)'; it's also possible
to add GIN/GIST opclasses to speed up search queries. The performance of
headline generation will decrease insignificantly in this case, but ranking
time will be disastrous, because all the found texts have to be reparsed.
GIST performance may decrease too, since GIST indexing of tsvector is lossy.
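
The reparsing cost can be illustrated with a toy model. This is a rough sketch, not tsearch2 code: to_tsvector() and rank() below are simplified stand-ins written for this example (whitespace tokenizing, occurrence counting), and the stats counter exists only to show how often the parser runs when ranking works on stored vectors versus on raw text.

```python
stats = {"parses": 0}  # counts how many times a text had to be parsed


def to_tsvector(text):
    """Toy stand-in for parsing: lowercase words mapped to their positions."""
    stats["parses"] += 1
    vector = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        vector.setdefault(word, []).append(pos)
    return vector


def rank(vector, query_words):
    """Toy rank: total number of occurrences of the query words."""
    return sum(len(vector.get(word, [])) for word in query_words)


def rank_with_stored_vectors(stored, query_words):
    """Order document ids by rank using vectors computed once, at insert time."""
    return sorted(stored, key=lambda doc: rank(stored[doc], query_words), reverse=True)


def rank_on_the_fly(texts, query_words):
    """Order document ids by rank when only raw text is kept: every hit is reparsed."""
    return sorted(texts, key=lambda doc: rank(to_tsvector(texts[doc]), query_words), reverse=True)
```

With a stored tsvector column, ranking the found rows triggers no parsing at query time; with the raw-text form, the parser runs once per found row for every query, which is the cost described above.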


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-25 Thread Oleg Bartunov

On Thu, 25 Jan 2007, Nikolay Samokhvalov wrote:


On 1/25/07, Teodor Sigaev [EMAIL PROTECTED] wrote:
It should be clear enough for now: dump the data from the old db and load it
into the new one. But the dump should be made without any
contrib/tsearch2-related functions.


Upgrading from 8.1.x to 8.2.x was not trivial because of a very small
change in the API (actually not really the API but the content of the pg_ts_*
tables): the Russian Snowball stemming function was forked into 2 different
ones, for the koi8 and utf8 encodings. So, as I had dumped my pg_ts_* tables'
data (to keep my tsearch2 settings), I saw errors during restoration
(btw, why didn't you keep the old Russian stemmer function name as a
synonym for the koi8 variant?) -- so I had to change my dump file
manually, because I hadn't managed to follow the tsearch2 best practice


sed and grep did the trick.


(to use some kind of bootstrap script that creates the tsearch2
configuration you need from the default one, using several INSERTs and
UPDATEs). And there were no upgrade notes for tsearch2.


This is unfair: you promised to write upgrade notes, we discussed the
problem with the name change before the release, and I relied on you. It was
my fault, of course.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Teodor Sigaev
We (Oleg and I) are glad to present the tsearch-in-core patch for pgsql.
Basically, the layout, functions, methods, types, etc. are the same as in the
current tsearch2, with a lot of improvements:


 - pg_ts_* tables are now in pg_catalog
 - parsers, dictionaries and configurations now have an owner and a
   namespace, similar to other pgsql objects such as tables, operator
   classes, etc.
 - the current tsearch configuration is managed with the help of the GUC
   variable tsearch_conf_name
 - the tsearch cfg can be chosen by locale for each schema separately
 - the tsearch configuration is managed with SQL commands, not with
   insert/update/delete statements. This allows dependency tracking and
   correct dumping and dropping.
 - psql support via the \dF* commands
 - all available Snowball stemmers and corresponding configurations are added
 - memory is freed correctly by every dictionary

Work is sponsored by EnterpriseDB's PostgreSQL Development Fund.

patch: http://www.sigaev.ru/misc/tsearch_core-0.33.gz
docs: http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ (not yet completed and
not yet a patch, just the SGML source)


Implementation details:
- directory layout
  src/backend/utils/adt/tsearch - all I/O functions and simple operations
  src/backend/utils/tsearch - complex processing functions, including
  language processing and dictionaries
- most of the Snowball dictionaries are placed in a separate .so library
  and plug into the database in a way similar to the character conversion
  libraries.

If there are no objections, we plan to commit the patch tomorrow or the day
after. Before committing, I'll change the OIDs from 5000+ to lower values to
avoid holes in the OID range. After that, I'll remove the tsearch2 contrib
module.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Peter Eisentraut
Teodor Sigaev wrote:
 If there aren't objections then we plan commit patch tomorrow or
 after tomorrow.

I still haven't heard any argument for why this would be necessary or 
desirable at all, other than that it looks better for marketing 
reasons, which I will counter by saying that it looks worse for 
marketing reasons because our hailed plugin mechanism is apparently so 
poor that it can't support some practical extension module such as 
this.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Joshua D. Drake
Peter Eisentraut wrote:
 Teodor Sigaev wrote:
 If there aren't objections then we plan commit patch tomorrow or
 after tomorrow.
 
 I still haven't heard any argument for why this would be necessary or 
 desirable at all, other than that it looks better for marketing 
 reasons, which I will counter by saying that it looks worse for 
 marketing reasons because our hailed plugin mechanism is apparently so 
 poor that it can't support some practical extension module such as 
 this.

To which I will counter that we don't have a hailed plugin mechanism. We
have a contrib which professionals generally consider untested and not
part of PostgreSQL.

I am constantly running into this:

Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*

Where on the website can I see what plugins are included with PostgreSQL?

Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?

With TSearch2 in core will that fix the many upgrade problems associated
with using TSearch2?

Sincerely,

Joshua D. Drake


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Tom Lane
 Teodor Sigaev wrote:
 If there aren't objections then we plan commit patch tomorrow or
 after tomorrow.

This is a fairly large patch and I would like the chance to review it
before it goes in --- "we'll commit tomorrow" is not exactly a decent
review window.

Peter Eisentraut [EMAIL PROTECTED] writes:
 I still haven't heard any argument for why this would be necessary or 
 desirable at all, other than that it looks better for marketing 
 reasons,

One possible argument for this over the contrib version is a saner
approach to dumping and restoring configurations.  However, as against
that:

1) what's the upgrade path for getting an existing tsearch2
configuration into this implementation?

2) once we put this in core we are going to be stuck with supporting its
SQL API forever.  Are we convinced that this API is the one we want?
I don't recall even having seen any proposal or discussion.  It was OK
for tsearch2's API to change every release while it was in contrib, but
the expectation of stability is a whole lot higher for core features.

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Andrew Dunstan

Joshua D. Drake wrote:

Peter Eisentraut wrote:
  

Teodor Sigaev wrote:


If there aren't objections then we plan commit patch tomorrow or
after tomorrow.
  
I still haven't heard any argument for why this would be necessary or 
desirable at all, other than that it looks better for marketing 
reasons, which I will counter by saying that it looks worse for 
marketing reasons because our hailed plugin mechanism is apparently so 
poor that it can't support some practical extension module such as 
this.



Of which I will counter that we don't have a hailed plugin mechanism. We
have a contrib which professionals generally consider untested and not
part of PostgreSQL.

I am constantly running into this:

Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*

Where on the website can I see what plugins are included with PostgreSQL?

Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?

With TSearch2 in core will that fix the many upgrade problems associated
with using TSearch2?


  


"contrib" is a horrible misnomer. Can we maybe bite the bullet and call it 
something else?


cheers

andrew




Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Jeff Davis
On Wed, 2007-01-24 at 19:15 +0100, Peter Eisentraut wrote:
 Teodor Sigaev wrote:
  If there aren't objections then we plan commit patch tomorrow or
  after tomorrow.
 
 I still haven't heard any argument for why this would be necessary or 
 desirable at all, other than that it looks better for marketing 
 reasons, which I will counter by saying that it looks worse for 
 marketing reasons because our hailed plugin mechanism is apparently so 
 poor that it can't support some practical extension module such as 
 this.
 

On that point, why do we have /contrib? It's for plugins that are so
version-dependent that they can't exist as a separate project, as I
understand it.

But what we want when we say we have a plugin mechanism is something
more like CPAN, where software is developed on its own timeline and can
be added seamlessly into any version of PostgreSQL that supports the
needs of the project.

PostGIS is a good example of this. You don't have to wait for a
PostgreSQL release to upgrade PostGIS, and they don't have to discuss
the intricacies of spatial queries and data on -hackers.

If tsearch2 really does need to be in lockstep with the PostgreSQL
releases (although I don't see why it does), I don't see a problem
putting it in core. It's an important feature, and we're already giving
up a lot of the benefits of plugins anyway by distributing it with the
project.

Regards,
Jeff Davis




Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread David Fetter
On Wed, Jan 24, 2007 at 01:53:54PM -0500, Andrew Dunstan wrote:
 Joshua D. Drake wrote:
 Peter Eisentraut wrote:
   
 Teodor Sigaev wrote:
 
 If there aren't objections then we plan commit patch tomorrow or
 after tomorrow.
   
 I still haven't heard any argument for why this would be necessary or 
 desirable at all, other than that it looks better for marketing 
 reasons, which I will counter by saying that it looks worse for 
 marketing reasons because our hailed plugin mechanism is apparently so 
 poor that it can't support some practical extension module such as 
 this.
 
 Of which I will counter that we don't have a hailed plugin mechanism. We
 have a contrib which professionals generally consider untested and not
 part of PostgreSQL.
 
 I am constantly running into this:
 
 Q. Does PostgreSQL have full text indexing?
 A. Yes it is in contrib.
 Q. But that isn't part of core.
 A. *sigh*
 
 Where on the website can I see what plugins are included with
 PostgreSQL?
 
 Where on the website can I see the Official PostgreSQL
 Documentation for Full Text Indexing?
 
 With TSearch2 in core will that fix the many upgrade problems
 associated with using TSearch2?
 
 contrib is a horrible misnomer. Can we maybe bite the bullet and
 call it something else?

Some version of version-dependent plugins?

Cheers,
D (who hasn't come up with anything shorter just yet)
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778    AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Jeremy Drake
On Wed, 24 Jan 2007, Peter Eisentraut wrote:

 Teodor Sigaev wrote:
  If there aren't objections then we plan commit patch tomorrow or
  after tomorrow.

 I still haven't heard any argument for why this would be necessary or
 desirable at all, other than that it looks better for marketing
 reasons, which I will counter by saying that it looks worse for
 marketing reasons because our hailed plugin mechanism is apparently so
 poor that it can't support some practical extension module such as
 this.

I for one am greatly looking forward to tsearch2 being in core.  I was
very fond of the plugin mechanism, until I signed up with a hosting
provider.  I do not have superuser privileges on the database cluster, and
they will not install any plugins due to unspecified security concerns.
So ATM if I want full text indexing, my only choice would be to avail
myself of their mysql instance which has it built in.  So I have been
jaded, and my opinion of optional plugins has gone from "wow, this is
neat" to "man, this is a pain".  They do not install plpgsql so I cannot
write any triggers, they don't install tsearch2 so I don't get full text
indexing, so all of the great features of postgres I have come to enjoy on
my own box are suddenly taken away :(

Sorry for the rant, I am just looking forward to 8.3 so I could get full
text indexing...

-- 
ARCHDUKE FERDINAND FOUND ALIVE --
FIRST WORLD WAR A MISTAKE



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Joshua D. Drake
Jeremy Drake wrote:
 On Wed, 24 Jan 2007, Peter Eisentraut wrote:
 
 Teodor Sigaev wrote:
 If there aren't objections then we plan commit patch tomorrow or
 after tomorrow.
 I still haven't heard any argument for why this would be necessary or
 desirable at all, other than that it looks better for marketing
 reasons, which I will counter by saying that it looks worse for
 marketing reasons because our hailed plugin mechanism is apparently so
 poor that it can't support some practical extension module such as
 this.
 
 I for one am greatly looking forward to tsearch2 being in core.  I was
 very fond of the plugin mechanism, until I signed up with a hosting
 provider.  I do not have superuser privileges on the database cluster, and
 they will not install any plugins due to unspecified security concerns.

You could move to Hub or Command Prompt ;)

Joshua D. Drake

 So ATM if I want full text indexing, my only choice would be to avail
 myself of their mysql instance which has it built in.  So I have been
 jaded, and my opinion of optional plugins has gone from wow, this is
 neat to man, this is a pain.  They do not install plpgsql so I cannot
 write any triggers, they don't install tsearch2 so I don't get full text
 indexing, so all of the great features of postgres I have come to enjoy on
 my own box are suddenly taken away :(
 
 Sorry for the rant, I am just looking forward to 8.3 so I could get full
 text indexing...
 






Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Neil Conway
On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote:
 2) once we put this in core we are going to be stuck with supporting its
 SQL API forever.  Are we convinced that this API is the one we want?
 I don't recall even having seen any proposal or discussion.

There has been some prior discussion:

http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php

But I agree that we need considerably more discussion before committing
the patch. I'm personally not sold on the need for modifications to the
SQL grammar, for example, as opposed to just using a set of SQL-callable
functions and some new system catalogs.

Another question that would be easier to resolve before the patch is
committed is naming: the patch currently uses a mix of "full text" and
"tsearch[2]" as the name of the full-text search feature. If we're going
to bless this as the integrated full-text search in PG, it might make
more sense to use "full text search" and "FTS" exclusively.

-Neil





Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Andrew Dunstan

Jeremy Drake wrote:

On Wed, 24 Jan 2007, Peter Eisentraut wrote:
  

I still haven't heard any argument for why this would be necessary or
desirable at all, other than that it looks better for marketing
reasons, which I will counter by saying that it looks worse for
marketing reasons because our hailed plugin mechanism is apparently so
poor that it can't support some practical extension module such as
this.



I for one am greatly looking forward to tsearch2 being in core.  

  


For goodness' sake! This is work that's been sponsored! Are we going to 
turn around now and reject it? We'd be a laughing stock.


cheers

andrew



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Peter Eisentraut
Andrew Dunstan wrote:
 contrib is a horrible misnomer. Can we maybe bite the bullet and call
 it something else?

plugins?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Peter Eisentraut
Jeff Davis wrote:
 On that point, why do we have /contrib? It's for plugins that are
 so version-dependent that they can't exist as a separate project, as
 I understand it.

No.  (I don't know a better and succinct answer, but that is not it.)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Peter Eisentraut
Jeremy Drake wrote:
 I for one am greatly looking forward to tsearch2 being in core.  I
 was very fond of the plugin mechanism, until I signed up with a
 hosting provider.

Yes, you have told us about your hosting provider before.  Just make 
sure your next hosting provider does not refuse to install database 
objects whose OID is a multiple of 13 because of bad luck, or you might 
miss out on full-text indexing again.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Joshua D. Drake
Peter Eisentraut wrote:
 Jeremy Drake wrote:
 I for one am greatly looking forward to tsearch2 being in core.  I
 was very fond of the plugin mechanism, until I signed up with a
 hosting provider.
 
 Yes, you have told us about your hosting provider before.  Just make 
 sure your next hosting provider does not refuse to install database 
 objects whose OID is a multiple of 13 because of bad luck, or you might 
 miss out on full-text indexing again.

Well we just turn off OIDs to help prevent that possible curse.

Sincerely,

Joshua D. Drake

 






Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Peter Eisentraut
Neil Conway wrote:
 But I agree that we need considerably more discussion before
 committing the patch. I'm personally not sold on the need for
 modifications to the SQL grammar, for example, as opposed to just
 using a set of SQL-callable functions and some new system catalogs.

In particular, I would think that unless one is affiliated with The New 
COBOL World Order, one would *prefer* a set of functions over new SQL 
statements.  And using functions to manage extensions seems to be the 
established way in Oracle land, if that matters at all.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Stefan Kaltenbrunner
Neil Conway wrote:
 On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote:
 2) once we put this in core we are going to be stuck with supporting its
 SQL API forever.  Are we convinced that this API is the one we want?
 I don't recall even having seen any proposal or discussion.
 
 There has been some prior discussion:
 
 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php
 
 But I agree that we need considerably more discussion before committing
 the patch. I'm personally not sold on the need for modifications to the
 SQL grammar, for example, as opposed to just using a set of SQL-callable
 functions and some new system catalogs.

I think one can find arguments for both variants. One question might even
be how other databases do this, and whether the proposed syntax resembles
any of them.


 
 Another question that would be easier to resolve before the patch is
 committed is naming: the patch currently uses a mix of full text and
 tsearch[2] as the name of the full-text search feature. If we're going
 to bless this as the integrated full-text search in PG, it might make
 more sense to use full text search and FTS exclusively.

Making this consistent makes a lot of sense, and I agree that it might be
a good idea to just call it FTS (or similar).
On the other hand, we would then have to go as far as renaming
TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similar, which might cause
considerable headaches for people upgrading from the contrib/ version.


Stefan



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Andrew Dunstan

Peter Eisentraut wrote:

Andrew Dunstan wrote:
  

contrib is a horrible misnomer. Can we maybe bite the bullet and call
it something else?



plugins?

  


standard-plugins might be more informative. I think of them as being 
like perl's standard modules, things that are part of the standard perl 
distribution as opposed to all the other stuff on CPAN.


Maybe it needs to split into two - things that are genuine plugins and 
other stuff (e.g. start-scripts).



cheers

andrew



Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-24 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Neil Conway wrote:
 Another question that would be easier to resolve before the patch is
 committed is naming: the patch currently uses a mix of full text and
 tsearch[2] as the name of the full-text search feature. If we're going
 to bless this as the integrated full-text search in PG, it might make
 more sense to use full text search and FTS exclusively.

 making this consistent makes a lot of sense and I agree that it might be
 a good idea to just call it FTS (or similiar).
 But on the other side would have to go as far as renaming
 TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similiar which might pose some
 considerable headache for people upgrading from the contrib/ version.

If we use "text search" (abbrev TS) as the key phrase we can avoid that.

But this reiterates my point that the upgrade path for existing tsearch2
users is an important thing to consider.

regards, tom lane


