Re: [Fwd: Re: [HACKERS] tsearch in core patch]
> Ishii-san,
>
> >>> Ok, probably we need to copy the English stemming rule to the one for Japanese.
> >> Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work well. And surely the specific rules in the Snowball project's English stemmer wouldn't work.
> >
> > Your understanding is correct. English stemmer would not work for Japanese "non English" part.
>
> That reminds me, don't you guys have your own full text search for Japanese? Planning on merging it with the core code anytime soon?

No. Actually, Japanese (the non-English part) does not need stemming at all. However, since Japanese is an agglutinative language, we have to break a continuous Japanese string into space-separated "words". For example, we need to break:

todayisfine

into:

today is fine

(Of course the English here is only for the benefit of non-Japanese speakers; the real text is Japanese.) For this we need a good dictionary and software. Fortunately, there are several open source programs for this purpose. I once wrote a PostgreSQL C function that invokes one of these programs to do the work, and it works great with tsearch2.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
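[Editorial sketch] The word-breaking step described in the message above can be illustrated with a toy greedy longest-match segmenter. This is only a sketch under stated assumptions: the function name `segment` and the three-word dictionary are hypothetical, and the open source tools mentioned in the message are not named there and are likely far more sophisticated than this.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (illustrative only).

    At each position the longest dictionary word is taken; if nothing
    matches, a single character is emitted so the loop always advances.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary mirroring the English stand-in from the message.
TOY_DICTIONARY = {"today", "is", "fine"}

print(" ".join(segment("todayisfine", TOY_DICTIONARY)))
```

The output of such a function is what a space-oriented parser like the one in tsearch2 could then consume; the quality of the result depends entirely on the dictionary.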
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
Ishii-san,

>>> Ok, probably we need to copy the English stemming rule to the one for Japanese.
>> Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work well. And surely the specific rules in the Snowball project's English stemmer wouldn't work.
>
> Your understanding is correct. English stemmer would not work for Japanese "non English" part.

That reminds me, don't you guys have your own full text search for Japanese? Planning on merging it with the core code anytime soon?

--Josh
Re: [HACKERS] tsearch in core patch
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> But why do you need them to be different at all? Just make it
>> russian Russian_Russia
>> russian ru_RU
>>
>> Does that not work for some reason?

> I'd like to have unique names of configuration. So, if user sets GUC variable or call function with configuration's name then postgres should not have a choice --- it should use pointed configuration exactly.

Sure, but the configuration name in this example is "russian", and it's unique, no?

regards, tom lane
Re: [HACKERS] tsearch in core patch
> But why do you need them to be different at all? Just make it
> russian Russian_Russia
> russian ru_RU
>
> Does that not work for some reason?

I'd like configuration names to be unique. So, if a user sets the GUC variable or calls a function with a configuration's name, postgres should not have a choice --- it should use exactly the configuration that was named.
--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
On 6/25/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Mike Rylander" <[EMAIL PROTECTED]> writes:
> > I can certainly understand the benefit of making the default configuration a simple locale to language map, but there are definitely uses for searching using different stemmers/stop-lists even within the same corpus/index. So, as a datapoint for the discussion, I would ask that the option of multiple languages per DB locale not be removed if it can be at all avoided.
>
> Nobody is proposing that --- the issue here is just how we set up the "default" configuration.

Then I misunderstood. Sorry for the noise, folks.

--
Mike Rylander
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
"Mike Rylander" <[EMAIL PROTECTED]> writes:
> I can certainly understand the benefit of making the default configuration a simple locale to language map, but there are definitely uses for searching using different stemmers/stop-lists even within the same corpus/index. So, as a datapoint for the discussion, I would ask that the option of multiple languages per DB locale not be removed if it can be at all avoided.

Nobody is proposing that --- the issue here is just how we set up the "default" configuration.

regards, tom lane
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
On 6/25/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Well, it's not hard at all to find chunks of English text that have embedded bits of French, Spanish, or what-have-you, but that's not an argument for trying to intermix the stemmers. I doubt that such simple bits of program could tell the language difference well enough to determine which stemming rules to apply.

While I imagine that is probably true of many projects, if not most, my project in particular would greatly benefit from the ability to mix stemmers. I work with complex bibliographic data, which has language information embedded within records. This is not limited to the record level either: individual fields within each bibliographic record can be in different languages. Especially in countries where multi-lingual software is a requirement for use in public institutions (such as Canada, with en_CA/fr_CA), the ability to choose a stemmer and stop-word list at will for any particular record will provide exactly the behavior needed. The obvious generalization from Canada would be to support any mix of languages supported by tsearch2.

I can certainly understand the benefit of making the default configuration a simple locale to language map, but there are definitely uses for searching using different stemmers/stop-lists even within the same corpus/index. So, as a datapoint for the discussion, I would ask that the option of multiple languages per DB locale not be removed if it can be at all avoided.

Thanks for listening (and for all the great work on getting tsearch into core! :) ...

--
Mike Rylander
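[Editorial sketch] The per-field use case in the message above, where each field of a record carries its own language tag, might look roughly like the following. All names here (`CONFIG_BY_LANGUAGE`, `config_for`, the sample record, and the configuration labels) are hypothetical illustrations, not part of tsearch2 or of the patch under discussion.

```python
# Hypothetical map from a field's language tag to a search configuration name.
CONFIG_BY_LANGUAGE = {"en": "english", "fr": "french"}


def config_for(language_tag, default="simple"):
    """Pick a configuration for one field; fall back to a default label."""
    return CONFIG_BY_LANGUAGE.get(language_tag, default)


# One bibliographic record whose fields are in different languages.
record = [
    ("title", "fr", "Les bases de données"),
    ("abstract", "en", "An introduction to databases"),
    ("note", "de", "Einführung"),
]

for field, language_tag, text in record:
    print(field, "->", config_for(language_tag))
```

The point of the sketch is only that the choice is made per field from metadata already in the record, so no automatic language detection is needed.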
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > Ok, probably we need to copy the English stemming rule to the one for Japanese.
>
> Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work well. And surely the specific rules in the Snowball project's English stemmer wouldn't work.

Your understanding is correct. An English stemmer would not work for the "non-English" part of Japanese text. What I meant was the "chunks of English text" embedded in Japanese.

> > I think same thing (commonly used English with local language) can be applied to Chinese and Korean.
>
> Well, it's not hard at all to find chunks of English text that have embedded bits of French, Spanish, or what-have-you, but that's not an argument for trying to intermix the stemmers. I doubt that such simple bits of program could tell the language difference well enough to determine which stemming rules to apply.

For Japanese, it will be fairly simple: words in the 7-bit ASCII range must be English. (Note that the most commonly used Japanese encodings, such as EUC, cannot be mixed with ISO 8859.)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
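[Editorial sketch] The rule proposed in the message above (tokens entirely in the 7-bit ASCII range are treated as English) can be written down in a few lines. The function name `route_token` and the labels "english" and "other" are assumptions made for illustration; nothing here is taken from the patch.

```python
def route_token(token):
    """Classify a token by the 7-bit ASCII rule from the message.

    Tokens made only of ASCII letters go to the English stemmer;
    everything else is left to the local-language handling.
    """
    if token.isascii() and token.isalpha():
        return "english"
    return "other"


for token in ["PostgreSQL", "データベース", "café"]:
    print(token, "->", route_token(token))
```

Note that "café" is routed to "other": the rule works for Japanese text precisely because the script is disjoint from ASCII, which is not the case for the English/French mix raised earlier in the thread.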
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Ok, probably we need to copy the English stemming rule to the one for Japanese.

Pardon my ignorance here, but is the concept of stemming even relevant to Japanese/Chinese/Korean? What little I know about ideographic languages suggests it wouldn't work well. And surely the specific rules in the Snowball project's English stemmer wouldn't work.

> I think same thing (commonly used English with local language) can be applied to Chinese and Korean.

Well, it's not hard at all to find chunks of English text that have embedded bits of French, Spanish, or what-have-you, but that's not an argument for trying to intermix the stemmers. I doubt that such simple bits of program could tell the language difference well enough to determine which stemming rules to apply.

regards, tom lane
Re: [HACKERS] tsearch in core patch
> I would be surprised if C locale defaulted to anything except English.

Don't be surprised. The collation mechanism is too simple for Japanese Kanji, and locale is not useful for Japanese anyway. That's why Japanese installations of PostgreSQL tend to use the C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> I suppose it would be sensible to add a switch to allow people to select a different language. In any case, the only thing initdb would be doing would be setting up an initial value of a table entry or GUC variable, so you could always change it yourself later; it may not be worth sweating too much about this.
>
> regards, tom lane
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
> Tatsuo Ishii wrote:
> > > japanese '{ja_JP, C}'
> >
> > How would we know C -> japanese?
>
> You can't do that. You can't have different languages (not locales) mapping to the same 'tsearch language' because the stemmer doesn't know that a specific word is in english or japanese. So you have two options: (a) disable stemming (b) leave the language set to 'japanese' and see if it plays well.

Ok, probably we need to copy the English stemming rule to the one for Japanese. I think the same thing (commonly used English mixed with the local language) applies to Chinese and Korean.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [HACKERS] tsearch in core patch
On Sat, 23 Jun 2007, Euler Taveira de Oliveira wrote:

> Will it be possible to disable stemming or stopwords removal? I'm asking this 'cause sometimes stemming doesn't lead to good results and/or stopwords are relevant. Maybe it could be an GUC variables ('enable_stemming' and 'enable_stopwords').

Just use another configuration.

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
Tatsuo Ishii wrote:
> > japanese '{ja_JP, C}'
>
> How would we know C -> japanese?

You can't do that. You can't have different languages (not locales) mapping to the same 'tsearch language' because the stemmer doesn't know whether a specific word is in English or Japanese. So you have two options: (a) disable stemming, or (b) leave the language set to 'japanese' and see if it plays well.

--
Euler Taveira de Oliveira
http://www.timbira.com/
Re: [HACKERS] tsearch in core patch
Alvaro Herrera wrote:
> What I was really suggesting was having a table mapping locale names into "tsearch languages". Then the configuration could be made based on the language, not on the locale name. So the stopword list is for "russian", regardless of whether the locale is Russian_Russia or ru_RU.

Agreed. But I'm afraid we couldn't map all of the locale names correctly. Man, it's a large list. ;)

> Is this only for the stopword list, or does it also affect selecting a stemmer?

Both.

> Note: it's possible that the stopword list is different for brazilian portuguese than portuguese portuguese, which is why I was suggesting using a language "portuguese_brazil" and not just "portuguese". Whereas you need a single stopword list for all the countries speaking spanish, which is why you need only one language called spanish.

Indeed it's possible for Portuguese, because we have some words that are written in different ways, e.g.,

pt_BR    pt_PT    english
Mônica   Mónica   Monica
ação     acção    action
Irã      Irão     Iran
. . .

Will it be possible to disable stemming or stopword removal? I'm asking because sometimes stemming doesn't lead to good results and/or stopwords are relevant. Maybe it could be GUC variables ('enable_stemming' and 'enable_stopwords').

--
Euler Taveira de Oliveira
http://www.timbira.com/
Re: [HACKERS] tsearch in core patch
[EMAIL PROTECTED] wrote:
> > Why not do it the other way around?
> > es_ES spanish
> > Spanish_Spain spanish
> > ru_RU russian
> > pt_BR portuguese_brazil
> >
> > That way you don't need any funny index. Or do you need the list of locales for each language? (but even if you do, you can easily obtain it by indexing both columns separately using btrees anyway)
>
> Yes, that's possible but that increases number of identical configuration:
> russian_win Russian_Russia
> russian_unix ru_RU
>
> They doesn't differ except locale name.

But why do you need them to be different at all? Just make it

russian Russian_Russia
russian ru_RU

Does that not work for some reason?

What I was really suggesting was having a table mapping locale names into "tsearch languages". Then the configuration could be made based on the language, not on the locale name. So the stopword list is for "russian", regardless of whether the locale is Russian_Russia or ru_RU.

Is this only for the stopword list, or does it also affect selecting a stemmer?

Note: it's possible that the stopword list is different for Brazilian Portuguese than for Portuguese Portuguese, which is why I was suggesting using a language "portuguese_brazil" and not just "portuguese". Whereas you need a single stopword list for all the countries speaking Spanish, which is why you need only one language called spanish.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"There will come a time when diligent and prolonged research will bring to light things that are hidden today" (Seneca, 1st century)
[Fwd: Re: [HACKERS] tsearch in core patch]
>> How would this work for initdb with locale C?
>
> I'm worrying about that too.

english '{en_GB, en_US, C}'

I suppose that a locale name always has a dot separator, except for the C locale --- which is a well-known exception.
Re: [HACKERS] tsearch in core patch
> Why not do it the other way around?
> es_ES spanish
> Spanish_Spain spanish
> ru_RU russian
> pt_BR portuguese_brazil
>
> That way you don't need any funny index. Or do you need the list of locales for each language? (but even if you do, you can easily obtain it by indexing both columns separately using btrees anyway)

Yes, that's possible, but it increases the number of identical configurations:

russian_win Russian_Russia
russian_unix ru_RU

They don't differ except in the locale name.
Re: [Fwd: Re: [HACKERS] tsearch in core patch]
> >> How would this work for initdb with locale C?
> >
> > I'm worrying about that too.
>
> english '{en_GB, en_US, C}'
>
> I suppose, that locale name always has a dot separator exept C locale --- which is well known exception

So would we have to do this?

japanese '{ja_JP, C}'

How would we know C -> japanese?

Also, I'm wondering how we could handle texts that include both Japanese and English. That's very common in Japan.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [HACKERS] tsearch in core patch
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
>>> Is the point here for initdb to be able to establish a sane default initially? Seems to me it can guess the language from the first component of the locale (ru_RU -> russian).
>>
>> How would this work for initdb with locale C?

> I'm worrying about that too.

I would be surprised if C locale defaulted to anything except English. I suppose it would be sensible to add a switch to allow people to select a different language. In any case, the only thing initdb would be doing would be setting up an initial value of a table entry or GUC variable, so you could always change it yourself later; it may not be worth sweating too much about this.

regards, tom lane
Re: [HACKERS] tsearch in core patch
> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
>
> > Is the point here for initdb to be able to establish a sane default initially? Seems to me it can guess the language from the first component of the locale (ru_RU -> russian).
>
> How would this work for initdb with locale C?

I'm worrying about that too.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [HACKERS] tsearch in core patch
[EMAIL PROTECTED] wrote:
> So, final propose:
> rename cfglocale to cfglanguages and store in it array of language names which is produced from first part of locale names:
> russian '{ru_RU, Russian_Russia}'
> spanish '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'
>
> Comments?

Why not do it the other way around?

es_ES spanish
Spanish_Spain spanish
ru_RU russian
pt_BR portuguese_brazil

That way you don't need any funny index. Or do you need the list of locales for each language? (But even if you do, you can easily obtain it by indexing both columns separately using btrees anyway.)

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"I can see support will not be a problem. 10 out of 10." (Simon Wittber)
(http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
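[Editorial sketch] The "other way around" layout suggested above, one row per locale name pointing at a language, can be modelled with a plain dictionary, and the per-language list of locales from the original proposal can be derived from it when needed. The names `LOCALE_TO_LANGUAGE` and `locales_by_language` are hypothetical; the rows are the ones quoted in the message.

```python
from collections import defaultdict

# One row per locale name, as in the message; the variable name is illustrative.
LOCALE_TO_LANGUAGE = {
    "es_ES": "spanish",
    "Spanish_Spain": "spanish",
    "ru_RU": "russian",
    "pt_BR": "portuguese_brazil",
}


def locales_by_language(mapping):
    """Derive the language -> [locale names] view from the flat table."""
    grouped = defaultdict(list)
    for locale_name, language in mapping.items():
        grouped[language].append(locale_name)
    return dict(grouped)


print(LOCALE_TO_LANGUAGE["Spanish_Spain"])
print(locales_by_language(LOCALE_TO_LANGUAGE)["spanish"])
```

Looking up a language by locale is a single key lookup in this layout, which is roughly the argument for not needing a special index on an array column.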
Re: [HACKERS] tsearch in core patch
Michael Glaesemann wrote:
>
> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
>
> > Is the point here for initdb to be able to establish a sane default initially? Seems to me it can guess the language from the first component of the locale (ru_RU -> russian).
>
> How would this work for initdb with locale C?

Yea, that's a problem. I am thinking we should just avoid the entire issue and require it to be set by the user, and throw an error if the configuration is not set.

--
Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] tsearch in core patch
>> That may have been true until we started supporting Windows... Swedish_Sweden.1252 is what I get on my machine, for example. Principle is the same, but values certainly aren't.
>
> Well, at least the name is not itself translated, so a mapping table is not right out of the question. If they had put a name like "Español_Chile" instead of "Spanish_Chile" we would be in serious trouble.

I don't think so; in the opposite case you couldn't type or display it to change the locale :).

So, final proposal: rename cfglocale to cfglanguages and store in it an array of language names, produced from the first part of locale names:

russian '{ru_RU, Russian_Russia}'
spanish '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments?

Are there any obstacles to using GIN indexes in pg_catalog?
Re: [HACKERS] tsearch in core patch
On Jun 22, 2007, at 9:28 , Tom Lane wrote:

> Is the point here for initdb to be able to establish a sane default initially? Seems to me it can guess the language from the first component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

Michael Glaesemann
grzm seespotcode net
Re: [HACKERS] tsearch in core patch
Magnus Hagander wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> >> I very much doubt that the different spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of portuguese I'm not so sure. Maybe there are other examples (like chinese, but I'm not sure how useful is tsearch for chinese).
> >
> >> And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.
> >
> > Also, the problem we're dealing with here is mainly lack of standardization of the encoding part of locale names. AFAIK, just about everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes after that (if any) that is not too consistent across platforms.
>
> That may have been true until we started supporting Windows... Swedish_Sweden.1252 is what I get on my machine, for example. Principle is the same, but values certainly aren't.

Well, at least the name is not itself translated, so a mapping table is not right out of the question. If they had put a name like "Español_Chile" instead of "Spanish_Chile" we would be in serious trouble.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] tsearch in core patch
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> I very much doubt that the different spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of portuguese I'm not so sure. Maybe there are other examples (like chinese, but I'm not sure how useful is tsearch for chinese).
>
>> And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.
>
> Also, the problem we're dealing with here is mainly lack of standardization of the encoding part of locale names. AFAIK, just about everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes after that (if any) that is not too consistent across platforms.

That may have been true until we started supporting Windows... Swedish_Sweden.1252 is what I get on my machine, for example. Principle is the same, but values certainly aren't.

//Magnus
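[Editorial sketch] The observation above, that Unix and Windows locale names share a language_territory.encoding shape but not the values, can be made concrete with a small splitter. The function name `split_locale` is an assumption for illustration, and the sketch deliberately ignores "@modifier" suffixes.

```python
def split_locale(name):
    """Split a locale name into (language, territory, encoding).

    Missing parts are returned as None. Only the shape is parsed; the
    values themselves are platform-specific and are not normalized.
    """
    base, _, encoding = name.partition(".")
    language, _, territory = base.partition("_")
    return language, territory or None, encoding or None


for name in ["ru_RU.UTF-8", "ru_RU.UTF8", "Swedish_Sweden.1252", "C"]:
    print(name, "->", split_locale(name))
```

Both spellings of the Russian locale yield the same first two components, while the Windows name yields "Swedish" rather than "sv", which is why the thread goes on to discuss a mapping table instead of pure string parsing.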
Re: [HACKERS] tsearch in core patch
On Fri, 22 Jun 2007, Bruce Momjian wrote:

> Tom Lane wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> I very much doubt that the different spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of portuguese I'm not so sure. Maybe there are other examples (like chinese, but I'm not sure how useful is tsearch for chinese).
>>
>>> And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.
>>
>> Also, the problem we're dealing with here is mainly lack of standardization of the encoding part of locale names. AFAIK, just about everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes after that (if any) that is not too consistent across platforms.
>>
>> So I see no problem in distinguishing between pt_PT and pt_BR if it turns out we have to. The trick is to not look at any more of the locale name than that; and if we standardize on "stopword files are UTF8" then I don't think we need to.
>
> OK, and the open question is when do we do this default setting. If we do it in initdb then we can isolate all the detection there.

We can do that at initdb time, but we still have to decide how to map the human-readable language name to the language part of the locale name. Are we going to hardcode it? That's not friendly for hosting solutions, where people often have no access to postgresql.conf, so they would need to remember to set tsearch_conf_name. It could be solved using the 'alter user ... set tsearch_conf_name' command, though.

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] tsearch in core patch
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > I very much doubt that the different spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of portuguese I'm not so sure. Maybe there are other examples (like chinese, but I'm not sure how useful is tsearch for chinese).
>
> > And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.
>
> Also, the problem we're dealing with here is mainly lack of standardization of the encoding part of locale names. AFAIK, just about everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes after that (if any) that is not too consistent across platforms.
>
> So I see no problem in distinguishing between pt_PT and pt_BR if it turns out we have to. The trick is to not look at any more of the locale name than that; and if we standardize on "stopword files are UTF8" then I don't think we need to.

OK, and the open question is when do we do this default setting. If we do it in initdb then we can isolate all the detection there.

--
Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] tsearch in core patch
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I very much doubt that the different spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of portuguese I'm not so sure. Maybe there are other examples (like chinese, but I'm not sure how useful is tsearch for chinese).

> And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of standardization of the encoding part of locale names. AFAIK, just about everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes after that (if any) that is not too consistent across platforms.

So I see no problem in distinguishing between pt_PT and pt_BR if it turns out we have to. The trick is to not look at any more of the locale name than that; and if we standardize on "stopword files are UTF8" then I don't think we need to.

regards, tom lane
Re: [HACKERS] tsearch in core patch
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I don't think we are going to do language selection automatically --- the user is going to have to set tsearch_conf_name.

> Are you suggest to remove long-lived feature of tsearch? In that case we don't need cfglocale (or cfglanguage as Tom suggested) and cfgdefault columns in pg_ts_cfg at all. Just set up tsearch_conf_name.

Is the point here for initdb to be able to establish a sane default initially? Seems to me it can guess the language from the first component of the locale (ru_RU -> russian).

regards, tom lane
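[Editorial sketch] The guess described above, taking the first component of the locale name and mapping it to a language (ru_RU -> russian), could look like this. The table `LANGUAGE_BY_CODE` and the function `guess_language` are hypothetical and do not reflect how initdb is actually implemented; the sketch returns None when it has no guess, which is the case the thread raises for the C locale.

```python
# Hypothetical table from the first locale component to a language name.
LANGUAGE_BY_CODE = {"ru": "russian", "es": "spanish", "pt": "portuguese"}


def guess_language(locale_name):
    """Guess a language from the first component of a locale name.

    Returns None when the component is not in the table (for example
    for the C locale), leaving the decision to the caller.
    """
    without_encoding = locale_name.split(".", 1)[0]
    code = without_encoding.split("_", 1)[0].lower()
    return LANGUAGE_BY_CODE.get(code)


for name in ["ru_RU", "ru_RU.UTF-8", "es_CL", "C"]:
    print(name, "->", guess_language(name))
```

Because the result only seeds an initial setting, a None here simply means the user would have to choose the configuration explicitly.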
Re: [HACKERS] tsearch in core patch
Teodor Sigaev wrote:
> >> --- how do many languages use ISO8859-1 locale?.
> > ISO8859-1 is encoding, not locale.
>
> I meant, if we'll use encoding name (for example PG_LATIN1) we couldn't distinguish languages which use that encoding (for example italian and finnish and some more), but using locale names it's possible: it_IT.ISO8859-1, fi_FI.ISO8859-1

I don't understand. Why use "it_IT.ISO8859-1"? You just need to know the language, so "it" is enough. The _IT part specifies that it's the Italian spoken in Italy. This may be irrelevant in most cases, but consider that pt_PT and pt_BR are AFAIK somewhat different languages.

I very much doubt that the different Spanishes are any different in the stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc; but in the case of Portuguese I'm not so sure. Maybe there are other examples (like Chinese, but I'm not sure how useful tsearch is for Chinese).

And the .ISO8859-1 part you don't need at all if you accept that the files are UTF8 by design, as Tom proposed.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"No one is more enslaved than he who believes himself free without being so" (Goethe)
Re: [HACKERS] tsearch in core patch
> I don't think we are going to do language selection automatically --- the user is going to have to set tsearch_conf_name.

Are you suggesting that we remove a long-lived feature of tsearch? In that case we don't need the cfglocale (or cfglanguage, as Tom suggested) and cfgdefault columns in pg_ts_cfg at all. Just set up tsearch_conf_name.
--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch
Teodor Sigaev wrote:
> > The recommendation I was making was to use the language name, not the encoding name, in the user-visible configuration.
> How does it determine language of db automatically?

I don't think we are going to do language selection automatically --- the user is going to have to set tsearch_conf_name.

--
Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] tsearch in core patch
> The recommendation I was making was to use the language name, not the encoding name, in the user-visible configuration.

How does it determine the language of the db automatically?
--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch
>> 3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
>> done
> Why not rename ALTER FULLTEXT CONFIGURATION --> ALTER TEXT SEARCH
> CONFIGURATION here too ?

It's renamed too.

> most languages can be written using UNICODE charset and UTF-8 encoding,
> so neither charset nor encoding can be used to determine language.

Yes.

>> --- how do many languages use ISO8859-1 locale?.
> ISO8859-1 is encoding, not locale.

I meant that if we use the encoding name (for example PG_LATIN1), we can't distinguish the languages that use that encoding (for example Italian, Finnish and some more), but with locale names it's possible: it_IT.ISO8859-1, fi_FI.ISO8859-1

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch
Hannu Krosing <[EMAIL PROTECTED]> writes:
> On a fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:
>> 6) use encoding names instead of locale's names in configuration
>> Ugh. I missed that knowledge of encoding doesn't allow to determine exact
>> language
> most languages can be written using UNICODE charset and UTF-8 encoding,
> so neither charset nor encoding can be used to determine language.

The recommendation I was making was to use the language name, not the encoding name, in the user-visible configuration.

regards, tom lane
Re: [HACKERS] tsearch in core patch
On a fine day, Thu, 2007-06-21 at 21:44, Teodor Sigaev wrote:
> http://www.sigaev.ru/misc/tsearch_core-0.52.gz
>
> Plan was:
>
> 1) rename FULLTEXT to TEXT SEARCH in SQL command
> done
>
> 2) rework Snowball stemmer's as Tom suggested
> done
>
> 3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
> done

Why not rename ALTER FULLTEXT CONFIGURATION --> ALTER TEXT SEARCH CONFIGURATION here too?

> 4) remove support of default configuration per scheme. Default configuration
> will be only one per locale.
> done
>
> 5) single encoded files. That will touch snowball, ispell, synonym, thesaurus
> and simple dictionaries
> done
>
> 6) use encoding names instead of locale's names in configuration
> Ugh. I missed that knowledge of encoding doesn't allow to determine exact
> language

Most languages can be written using the Unicode charset and UTF-8 encoding, so neither charset nor encoding can be used to determine the language.

> --- how do many languages use ISO8859-1 locale?.

ISO8859-1 is an encoding, not a locale.

> So, it's not done. Tom
> pointed that locale's name isn't portable, but there isn't a lot of names of
> the same locale (ru_RU.UTF-8, ru_RU.UTF8 for example). So it's possible to
> use array of locales instead of one name.
>
> I didn't see comments about security hole pointed by Tom, so I repeat:
>
> About security holes in PARSER/DICTIONARY. I see following ways to resolve it
> now:
> 1) Allow to superuser only to do CREATE/ALTER/DROP PARSER/DICTIONARY
>    Disadvantage: hosting users will not be able to change dictionaries
> 2) Remove CREATE/ALTER/DROP PARSER, split pg_ts_dict to pg_ts_dict_template
>    and pg_ts_dict and accordingly change CREATE/ALTER/DROP DICTIONARY
>    Disadvantage: parser and dictionary's template will not dump/restore,
>    it should be restored manually (just a INSERT into
>    pg_ts_parser/pg_ts_dict_template)
> 3) Similar to previous point, but:
>    * CREATE/ALTER/DROP PARSER - super-user only
>    * CREATE/ALTER/DROP DICTIONARY TEMPLATE - super-user only
>    * CREATE/ALTER/DROP DICTIONARY - allowed to non-superuser
>    Disadvantage: new command CREATE/ALTER/DROP DICTIONARY TEMPLATE
> Which way do we choose? or I miss some variant?
>
> I would like to go by 3) way... Comments?
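The "array of locales" idea quoted above can be sketched as follows. This is only an illustration in Python, not code from the patch: the helper names, the configuration names and the rule for comparing codeset spellings (lower-case, punctuation removed) are all assumptions made for the example.

```python
# Hypothetical sketch: match a server locale against per-configuration
# arrays of locale names, tolerating spelling variants of the codeset
# such as ru_RU.UTF-8 vs ru_RU.UTF8.

def locale_key(name):
    """Reduce a locale name to a comparable (base, codeset) pair."""
    base, _, codeset = name.partition(".")
    # Assumed normalisation: ignore case and punctuation in the codeset.
    codeset = "".join(ch for ch in codeset.lower() if ch.isalnum())
    return base, codeset


def config_for_locale(server_locale, configs):
    """Return the first configuration whose locale array matches, else None."""
    wanted = locale_key(server_locale)
    for cfg_name, locales in configs.items():
        if any(locale_key(loc) == wanted for loc in locales):
            return cfg_name
    return None


if __name__ == "__main__":
    # Made-up configuration names and locale arrays.
    configs = {
        "russian_cfg": ["ru_RU.UTF-8", "ru_RU.KOI8-R"],
        "italian_cfg": ["it_IT.ISO8859-1"],
    }
    print(config_for_locale("ru_RU.UTF8", configs))
    print(config_for_locale("fi_FI.ISO8859-1", configs))
```

Whether such normalisation is enough across platforms is exactly the portability concern raised in the thread; the sketch only shows that a short list of aliases per configuration is cheap to check.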
Re: [HACKERS] tsearch in core patch, for inclusion
On Thursday, 22 February 2007 14:33, Teodor Sigaev wrote:
> \df says only types of arguments, not a meaning.

Only if you don't provide argument names.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
On Thursday, 22 February 2007 18:07, Markus Schiltknecht wrote:
> > I agree so enhancing parser about not standard construct isn't good.
>
> Generally? Wow! This would mean PostgreSQL would always lag behind
> other RDBMSes, regarding ease of use. Please don't do that!

You are confusing making a full-text index with configuring the full-text engine. Tsearch already gives you a standard CREATE INDEX variant to do the former. The discussion here is about the latter, and notably Oracle uses functions there.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
On Thursday 25 January 2007 12:51, Oleg Bartunov wrote:
> On Thu, 25 Jan 2007, Nikolay Samokhvalov wrote:
> > On 1/25/07, Teodor Sigaev <[EMAIL PROTECTED]> wrote:
> >> It's should clear enough for now - dump data from old db and load into
> >> new one.
> >> But dump should be without any contrib/tsearch2 related functions.
> >
> > Upgrading from 8.1.x to 8.2.x was not trivial because of very trivial
> > change in API (actually not really API but the content of "pg_ts_*"
> > tables): russian snowball stemming function was forked to 2 different
> > ones, for koi8 and utf8 encodings. So, as I dumped my pg_ts_* tables
> > data (to keep my tsearch2 settings), I saw errors during restoration
> > (btw, why didn't you keep old russian stemmer function name as a
> > synonym to koi8 variant?) -- so, I had to change my dump file
> > manually, because I didn't manage to follow "tsearch2 best practices"
>
> sed and grep did the trick.
>
> > (to use some kind of "bootstrap" script that creates tsearch2
> > configuration you need from default one -- using several INSERTs and
> > UPDATEs). And there were no upgrade notes for tsearch2.
>
> This is unfair, you promised to write upgrade notes and we discussed the
> problem with name change before release and I rely on you. It was my fault,
> of course.

I got bit by this today and, afaict, the best solution for the status quo would be to change the install schema to something like tsearch2, which would then allow for much easier dump and restore handling.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] tsearch in core patch, for inclusion
>>> CREATE TABLE foo (id serial, names text FULLTEXT);
>>>
>>> Anything more complicated is a waste of cycles.
>>>
>>> Joshua D. Drake
>>
>> I agree. Question: what about multilanguage fulltext.
>>
>> CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
>> CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary,
>> english_dictionary) ] );
>>
>> all others can we do via SP.
>
> That works for me with perhaps a default mapping to locales? For example
> if our locale is en_us.UTF8 we are pretty assured that we are using
> english.

90% yes, 10% no. A typical task for Czech: find a word regardless of accents, or find German, English and Czech stemmed words in multilingual documents (or different languages depending on topology). A lot of databases are at least bilingual (in the Czech Republic, German and Czech).

Pavel

P.S. Missing collations are a big minus for PostgreSQL in the EU (we have some workarounds).
Re: [HACKERS] tsearch in core patch, for inclusion
>> CREATE TABLE foo (id serial, names text FULLTEXT);
>>
>> Anything more complicated is a waste of cycles.
>>
>> Joshua D. Drake
>
> I agree. Question: what about multilanguage fulltext.
>
> CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
> CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary,
> english_dictionary) ] );
>
> all others can we do via SP.
>
> Pavel Stehule

That works for me with perhaps a default mapping to locales? For example if our locale is en_us.UTF8 we are pretty assured that we are using english.

Joshua D. Drake

--
Command Prompt, Inc.
http://www.commandprompt.com/
Re: [HACKERS] tsearch in core patch, for inclusion
> I am not talking about stored procedures. I am talking about a very ugly,
> counter intuitive syntax above.
>
> Initializing full text should be as simple as:
>
> CREATE INDEX foo USING FULLTEXT(bar);
>
> (or something similar)
>
> Or:
>
> CREATE TABLE foo (id serial, names text FULLTEXT);
>
> Anything more complicated is a waste of cycles.
>
> Joshua D. Drake

I agree. Question: what about multilanguage fulltext?

CREATE INDEX foo USING FULLTEXT(bar) [ WITH czech_dictionary ];
CREATE TABLE foo (id serial, names text FULLTEXT [ (czech_dictionary, english_dictionary) ] );

Everything else we can do via SP.

Pavel Stehule
Re: [HACKERS] tsearch in core patch, for inclusion
Pavel Stehule wrote:
>> > And users are constantly complaining that PostgreSQL doesn't have
>> > fulltext indexing capabilities (if they don't know about tsearch2) or
>> > about how hard it is to use tsearch2.
>> >
>> >> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
>> >> ARRAY['...']) is readable.
>> >
>> > Hardly. Because it's not like SQL:
>>
>> I have to agree here.
>>
>> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
>> ARRAY['...']) is readable.
>>
>> Is a total no op. We might as well just leave it in contrib.
>
> I am for integration tsearch to core, why not. But I don't see reason
> for special syntax. Stored procedures is exactly good tool for it.

I am not talking about stored procedures. I am talking about a very ugly, counter intuitive syntax above.

Initializing full text should be as simple as:

CREATE INDEX foo USING FULLTEXT(bar);

(or something similar)

Or:

CREATE TABLE foo (id serial, names text FULLTEXT);

Anything more complicated is a waste of cycles.

Joshua D. Drake

--
Command Prompt, Inc.
http://www.commandprompt.com/
Re: [HACKERS] tsearch in core patch, for inclusion
>> And users are constantly complaining that PostgreSQL doesn't have
>> fulltext indexing capabilities (if they don't know about tsearch2) or
>> about how hard it is to use tsearch2.
>>
>>> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
>>> ARRAY['...']) is readable.
>>
>> Hardly. Because it's not like SQL:
>
> I have to agree here.
>
> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
> ARRAY['...']) is readable.
>
> Is a total no op. We might as well just leave it in contrib.

I am for integrating tsearch into core, why not. But I don't see a reason for special syntax. Stored procedures are exactly the right tool for it.

Fulltext is standardised in SQL/MM, SQL Multimedia and Application Packages, Part 2: Full-Text.

Why implement an extensive proprietary solution? If our solution is proprietary, then it should be simple and cheap, and it shouldn't complicate future conformance with ANSI SQL.

Regards
Pavel Stehule
Re: [HACKERS] tsearch in core patch, for inclusion
> And users are constantly complaining that PostgreSQL doesn't have
> fulltext indexing capabilities (if they don't know about tsearch2) or
> about how hard it is to use tsearch2.
>
>> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
>> ARRAY['...']) is readable.
>
> Hardly. Because it's not like SQL:

I have to agree here.

SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'], ARRAY['...']) is readable.

Is a total no op. We might as well just leave it in contrib.

Joshua D. Drake

--
Command Prompt, Inc.
http://www.commandprompt.com/
Re: [HACKERS] tsearch in core patch, for inclusion
Hi,

Pavel Stehule wrote:
> Functions maybe doesn't see efective, but user's cannot learn new syntax.

Are you serious? That argument speaks exactly *for* extending the grammar. From other databases, users are used to:

CREATE TABLE ... (SQL)
CREATE INDEX ... (SQL)
CREATE FULLTEXT INDEX ... (Transact-SQL)
CREATE TABLE (... FULLTEXT ...) (MySQL)
CREATE INDEX ... INDEXTYPE IS ctxsys.context PARAMETERS ... (Oracle Text)

And users are constantly complaining that PostgreSQL doesn't have fulltext indexing capabilities (if they don't know about tsearch2) or about how hard it is to use tsearch2.

> SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'],
> ARRAY['...']) is readable.

Hardly. Because it's not like SQL:

- it's counter-intuitive to have to SELECT, when you want to CREATE something.
- it's confusing to have two actions (select create)
- why do I have to write ARRAYs to list parameters?
- it's not obvious what you're selecting (return value?)
- you have to keep track of the brackets, which can easily get messed up with two levels of them. Especially if the command gets multiple lines long.

> I agree so enhancing parser about not standard construct isn't good.

Generally? Wow! This would mean PostgreSQL would always lag behind other RDBMSes regarding ease of use. Please don't do that!

Regards

Markus
Re: [HACKERS] tsearch in core patch, for inclusion
Hi,

Andrew Dunstan wrote:
> If we are worried about the size of the transition table and keeping it
> in cache (see remarks from Tom upthread) then adding more keywords seems
> a bad idea, as it will surely expand the table. OTOH, I'd hate to make
> that a design criterion.

Yeah, me too. Especially because it's an implementation issue weighed against ease of use. (Or can somebody convince me that functions would provide a simple interface?)

> My main worry has been that the grammar would be stable.

You mean stability of the grammar for the new additions or for all the grammar? Why are you worried about that?

> Just to quantify all this, I did a quick check on the grammar using
> bison -v - we appear to have 473 terminal symbols, and 420 non-terminal
> symbols in 1749 rules, generating 3142 states. The biggest tables
> generated are yytable and yycheck, each about 90kb on my machine.

That already sounds somewhat better than Tom's 300 kb. And considering that these caches most probably grow faster than our grammar...

Regards

Markus
Re: [HACKERS] tsearch in core patch, for inclusion
>> CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;
>> SELECT add_fulltext_config('myfts', 'template_cfg', True);
>
> That's simple, but what about
> CREATE FULLTEXT MAPPING ON cfgname FOR lexemetypename[, ...] WITH dictname1[, ...];
> ?
>
> SELECT create_fulltext_mapping(cfgname, '{lexemetypename[, ...]}'::text[],
> '{dictname1[, ...]}'::text[]);
>
> Seems rather ugly for me...

Functions may not seem effective, but users don't have to learn new syntax.

SELECT create_fulltext_mapping(cfgname, ARRAY['lex..','..'], ARRAY['...']) is readable.

I agree that enhancing the parser with a non-standard construct isn't good.

> And function interface does not provide autocompletion and online help in
> psql. \df says only types of arguments, not a meaning.

Yes, I miss better support for functions in psql too, but that's a different topic. I don't see any reason why \h couldn't support functions better.

Have a nice day
Pavel Stehule
Re: [HACKERS] tsearch in core patch, for inclusion
Teodor Sigaev wrote:
>> In that proposed syntax, I would drop all "=", ",", "(", and ")". They
>> don't seem necessary and they are untypical for SQL commands. I'd
>> compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that do
>> similar things.
>
> I was looking at CREATE TYPE mostly. With removing "=", ",", "(", and ")"
> in CREATE/ALTER FULLTEXT it's needed to add several items in
> unreserved_keyword list. And increase gram.y by adding new rules similar
> to OptRoleList instead of simple
>
> opt_deflist:
>     '(' def_list ')'    { $$ = $2; }
>     | /*EMPTY*/         { $$ = NIL; }
>     ;
>
> Is it acceptable? List of new keywords is:
> LOCALE, LEXIZE, INIT, OPT, GETTOKEN, LEXTYPES, HEADLINE
>
> So, syntax will be
>
> CREATE FULLTEXT DICTIONARY dictname
>     LEXIZE lexize_function
>     [ INIT init_function ]
>     [ OPT opt_text ];
>
> CREATE FULLTEXT DICTIONARY dictname
>     [ { LEXIZE lexize_function | INIT init_function | OPT opt_text } [...] ]
>     LIKE template_dictname;

If we are worried about the size of the transition table and keeping it in cache (see remarks from Tom upthread) then adding more keywords seems a bad idea, as it will surely expand the table. OTOH, I'd hate to make that a design criterion. My main worry has been that the grammar would be stable.

Just to quantify all this, I did a quick check on the grammar using bison -v - we appear to have 473 terminal symbols, and 420 non-terminal symbols in 1749 rules, generating 3142 states. The biggest tables generated are yytable and yycheck, each about 90kb on my machine.

cheers

andrew
Re: [HACKERS] tsearch in core patch, for inclusion
> CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;
> SELECT add_fulltext_config('myfts', 'template_cfg', True);

That's simple, but what about

CREATE FULLTEXT MAPPING ON cfgname FOR lexemetypename[, ...] WITH dictname1[, ...];

?

SELECT create_fulltext_mapping(cfgname, '{lexemetypename[, ...]}'::text[], '{dictname1[, ...]}'::text[]);

Seems rather ugly to me...

And a function interface does not provide autocompletion or online help in psql. \df shows only the types of the arguments, not their meaning.

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch, for inclusion
> In that proposed syntax, I would drop all "=", ",", "(", and ")". They
> don't seem necessary and they are untypical for SQL commands. I'd
> compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that do
> similar things.

I was looking at CREATE TYPE mostly. If we remove "=", ",", "(", and ")" from CREATE/ALTER FULLTEXT, we need to add several items to the unreserved_keyword list, and gram.y grows by new rules similar to OptRoleList instead of the simple

opt_deflist:
    '(' def_list ')'    { $$ = $2; }
    | /*EMPTY*/         { $$ = NIL; }
    ;

Is it acceptable? The list of new keywords is:
LOCALE, LEXIZE, INIT, OPT, GETTOKEN, LEXTYPES, HEADLINE

So, the syntax will be

CREATE FULLTEXT DICTIONARY dictname
    LEXIZE lexize_function
    [ INIT init_function ]
    [ OPT opt_text ];

CREATE FULLTEXT DICTIONARY dictname
    [ { LEXIZE lexize_function | INIT init_function | OPT opt_text } [...] ]
    LIKE template_dictname;

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch, for inclusion
Hi,

Peter Eisentraut wrote:
> Oleg Bartunov wrote:
>> It's not so big addition to the gram.y, see a list of commands
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

As we are still discussing the syntax: is there a proposal for what a function-based syntax would look like?

CREATE FULLTEXT CONFIGURATION myfts LIKE template_cfg AS DEFAULT;

just seems so much more SQL-like than:

SELECT add_fulltext_config('myfts', 'template_cfg', True);

I admit that's a very simple and not thought-through example. But as long as those who prefer not to extend the grammar don't come up with a better alternative syntax, one easily gets the impression that extending the grammar in general is evil.

> In that proposed syntax, I would drop all "=", ",", "(", and ")". They
> don't seem necessary and they are untypical for SQL commands. I'd
> compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that do
> similar things.

Yup, I'd second that.

Regards

Markus
Re: [HACKERS] tsearch in core patch, for inclusion
On Thu, 22 Feb 2007, Peter Eisentraut wrote:
> Oleg Bartunov wrote:
>> It's not so big addition to the gram.y, see a list of commands
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
>
> In that proposed syntax, I would drop all "=", ",", "(", and ")". They
> don't seem necessary and they are untypical for SQL commands. I'd
> compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that do
> similar things.

That looks reasonable.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] tsearch in core patch, for inclusion
Joshua D. Drake wrote:
> This is like the third time we have been around this problem. The
> syntax is clear and reasonable imo.

But others have differing opinions.

> Can we stop arguing about it and just include? If there are specific
> issues beyond syntax that is one thing, but at this point it seems we
> are arguing for the sake of arguing.

How is that worse than wanting to abort the discussion for the sake of aborting the discussion?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
Oleg Bartunov wrote:
> It's not so big addition to the gram.y, see a list of commands
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

In that proposed syntax, I would drop all "=", ",", "(", and ")". They don't seem necessary and they are untypical for SQL commands. I'd compare with CREATE FUNCTION or CREATE SEQUENCE for SQL commands that do similar things.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
"Florian G. Pflug" <[EMAIL PROTECTED]> writes: > Markus Schiltknecht wrote: >>> Are there any ongoing efforts to rewrite the parser (i.e. using >>> another algorithm, like a recursive descent parser)? >> Why would you want to do that? > Last, but not least, the C and C++ syntax is basically set in stone - At > least now the g++ supports nearly all (or all? don't know) of the C++ > standard. So it doesn't really matter if changes to the parse are a bit > more work, because the rarely happen. Postgres seems to add new features > that change the grammar with every release (with is a good thing!). Yeah. I think it would be a pretty bad idea for us to go over to a handwritten parser: not only greater implementation effort for grammar changes, but greater risk of introducing bugs. Bison tells you about it when you've written something ambiguous ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] tsearch in core patch, for inclusion
Markus Schiltknecht wrote:
>>> Are there any ongoing efforts to rewrite the parser (i.e. using
>>> another algorithm, like a recursive descent parser)?
>> Why would you want to do that?
>
> I recall having read something about rewriting the parser. Together with
> Tom being worried about parser performance and knowing GCC has switched
> to a hand written parser some time ago, I suspected bison to be slow.
> That's why I've asked.

I think the case is different for C and C++. The grammars of C and C++ appear to be much more parser-friendly than SQL's, which I'd think makes handcrafting a parser easier.

And I believe that one of the reasons gcc wasn't happy with bison was that it limited the quality of their error reporting - which isn't that much of a problem for SQL, since SQL statements are rather short compared to your typical C/C++ source file.

Last, but not least, the C and C++ syntax is basically set in stone - at least now that g++ supports nearly all (or all? don't know) of the C++ standard. So it doesn't really matter if changes to the parser are a bit more work, because they rarely happen. Postgres seems to add new features that change the grammar with every release (which is a good thing!).

greetings, Florian Pflug
Re: [HACKERS] tsearch in core patch, for inclusion
Markus Schiltknecht wrote:
> Hi,
>
> I recall having read something about rewriting the parser. Together with
> Tom being worried about parser performance and knowing GCC has switched
> to a hand written parser some time ago, I suspected bison to be slow.
> That's why I've asked.

This has little to do with performance and everything to do with the insanity which is C++:

http://gnu.teleglobe.net/software/gcc/gcc-3.4/changes.html

* A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantics analysis and parsing. The new parser fixes many bugs that were found in the old parser.

Short form: C++ is basically not LALR(1) parseable.

Brian
Re: [HACKERS] tsearch in core patch, for inclusion
"Florian G. Pflug" <[EMAIL PROTECTED]> writes: > Markus Schiltknecht wrote: >> I didn't find hard facts about runtime complexity of LALR, >> though (pointers are very welcome). > a) and b) should be O(1). Processing one token pushes at most one state > onto the stack, so overall no more than N stats can be popped off again, > making the whole algorithm O(N) with N being the number of tokens of the > input stream. Yeah. I was concerned about the costs involved in trying to pack the state tables, but it appears that that cost is all paid when the grammar is compiled --- looking into gram.c, it appears the inner loop contains just simple array lookups. Still, bloating of the state tables is something we ought to pay attention to, because there's a distributed cost once they no longer fit in a processor's L1 cache. On my machine "size gram.o" is over 360K already ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] tsearch in core patch, for inclusion
Hi,

Florian G. Pflug wrote:
> According to http://en.wikipedia.org/wiki/LR_parser processing one
> token in any LR(1) parser in the worst case needs to
> a) Do a lookup in the action table with the current (state, token) pair
> b) Do a lookup in the goto table with a (state, rule) pair.
> c) Push one state onto the stack, and pop n states with
>    n being the number of symbols (tokens or other rules) on the right
>    hand side of a rule.
>
> a) and b) should be O(1). Processing one token pushes at most one state
> onto the stack, so overall no more than N states can be popped off again,
> making the whole algorithm O(N) with N being the number of tokens of the
> input stream.

Looks correct, thanks. What exactly is Tom worried about, then?

>> Are there any ongoing efforts to rewrite the parser (i.e. using another
>> algorithm, like a recursive descent parser)?
> Why would you want to do that?

I recall having read something about rewriting the parser. Together with Tom being worried about parser performance and knowing GCC has switched to a hand written parser some time ago, I suspected bison to be slow. That's why I've asked.

Regards

Markus
Re: [HACKERS] tsearch in core patch, for inclusion
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Oleg Bartunov wrote:
> >> It's not so big addition to the gram.y, see a list of commands
> >> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
> >
> > I looked at the diff file and the major change in gram.y is the creation
> > of a new object type FULLTEXT,
>
> You mean four different object types. I'm not totally clear on bison's
> scaling behavior relative to the number of productions, but I think
> there's no question that this patch will impose a measurable distributed
> penalty on every single query issued to Postgres by any application,
> whether it's heard of tsearch or not. The percentage overhead would
> be a lot lower if the patch were introducing a similar number of entries
> into pg_proc.

My point is that the grammar splits off all the tsearch2 objects by prefixing them with CREATE FULLTEXT object, where there are four object types supported. But as others have pointed out, the performance of the grammar is probably not an issue in this case.

--
Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] tsearch in core patch, for inclusion
Markus Schiltknecht wrote:
> Hi,
>
> Tom Lane wrote:
>> You mean four different object types. I'm not totally clear on bison's
>> scaling behavior relative to the number of productions
>
> You really want to trade parser performance (which is *very*
> implementation specific) for ease of use?
>
> Bison generates a LALR [1] parser, which depend quite a bit on the number
> of productions. But AFAIK the dependency is mostly on memory consumption
> for the internal symbol sets, not so much on runtime complexity. I didn't
> find hard facts about runtime complexity of LALR, though (pointers are
> very welcome).

According to http://en.wikipedia.org/wiki/LR_parser, processing one token in any LR(1) parser in the worst case needs to

a) Do a lookup in the action table with the current (state, token) pair.
b) Do a lookup in the goto table with a (state, rule) pair.
c) Push one state onto the stack, and pop n states, with n being the number of symbols (tokens or other rules) on the right hand side of a rule.

a) and b) should be O(1). Processing one token pushes at most one state onto the stack, so overall no more than N states can be popped off again, making the whole algorithm O(N) with N being the number of tokens of the input stream.

AFAIK the only difference between SLR, LALR and LR(1) lies in the generation of the goto and action tables.

> Are there any ongoing efforts to rewrite the parser (i.e. using another
> algorithm, like a recursive descent parser)?

Why would you want to do that?

greetings, Florian Pflug
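The linear-time argument above can be checked on a toy example. The following Python sketch is a table-driven LR parser for the hypothetical grammar L -> L x | x. The ACTION and GOTO tables are written by hand for illustration and have nothing to do with PostgreSQL's gram.y or with the code bison generates; the parser simply counts table lookups so that the O(N) behaviour becomes visible.

```python
# Toy table-driven LR parser for the made-up grammar
#   rule 1: L -> L x
#   rule 2: L -> x
# State numbers and tables are hand-built for this example only.

ACTION = {
    (0, "x"): ("shift", 1),
    (1, "x"): ("reduce", 2), (1, "$"): ("reduce", 2),
    (2, "x"): ("shift", 3), (2, "$"): ("accept", None),
    (3, "x"): ("reduce", 1), (3, "$"): ("reduce", 1),
}
GOTO = {(0, "L"): 2}
RULES = {1: ("L", 2), 2: ("L", 1)}  # rule number -> (lhs, length of rhs)


def parse(tokens):
    """Parse a list of 'x' tokens and return the number of table lookups."""
    stream = list(tokens) + ["$"]    # "$" marks end of input
    stack = [0]                      # stack of parser states
    pos = 0
    lookups = 0
    while True:
        lookups += 1                 # step a): action table lookup
        kind, arg = ACTION[(stack[-1], stream[pos])]
        if kind == "shift":
            stack.append(arg)        # step c): push one state
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = RULES[arg]
            del stack[-rhs_len:]     # step c): pop one state per rhs symbol
            lookups += 1             # step b): goto table lookup
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return lookups


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, "tokens:", parse(["x"] * n), "lookups")
```

For this particular grammar the count works out to 3N + 1 lookups for N tokens: one shift, one reduce and one goto per token, plus the final accept. The size of the tables affects memory footprint, not the number of steps.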
Re: [HACKERS] tsearch in core patch, for inclusion
Hi,

Tom Lane wrote:
> You mean four different object types. I'm not totally clear on bison's
> scaling behavior relative to the number of productions

You really want to trade parser performance (which is *very* implementation specific) for ease of use?

Bison generates a LALR [1] parser, which depends quite a bit on the number of productions. But AFAIK the dependency is mostly on memory consumption for the internal symbol sets, not so much on runtime complexity. I didn't find hard facts about runtime complexity of LALR, though (pointers are very welcome).

Are there any ongoing efforts to rewrite the parser (i.e. using another algorithm, like a recursive descent parser)?

Regards

Markus

[1]: Wikipedia on the LALR parsing algorithm: http://en.wikipedia.org/wiki/LALR_parser
Re: [HACKERS] tsearch in core patch, for inclusion
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Oleg Bartunov wrote:
>> It's not so big addition to the gram.y, see a list of commands
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

> I looked at the diff file and the major change in gram.y is the creation
> of a new object type FULLTEXT,

You mean four different object types. I'm not totally clear on bison's scaling behavior relative to the number of productions, but I think there's no question that this patch will impose a measurable distributed penalty on every single query issued to Postgres by any application, whether it's heard of tsearch or not. The percentage overhead would be a lot lower if the patch were introducing a similar number of entries into pg_proc.

regards, tom lane
Re: [HACKERS] tsearch in core patch, for inclusion
> It's not so big addition to the gram.y, see a list of commands
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
> SQL commands make FTS syntax clear and follow tradition to manage
> system objects. From the user's side, I'd be very unhappy to configure
> FTS, which can be very complex, using functions. All we want is to
> provide users clear syntax.

This is like the third time we have been around this problem. The syntax is clear and reasonable imo. Can we stop arguing about it and just include it? If there are specific issues beyond syntax, that is one thing, but at this point it seems we are arguing for the sake of arguing.

Joshua D. Drake
Re: [HACKERS] tsearch in core patch, for inclusion
Oleg Bartunov wrote: > On Tue, 20 Feb 2007, Alvaro Herrera wrote: > > > Bruce Momjian wrote: > >> > >> FYI, I added this to the patches queue because I think we decided > >> full-text indexing should be in the core. If I am wrong, please let me > >> know. > > > > One of the objections I remember to this particular implementation was > > that configuration should be done using functions rather than new syntax > > in gram.y. This seems a good idea because it avoids bloating the > > grammar, while still allowing dependency tracking, pg_dump support, > > syscache support etc. > > It's not so big addition to the gram.y, see a list of commands > http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html. > SQL commands make FTS syntax clear and follow tradition to manage > system objects. From the user's side, I'd be very unhappy to configure > FTS, which can be very complex, using functions. All we want is to > provide users clear syntax. I looked at the diff file and the major change in gram.y is the creation of a new object type FULLTEXT, so you can CREATE, ALTER and DROP FULLTEXT. I don't know fulltext administration well enough, so if Oleg says a function API would be too complex, I am OK with his new parser syntax. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] tsearch in core patch, for inclusion
On Tue, 20 Feb 2007, Alvaro Herrera wrote: Bruce Momjian wrote: FYI, I added this to the patches queue because I think we decided full-text indexing should be in the core. If I am wrong, please let me know. One of the objections I remember to this particular implementation was that configuration should be done using functions rather than new syntax in gram.y. This seems a good idea because it avoids bloating the grammar, while still allowing dependency tracking, pg_dump support, syscache support etc. It's not so big addition to the gram.y, see a list of commands http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html. SQL commands make FTS syntax clear and follow tradition to manage system objects. From the user's side, I'd be very unhappy to configure FTS, which can be very complex, using functions. All we want is to provide users clear syntax. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] tsearch in core patch, for inclusion
Bruce Momjian wrote:
>
> FYI, I added this to the patches queue because I think we decided
> full-text indexing should be in the core. If I am wrong, please let me
> know.

One of the objections I remember to this particular implementation was that configuration should be done using functions rather than new syntax in gram.y. This seems a good idea because it avoids bloating the grammar, while still allowing dependency tracking, pg_dump support, syscache support etc.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] tsearch in core patch, for inclusion
FYI, I added this to the patches queue because I think we decided full-text indexing should be in the core. If I am wrong, please let me know. --- Teodor Sigaev wrote: > We (Oleg and me) are glad to present tsearch in core of pgsql patch. In > basic, > layout, functions, methods, types etc are the same as in current tsearch2 > with a > lot of improvements: > > - pg_ts_* tables now are in pg_catalog > - parsers, dictionaries, configurations now have owner and namespace > similar to > other pgsql's objects like tables, operator classes etc > - current tsearch configuration is managed with a help of GUC variable > tsearch_conf_name. > - choosing of tsearch cfg by locale may be done for each schema separately > - managing of tsearch configuration with a help of SQL commands, not with > insert/update/delete statements. This allows to drive dependencies, > correct dumping and dropping. > - psql support with a help of \dF* commands > - add all available Snowball stemmers and corresponding configuration > - correct memory freeing by any dictionary > > Work is sponsored by EnterpriseDB's PostgreSQL Development Fund. > > patch: http://www.sigaev.ru/misc/tsearch_core-0.33.gz > docs: http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ (not yet completed and > it's > not yet a patch, just a SGML source) > > Implementation details: > - directory layout >src/backend/utils/adt/tsearch - all IO function and simple operations >src/backend/utils/tsearch - complex processing functions, including > language processing and dictionaries > - most of snowball dictionaries are placed in separate .so library and >they plug in into data base by similar way as character conversation >library does. > > If there aren't objections then we plan commit patch tomorrow or after > tomorrow. > Before committing, I'll changes oids from 5000+ to lower values to prevent > holes > in oids. And after that, I'll remove tsearch2 contrib module. 
> > -- > Teodor Sigaev E-mail: [EMAIL PROTECTED] > WWW: http://www.sigaev.ru/ > > ---(end of broadcast)--- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch in core patch, for inclusion
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Teodor Sigaev wrote: > We (Oleg and me) are glad to present tsearch in core of pgsql patch. In > basic, > layout, functions, methods, types etc are the same as in current tsearch2 > with a > lot of improvements: > > - pg_ts_* tables now are in pg_catalog > - parsers, dictionaries, configurations now have owner and namespace > similar to > other pgsql's objects like tables, operator classes etc > - current tsearch configuration is managed with a help of GUC variable > tsearch_conf_name. > - choosing of tsearch cfg by locale may be done for each schema separately > - managing of tsearch configuration with a help of SQL commands, not with > insert/update/delete statements. This allows to drive dependencies, > correct dumping and dropping. > - psql support with a help of \dF* commands > - add all available Snowball stemmers and corresponding configuration > - correct memory freeing by any dictionary > > Work is sponsored by EnterpriseDB's PostgreSQL Development Fund. > > patch: http://www.sigaev.ru/misc/tsearch_core-0.33.gz > docs: http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ (not yet completed and > it's > not yet a patch, just a SGML source) > > Implementation details: > - directory layout >src/backend/utils/adt/tsearch - all IO function and simple operations >src/backend/utils/tsearch - complex processing functions, including > language processing and dictionaries > - most of snowball dictionaries are placed in separate .so library and >they plug in into data base by similar way as character conversation >library does. > > If there aren't objections then we plan commit patch tomorrow or after > tomorrow. > Before committing, I'll changes oids from 5000+ to lower values to prevent > holes > in oids. And after that, I'll remove tsearch2 contrib module. 
> > -- > Teodor Sigaev E-mail: [EMAIL PROTECTED] > WWW: http://www.sigaev.ru/ > > ---(end of broadcast)--- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch in core patch, for inclusion
Andrew Dunstan wrote: I am constantly running into this: Q. Does PostgreSQL have full text indexing? A. Yes it is in contrib. Q. But that isn't part of core. A. *sigh* Where on the website can I see what "plugins" are included with PostgreSQL? Where on the website can I see the Official PostgreSQL Documentation for Full Text Indexing? With TSearch2 in core will that fix the many upgrade problems associated with using TSearch2? contrib is a horrible misnomer. Can we maybe bite the bullet and call it something else? After years of PG use, I am still afraid to use contrib modules because it just *feels* like voodoo. I have spent much time reading this mailing list and on IRC with PG users, and I know that contrib modules are on the whole tested and safe, but the lack of web documentation and any indication of what they do other than "check the notes that come with the source" makes me just feel like they are "use and cross fingers" type thing. I don't know how hard it would be to implement, but perhaps contrib modules could be compiled in a similar way to Apache modules. E.g., ./configure --with-modulename with the onus for packaging them appropriately falling onto the shoulders of the module authors. I feel that even a basic module management system like this would greatly increase awareness of and confidence in the contrib modules. Oh, and +1 on renaming contrib +1 on the need for a comprehensive list of them +1 on the need for more doc on the website about each of them, onus falling on module authors, perhaps require at least a basic doc patch as a requirement for /contrib inclusion. - Naz ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] tsearch in core patch, for inclusion
On Thu, 25 Jan 2007, Nikolay Samokhvalov wrote:
> On 1/25/07, Teodor Sigaev <[EMAIL PROTECTED]> wrote:
>> It should be clear enough for now - dump data from old db and load into
>> new one. But dump should be without any contrib/tsearch2 related functions.
>
> Upgrading from 8.1.x to 8.2.x was not trivial because of very trivial
> change in API (actually not really API but the content of "pg_ts_*"
> tables): russian snowball stemming function was forked to 2 different
> ones, for koi8 and utf8 encodings. So, as I dumped my pg_ts_* tables data
> (to keep my tsearch2 settings), I saw errors during restoration (btw, why
> didn't you keep old russian stemmer function name as a synonym to koi8
> variant?) -- so, I had to change my dump file manually, because I didn't
> manage to follow "tsearch2 best practices"

sed and grep did the trick.

> (to use some kind of "bootstrap" script that creates tsearch2
> configuration you need from default one -- using several INSERTs and
> UPDATEs). And there were no upgrade notes for tsearch2.

This is unfair: you promised to write upgrade notes, we discussed the problem with the name change before the release, and I relied on you. It was my fault, of course.

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] tsearch in core patch, for inclusion
> though that we still have the more odd grammar of actually using Tsearch
> to query. Although I don't really have a better suggestion without adding
> some ungodly obscure operator.

IMHO, the best possible solution is 'WHERE table.text_field @ text'. Operator @ internally does the equivalent of 'to_tsvector(table.text_field) @@ plainto_tsquery(text)', and it's also possible to add GIN/GIST opclasses to speed up search queries. In this case the performance of making a headline will decrease only insignificantly, but ranking time will be disastrous, because every found text has to be reparsed in full. GIST performance may decrease too, since GIST indexing of tsvector is lossy.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch, for inclusion
On 1/25/07, Teodor Sigaev <[EMAIL PROTECTED]> wrote:
> It should be clear enough for now - dump data from old db and load into
> new one. But dump should be without any contrib/tsearch2 related functions.

Upgrading from 8.1.x to 8.2.x was not trivial because of a very trivial change in the API (actually not really the API but the content of the "pg_ts_*" tables): the russian snowball stemming function was forked into 2 different ones, for koi8 and utf8 encodings. So, when I dumped my pg_ts_* tables data (to keep my tsearch2 settings), I saw errors during restoration (btw, why didn't you keep the old russian stemmer function name as a synonym for the koi8 variant?) -- so I had to change my dump file manually, because I hadn't managed to follow the "tsearch2 best practices" (to use some kind of "bootstrap" script that creates the tsearch2 configuration you need from the default one -- using several INSERTs and UPDATEs). And there were no upgrade notes for tsearch2.

So, I consider the upgrade process for tsearch2 to be a little bit tricky up to the present. I assume it will be improved with 8.3...

--
Best regards, Nikolay
Re: [HACKERS] tsearch in core patch, for inclusion
Teodor Sigaev wrote:
>> the patch. I'm personally not sold on the need for modifications to the
>> SQL grammar, for example, as opposed to just using a set of SQL-callable
>> functions and some new system catalogs.
>
> SQL grammar isn't changed significantly - just add variants of
> CREATE/DROP/ALTER /COMMENTS commands. Next, functions haven't
> autocomplete feature or built-in quick help - if you don't remember
> exactly kind/type of argument(s) of function then you should read a docs.

I didn't read the patch but I did skim the docs for this and if the docs are current I see things like this:

    CREATE FULLTEXT DICTIONARY en_ispell (
        OPT = 'DictFile="ispell/english.dict",
               AffFile="ispell/english.aff",
               StopFile="english.stop"'
    ) LIKE ispell_template;

    ALTER FULLTEXT DICTIONARY en_stem SET OPT='english.stop';

Which to me is perfectly reasonable and intuitive. It is unfortunate though that we still have the more odd grammar of actually using Tsearch to query. Although I don't really have a better suggestion without adding some ungodly obscure operator.

Sincerely, Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] tsearch in core patch, for inclusion
> the patch. I'm personally not sold on the need for modifications to the
> SQL grammar, for example, as opposed to just using a set of SQL-callable
> functions and some new system catalogs.

The SQL grammar isn't changed significantly - we just add variants of the CREATE/DROP/ALTER/COMMENT commands. Next, functions have no autocomplete feature or built-in quick help - if you don't remember exactly the kind/type of a function's argument(s), then you have to read the docs.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] tsearch in core patch, for inclusion
> This is a fairly large patch and I would like the chance to review it
> before it goes in --- "we'll commit tomorrow" is not exactly a decent
> review window.

Not a problem.

> One possible argument for this over the contrib version is a saner
> approach to dumping and restoring configurations. However, as against
> that:
>
> 1) what's the upgrade path for getting an existing tsearch2
> configuration into this implementation?

It should be clear enough for now - dump data from the old db and load it into the new one. But the dump should be without any contrib/tsearch2 related functions.

> 2) once we put this in core we are going to be stuck with supporting its
> SQL API forever. Are we convinced that this API is the one we want? I
> don't recall even having seen any proposal or discussion. It was OK for
> tsearch2's API to change every release while it was in contrib, but the
> expectation of stability is a whole lot higher for core features.

The basic tsearch2 SQL API hasn't changed since its first release, it has only been extended. As far as I can see, there isn't any standard for fulltext search in SQL. DB/2, MS SQL, Oracle and MySQL use different SQL APIs. I don't know which is better. I remember only one suggestion: 'CREATE FULLTEXT INDEX ...'. So, I believe, the existing SQL API satisfies users. But it is possible to emulate a subset of MySQL syntax at the grammar level. The SQL commands

    CREATE FULLTEXT INDEX idxname ON tbl [ USING {GIN|GIST} ] ( field1[, [...]] );
    SELECT .. FROM table WHERE MATCH( field1[, [...]] ) AGAINST ( txt );

will be translated to

    CREATE INDEX idxname ON tbl [ USING {GIN|GIST} ] ( to_tsvector(field1)[ || [...]] );
    SELECT .. FROM table WHERE ( to_tsvector(field1)[ || [...]] ) @@ plainto_tsquery( txt );

Notes:
1. This is a full equivalent of MySQL's MATCH() AGAINST (txt IN BOOLEAN MODE).
2. It requires MATCH and AGAINST to be keywords, which then cannot be used as function names without quoting.

The internal API has changed sometimes (not every release), but I don't see a problem here: all other internal APIs in postgres are changed often.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
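The grammar-level emulation described in the message above amounts to a mechanical rewrite of the MySQL-style commands. The following is a rough sketch in Python of that rewrite as plain string templates; it is not part of the patch. The helper names (tsvector_expr, translate_create_index, translate_match) are hypothetical, the output mirrors the schematic SQL of the message rather than syntax verified against any PostgreSQL release, and it assumes the indexed expression is to_tsvector(...), since @@ matches a tsvector against a tsquery. No identifier quoting or escaping is attempted.

```python
def tsvector_expr(fields):
    """Concatenate per-column tsvectors with ||, as in the schematic SQL."""
    return " || ".join(f"to_tsvector({field})" for field in fields)


def translate_create_index(idxname, tbl, fields, method=None):
    """Rewrite CREATE FULLTEXT INDEX ... into a CREATE INDEX on an expression.

    `method` is the optional USING clause argument (GIN or GIST in the message).
    """
    using = f" USING {method}" if method else ""
    return f"CREATE INDEX {idxname} ON {tbl}{using} ( {tsvector_expr(fields)} );"


def translate_match(fields, txt):
    """Rewrite MATCH( fields ) AGAINST ( txt ) into a @@ condition."""
    return f"( {tsvector_expr(fields)} ) @@ plainto_tsquery( {txt} )"


if __name__ == "__main__":
    print(translate_create_index("idxname", "tbl", ["field1", "field2"], "GIN"))
    print("SELECT .. FROM tbl WHERE " + translate_match(["field1", "field2"], "txt") + ";")
```

In a real implementation the rewrite would happen on parse trees inside gram.y rather than on strings; the sketch only shows which expression each MySQL-style clause maps to.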
Re: [HACKERS] tsearch in core patch, for inclusion
Dawid Kuroczko wrote: > This is the reason I like 'modules' best. It makes one think that it > is something maybe part of core, maybe not, but it has been isolated > into separate entity for maintenance reasons. On etymological grounds, "modules" would also be my favorite, but the term "module" is already used in the SQL standard for something different. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch in core patch, for inclusion
On 1/24/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote: Peter Eisentraut wrote: >> contrib is a horrible misnomer. Can we maybe bite the bullet and call >> it something else? > plugins? How about 'modules' or 'extras' or 'extensions'? :) standard-plugins might be more informative. I think of them as being like perl's standard modules, things that are part of the standard perl distribution as opposed to all the other stuff on CPAN. Personally, I don't quite like 'plugins'. it may be that when I think of plugins, I think of 'GIMP plugins'. ;) And I think hosting providers would exclude plugins almost as often as they do with contrib. "They are not 'core' so it's safe to exclude them" Same with 'extras' or 'extensions' -- they seem to imply that you can do without them. This is the reason I like 'modules' best. It makes one think that it is something maybe part of core, maybe not, but it has been isolated into separate entity for maintenance reasons. My EUR 0.02 Regards, Dawid ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] tsearch in core patch, for inclusion
On Wed, 24 Jan 2007 22:27:10 +0100, Peter Eisentraut <[EMAIL PROTECTED]> wrote: > I wrote: >> The closest I could find is Oracle Text, the full-text search for >> Oracle. > > Oh, and note that Oracle Text is an "extension" and not included in the > Oracle database product proper. > Same with DB2 NSE, IBM's fulltext search engine for their UDB. However, they employ external admin tools like db2text to create, configure and alter fulltext indexes (like slonik for example). Textsearch could be done with functions (contains()) in SQL. Bernd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] tsearch in core patch, for inclusion
On Wed, 24 Jan 2007, Neil Conway wrote: On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote: 2) once we put this in core we are going to be stuck with supporting its SQL API forever. Are we convinced that this API is the one we want? I don't recall even having seen any proposal or discussion. There has been some prior discussion: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php But I agree that we need considerably more discussion before committing the patch. I'm personally not sold on the need for modifications to the SQL grammar, for example, as opposed to just using a set of SQL-callable functions and some new system catalogs. Another question that would be easier to resolve before the patch is committed is naming: the patch currently uses a mix of "full text" and "tsearch[2]" as the name of the full-text search feature. If we're going to bless this as "the" integrated full-text search in PG, it might make more sense to use "full text search" and "FTS" exclusively. We tried to use full-text search (FTS) in the documentation http://mira.sai.msu.su/~megera/pgsql/ftsdoc/index.html. Tsearch[2] used just for historical notes, which may not go to the official documentation. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] tsearch in core patch, for inclusion
Hi there,

sorry if I am a bit verbose - I just tried to answer several postings.

On Wed, 24 Jan 2007, Tom Lane wrote:
> Teodor Sigaev wrote:
>> If there aren't objections then we plan commit patch tomorrow or after
>> tomorrow.
>
> This is a fairly large patch and I would like the chance to review it
> before it goes in --- "we'll commit tomorrow" is not exactly a decent
> review window.

I see your argument, no problem with that. We intentionally announced its availability several weeks ago.

> Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> I still haven't heard any argument for why this would be necessary or
>> desirable at all, other than that it looks better for marketing reasons,
>
> One possible argument for this over the contrib version is a saner
> approach to dumping and restoring configurations. However, as against
> that:
>
> 1) what's the upgrade path for getting an existing tsearch2
> configuration into this implementation?

This is a real question and we will prepare UPGRADE notes.

> 2) once we put this in core we are going to be stuck with supporting its
> SQL API forever. Are we convinced that this API is the one we want? I
> don't recall even having seen any proposal or discussion. It was OK for
> tsearch2's API to change every release while it was in contrib, but the
> expectation of stability is a whole lot higher for core features.

If you're talking about SQL and psql commands, then they are new and we tried to be consistent with the existing approach to managing system objects. We'd be happy to discuss and fix any inconsistency. I don't remember us changing operators and functions for a long time, so users of tsearch2 should not be confused.

After all, our intention is to meet users' wish to have FTS in PostgreSQL and nothing more. We wrote several times on the mailing list that it was too early to move tsearch2 to the pg core, since we considered (at that time) that it had a scalability problem. GiN was specially developed to solve this problem, and it did.
It's a de facto standard to have FTS in a modern database, and it makes no difference what you call it - plugin, extension, contrib module or built-in. It's unfair to compare the approach of commercial DBs with postgres, since they have their own marketing policy - they charge separately for every extension! Our usual peer, MySQL, has built-in FTS, for example, and I don't see any reason not to give our PR people an additional argument, since our FTS is way better. I agree that requirements for core features should be stronger than for a contrib module, especially for the stability of the API. So, let us discuss it. We have been open for suggestions for about 6 years :)

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] tsearch in core patch, for inclusion
On Wed, 24 Jan 2007, Martijn van Oosterhout wrote: > On Wed, Jan 24, 2007 at 09:38:06PM +0100, Stefan Kaltenbrunner wrote: > > sure that ISP is a bit stupid(especially wrt plpgsql) - but tsearch2 in > > the current version is actually imposing some additional(often > > non-trivial) complexity for things like database restores and upgrades > > so I can see an ISP wanting to avoid that altogether. > > Something I've wondered about before is the concept of having installed > Modules in the system. Let's say for example that while compiling > postgres it compiled the modules in contrib also and installed them in > a modules directory. > > Once installed there, unpriviledged users could say "INSTALL foo" and > it would install the module, even if they do not have the permissions > to create them themselves. That would be great, and also it would be great to be able to CREATE LANGUAGE as a regular user for a trusted pl that is already compiled/installed. > > That way you don't clutter the catalogs with external projects, and > there is some indication from the postgres team of some trust in these > modules. After all, if the installation made it easy to use for users, > it must be safe, right? Essentially, I think they are just pretty reluctant to run commands as a superuser on behalf of a user... -- It is better never to have been born. But who among us has such luck? One in a million, perhaps. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch in core patch, for inclusion
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> IIRC Tom's main objection to the previous proposal was that it involved
>> large grammar changes, which I understand is not now proposed.
>
> No, they're already in there --- the patch seems to have been written
> according to that proposal despite the objections.

Oh. ouch.

That seems strange given this query from Oleg back on 18 Nov:

> So, if we'll not touch grammar, are there any issues with tsearch2 in core ?

cheers

andrew
Re: [HACKERS] tsearch in core patch, for inclusion
On Wed, Jan 24, 2007 at 09:38:06PM +0100, Stefan Kaltenbrunner wrote:
> sure that ISP is a bit stupid(especially wrt plpgsql) - but tsearch2 in
> the current version is actually imposing some additional(often
> non-trivial) complexity for things like database restores and upgrades
> so I can see an ISP wanting to avoid that altogether.

Something I've wondered about before is the concept of having installed Modules in the system. Let's say for example that while compiling postgres it compiled the modules in contrib also and installed them in a modules directory.

Once installed there, unprivileged users could say "INSTALL foo" and it would install the module, even if they do not have the permissions to create them themselves.

That way you don't clutter the catalogs with external projects, and there is some indication from the postgres team of some trust in these modules. After all, if the installation made it easy to use for users, it must be safe, right?

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] tsearch in core patch, for inclusion
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> IIRC Tom's main objection to the previous proposal was that it involved large grammar changes, which I understand is not now proposed.

No, they're already in there --- the patch seems to have been written according to that proposal despite the objections.

regards, tom lane
Re: [HACKERS] tsearch in core patch, for inclusion
Neil Conway wrote:
> > If people had a problem with integrating tsearch2 in core they should have said so much earlier.
>
> Peter, Tom and others raised essentially identical objections when this design was initially proposed. For example:
>
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00392.php
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00405.php
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00437.php
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00397.php
>
> Was a consensus reached in that thread? (I didn't see one, but perhaps I've overlooked a mail.)

IIRC Tom's main objection to the previous proposal was that it involved large grammar changes, which I understand is not now proposed. The way I read that thread was that there was no strenuous objection apart from the grammar parts.

Certainly I think we can still argue about details, such as the functional API.

cheers

andrew
Re: [HACKERS] tsearch in core patch, for inclusion
On Wed, 2007-01-24 at 18:38 -0300, Alvaro Herrera wrote:
> In any case, I agree with Andrew that it would be pretty dumb to reject a funded, already written patch.

Well, there are two separate issues: should we include tsearch2 in core, and what syntax should it use? Changing the syntax would not require rejecting the entire patch.

> If people had a problem with integrating tsearch2 in core they should have said so much earlier.

Peter, Tom and others raised essentially identical objections when this design was initially proposed. For example:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00392.php
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00405.php
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00437.php
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00397.php

Was a consensus reached in that thread? (I didn't see one, but perhaps I've overlooked a mail.)

-Neil
Re: [HACKERS] tsearch in core patch, for inclusion
Joshua D. Drake wrote:
> Peter Eisentraut wrote:
> > I wrote:
> >> The closest I could find is Oracle Text, the full-text search for Oracle.
> >
> > Oh, and note that Oracle Text is an "extension" and not included in the Oracle database product proper.
>
> Cool. Then we will have yet another reason to claim we are superior.

It's probably separate just so they can charge extra for it ;-) In our case it's going to be free either way.

In any case, I agree with Andrew that it would be pretty dumb to reject a funded, already written patch. If people had a problem with integrating tsearch2 in core they should have said so much earlier.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] tsearch in core patch, for inclusion
Peter Eisentraut wrote:
> I wrote:
>> The closest I could find is Oracle Text, the full-text search for Oracle.
>
> Oh, and note that Oracle Text is an "extension" and not included in the Oracle database product proper.

Cool. Then we will have yet another reason to claim we are superior.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] tsearch in core patch, for inclusion
I wrote:
> The closest I could find is Oracle Text, the full-text search for Oracle.

Oh, and note that Oracle Text is an "extension" and not included in the Oracle database product proper.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
Stefan Kaltenbrunner wrote:
> I think one can find arguments for both variants - one of the questions might even be how other databases are doing that and if the proposed syntax is resembling one of those or not.

The closest I could find is Oracle Text, the full-text search for Oracle. Browsing the documentation I see things like

    exec ctx_ddl.create_preference('myjlexer','japanese_lexer');
    exec ctx_ddl.add_stopword('globallist','the','French');

which look pretty similar to what a procedure-based interface to tsearch2 could look like.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
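As a rough illustration of what "procedure-based" means here, the following Python sketch models configuration done entirely through ordinary function calls that update catalog-like tables, with no new grammar involved. Everything in it is hypothetical: the function names simply mirror the two Oracle Text calls quoted above, the dictionaries stand in for system catalogs, and `strip_stopwords` is an invented helper showing how the stored configuration might be consumed. It does not describe the tsearch2 patch or any real API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for system catalogs.
preferences = {}              # preference name -> lexer name
stopwords = defaultdict(set)  # (list name, language) -> set of stop words


def create_preference(name, lexer):
    """Register a named lexer preference; refuse duplicates."""
    if name in preferences:
        raise ValueError("preference already exists: " + name)
    preferences[name] = lexer


def add_stopword(list_name, word, language):
    """Add one stop word to a named list for a given language."""
    stopwords[(list_name, language)].add(word.lower())


def strip_stopwords(list_name, language, tokens):
    """Return tokens with the configured stop words removed (case-insensitive)."""
    stops = stopwords.get((list_name, language), set())
    return [t for t in tokens if t.lower() not in stops]


# The two calls below mirror the Oracle Text statements quoted above.
create_preference("myjlexer", "japanese_lexer")
add_stopword("globallist", "the", "French")
```

The design trade-off discussed in the thread is visible even in this toy: adding a new configuration operation means adding a function, not changing a parser.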
Re: [HACKERS] tsearch in core patch, for inclusion
Peter Eisentraut wrote:
> Stefan Kaltenbrunner wrote:
>> sure that ISP is a bit stupid (especially wrt plpgsql) - but tsearch2 in the current version is actually imposing some additional (often non-trivial) complexity for things like database restores and upgrades so I can see an ISP wanting to avoid that altogether.
>
> I have never used tsearch2 across an upgrade, so what exactly are those problems and why would they be specific to tsearch2?

Tsearch2 occasionally changes things from release to release, which makes upgrades impossible with a standard pg_dump/pg_restore. I would have to double check (because I always work around the problem now) but IIRC there have been function call changes from one release to the next.

Sincerely,

Joshua D. Drake
Re: [HACKERS] tsearch in core patch, for inclusion
Stefan Kaltenbrunner wrote:
> sure that ISP is a bit stupid (especially wrt plpgsql) - but tsearch2 in the current version is actually imposing some additional (often non-trivial) complexity for things like database restores and upgrades so I can see an ISP wanting to avoid that altogether.

I have never used tsearch2 across an upgrade, so what exactly are those problems and why would they be specific to tsearch2?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] tsearch in core patch, for inclusion
Andrew Dunstan wrote:
> Joshua D. Drake wrote:
>> Where on the website can I see what "plugins" are included with PostgreSQL?

YES! That's IMHO a more fundamental problem. The specific question about Text Search seems more like a symptom.

While I don't mind Text Search in core, it seems an even bigger deal that it's hard to find information on extensions (whether from contrib or from gborg or from external places like postgis).

A web page with a table easily visible on the postgresql web site that had

    Extension (e.g. tsearch2, postgis)
    Project Maturity (e.g. alpha/beta/stable)
    Compatibility (e.g. extension 1.0 works with postgresql 8.2)
    Description (e.g. "full text search")
    URL

would be a partial fix.

> contrib is a horrible misnomer. Can we maybe bite the bullet and call it something else?

+1

How about "plugins" or "extensions" or "optional libraries".
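The proposed listing is simple enough to sketch as a data structure. The Python below is a minimal illustration only: the `ExtensionEntry` class and `render_table` function are hypothetical names, the columns follow the five fields suggested above, and the example row uses placeholder values taken from the message's own examples rather than real project status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionEntry:
    extension: str      # e.g. "tsearch2"
    maturity: str       # e.g. "alpha", "beta", "stable"
    compatibility: str  # e.g. "1.0 works with postgresql 8.2"
    description: str    # e.g. "full text search"
    url: str


HEADERS = ("Extension", "Project Maturity", "Compatibility", "Description", "URL")


def render_table(entries):
    """Render entries as an aligned plain-text table, header row first."""
    rows = [HEADERS] + [
        (e.extension, e.maturity, e.compatibility, e.description, e.url)
        for e in entries
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(HEADERS))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


# Placeholder row: values are illustrative, not a statement of real status.
EXAMPLE = [
    ExtensionEntry(
        extension="tsearch2",
        maturity="stable",
        compatibility="1.0 works with postgresql 8.2",
        description="full text search",
        url="(placeholder)",
    ),
]
```

Calling `print(render_table(EXAMPLE))` produces a two-line table; keeping the entries as structured records rather than free text would let the same data feed a web page or a packaging tool.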
Re: [HACKERS] tsearch in core patch, for inclusion
Peter Eisentraut wrote:
> Jeremy Drake wrote:
>> I for one am greatly looking forward to tsearch2 being in core. I was very fond of the plugin mechanism, until I signed up with a hosting provider.
>
> Yes, you have told us about your hosting provider before. Just make sure your next hosting provider does not refuse to install database objects whose OID is a multiple of 13 because of bad luck, or you might miss out on full-text indexing again.

sure that ISP is a bit stupid (especially wrt plpgsql) - but tsearch2 in the current version is actually imposing some additional (often non-trivial) complexity for things like database restores and upgrades so I can see an ISP wanting to avoid that altogether.

A fully integrated fulltext search could make that much easier (in a few years when most distributions have picked up 8.3) and just telling people they should switch their hosting ISP is not always an immediately workable solution (think contracts, migration costs, legacy apps).

Stefan
Re: [HACKERS] tsearch in core patch, for inclusion
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> Neil Conway wrote:
>> Another question that would be easier to resolve before the patch is committed is naming: the patch currently uses a mix of "full text" and "tsearch[2]" as the name of the full-text search feature. If we're going to bless this as "the" integrated full-text search in PG, it might make more sense to use "full text search" and "FTS" exclusively.
>
> making this consistent makes a lot of sense and I agree that it might be a good idea to just call it FTS (or similar).
> But on the other side we would have to go as far as renaming TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similar which might pose some considerable headache for people upgrading from the contrib/ version.

If we use "text search" (abbrev TS) as the key phrase we can avoid that. But this reiterates my point that the upgrade path for existing tsearch2 users is an important thing to consider.

regards, tom lane
Re: [HACKERS] tsearch in core patch, for inclusion
Peter Eisentraut wrote:
> Andrew Dunstan wrote:
> > contrib is a horrible misnomer. Can we maybe bite the bullet and call it something else?
>
> plugins?

standard-plugins might be more informative. I think of them as being like perl's standard modules, things that are part of the standard perl distribution as opposed to all the other stuff on CPAN.

Maybe it needs to split into two - things that are genuine plugins and other stuff (e.g. start-scripts).

cheers

andrew
Re: [HACKERS] tsearch in core patch, for inclusion
Neil Conway wrote:
> On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote:
>> 2) once we put this in core we are going to be stuck with supporting its SQL API forever. Are we convinced that this API is the one we want? I don't recall even having seen any proposal or discussion.
>
> There has been some prior discussion:
>
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php
>
> But I agree that we need considerably more discussion before committing the patch. I'm personally not sold on the need for modifications to the SQL grammar, for example, as opposed to just using a set of SQL-callable functions and some new system catalogs.

I think one can find arguments for both variants - one of the questions might even be how other databases are doing that and if the proposed syntax is resembling one of those or not.

> Another question that would be easier to resolve before the patch is committed is naming: the patch currently uses a mix of "full text" and "tsearch[2]" as the name of the full-text search feature. If we're going to bless this as "the" integrated full-text search in PG, it might make more sense to use "full text search" and "FTS" exclusively.

making this consistent makes a lot of sense and I agree that it might be a good idea to just call it FTS (or similar).
But on the other side we would have to go as far as renaming TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similar which might pose some considerable headache for people upgrading from the contrib/ version.

Stefan