Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > Just for clarification. > Are you going to make these changes in the 8.3 beta test period? Yes, I committed them a couple hours ago. regards, tom lane

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tatsuo Ishii
Just for clarification. Are you going to make these changes in the 8.3 beta test period? -- Tatsuo Ishii SRA OSS, Inc. Japan > If I am reading the state machine in wparser_def.c correctly, the > three classifications of words that the default parser knows are > > lword Composed entirely

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> asciiword >>> word >>> numword > No huge preference, but I see benefit in what Gregory was saying re: > asciiword, alphaword, alnumword. word itself is pretty general, while > alphaword ties it much closer to its intended me

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Michael Glaesemann
On Oct 23, 2007, at 12:09 , Alvaro Herrera wrote: Tom Lane wrote: OK, so with that and Michael's suggestion we have asciiword word numword asciihword hword numhword hword_asciipart hword_part hword_numpart Sold? Sol

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > Out of curiosity would the foo in foo-bär or the foo-beta1 be a > hword_asciipart or a hword_part/hword_numpart? foo would be hword_asciipart independently of what was in the other parts of the hword. AFAICS this is what you want for the purpose, which
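The point that each part of a hyphenated word is classified on its own can be sketched in Python. This is only an illustration, not the state machine in wparser_def.c: `classify_hword` and `_base` are hypothetical helpers, `str.isalpha()` and `str.isalnum()` stand in for the locale-dependent `iswalpha()` and `iswalnum()` tests, and choosing the whole-word category from the "widest" part is an assumption that merely matches the foo-beta1 example given in the thread.

```python
from typing import List, Optional, Tuple

# Ordering of the base kinds, narrowest to widest (assumption for this sketch).
_RANK = {"ascii": 0, "alpha": 1, "num": 2}
_PART_NAME = {"ascii": "hword_asciipart", "alpha": "hword_part", "num": "hword_numpart"}
_WHOLE_NAME = {"ascii": "asciihword", "alpha": "hword", "num": "numhword"}


def _base(part: str) -> Optional[str]:
    """Classify one hyphen-separated part by itself (hypothetical helper)."""
    if part.isascii() and part.isalpha():
        return "ascii"   # all ASCII letters
    if part.isalpha():
        return "alpha"   # all letters, at least one non-ASCII
    if part.isalnum() and any(ch.isalpha() for ch in part):
        return "num"     # mixed letters and digits
    return None          # out of scope for this sketch (e.g. digits only, empty)


def classify_hword(token: str) -> Optional[Tuple[str, List[str]]]:
    """Return (whole-word category, per-part categories), or None."""
    parts = token.split("-")
    if len(parts) < 2:
        return None      # not a hyphenated word
    kinds = [_base(p) for p in parts]
    if any(k is None for k in kinds):
        return None
    # Assumption: the whole word takes the widest category among its parts.
    widest = max(kinds, key=_RANK.__getitem__)
    # Each part keeps its own category, independent of its neighbours.
    return _WHOLE_NAME[widest], [_PART_NAME[k] for k in kinds]
```

Under these assumptions, the foo in both foo-bär and foo-beta1 comes out as hword_asciipart, whatever the other part is.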

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > hword_asciipart > hword_part > hword_numpart Out of curiosity would the foo in foo-bär or the foo-beta1 be a hword_asciipart or a hword_part/hword_numpart? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com --

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Alvaro Herrera
Tom Lane wrote: > OK, so with that and Michael's suggestion we have > > asciiword > word > numword > > asciihword > hword > numhword > > hword_asciipart > hword_part > hword_numpart > > Sold? Sold here. -- Alvaro Herrera
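The three base categories in this list can be illustrated with a short Python sketch. It is not the parser code itself: `classify_word` is a hypothetical helper, and Python's `str.isalpha()` and `str.isalnum()` are used as stand-ins for the locale-dependent `iswalpha()` and `iswalnum()` tests mentioned elsewhere in the thread. Tokens that fit none of the three categories (digits only, for example) are treated as out of scope and yield `None`.

```python
from typing import Optional


def classify_word(token: str) -> Optional[str]:
    """Return 'asciiword', 'word', 'numword', or None (hypothetical helper)."""
    if not token:
        return None
    if token.isascii() and token.isalpha():
        return "asciiword"   # all ASCII letters
    if token.isalpha():
        return "word"        # all letters, at least one of them non-ASCII
    if token.isalnum() and any(ch.isalpha() for ch in token):
        return "numword"     # mixed letters and digits
    return None              # anything else is outside this sketch
```

The checks run from the narrowest category to the widest, so a token is reported under the most specific name that fits it.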

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> If we were doing it from scratch I would suggest using longer names. At the >> least I would still suggest using "ascii" or "asciiword" instead of "aword". > +1 for asciiword; "aword" sounds too much like "a word" which is not th

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Alvaro Herrera
Gregory Stark wrote: > "Tom Lane" <[EMAIL PROTECTED]> writes: > > > I wrote: > >> Maybe "aword", "word", and "numword"? > > > > Does the lack of response mean people are satisfied with that? > > Sorry, I had a couple responses partially written but never finished. > > If we were doing it from sc

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Alvaro Herrera
Tom Lane wrote: > Michael Glaesemann <[EMAIL PROTECTED]> writes: > > On Oct 23, 2007, at 10:42 , Tom Lane wrote: > >> apart_hword Part of hyphenated word, all ASCII letters > >> part_hword Part of hyphenated word, all letters > >> numpart_hword Part of hyphenated word, mixed letters and

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > I wrote: >> Maybe "aword", "word", and "numword"? > > Does the lack of response mean people are satisfied with that? Sorry, I had a couple responses partially written but never finished. If we were doing it from scratch I would suggest using longer names.

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes: > On Oct 23, 2007, at 10:42 , Tom Lane wrote: >> apart_hword Part of hyphenated word, all ASCII letters >> part_hword Part of hyphenated word, all letters >> numpart_hword Part of hyphenated word, mixed letters and digits > Is there a ration

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Michael Glaesemann
On Oct 23, 2007, at 10:42 , Tom Lane wrote: apart_hword Part of hyphenated word, all ASCII letters part_hword Part of hyphenated word, all letters numpart_hword Part of hyphenated word, mixed letters and digits Is there a rationale for using these instead of hword_apart, hword_pa

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
I wrote: > (As an example, "foo-beta1" is a numhword, with component tokens > "foo" an aword and "beta1" a numword. This is how it works now > modulo the redefinition of the base categories.) Argh... need more caffeine. Obviously the component tokens would be apart_hword and numpart_hword. They

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-23 Thread Tom Lane
I wrote: > Maybe "aword", "word", and "numword"? Does the lack of response mean people are satisfied with that? Fleshing the proposal out to include the hyphenated-word categories: aword All ASCII letters word All letters according to iswalpha() numword Mixed letters

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-22 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: >> I like the "aword" name more than "lword", BTW. If we change the meaning >> of the classes, surely we can change the name as well, right? > I'm not very familiar with the use case here. Is there a good

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-22 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > Alvaro Herrera wrote: >> lword Entirely letters per iswalpha, with at least one ASCII >> nlword Entirely letters per iswalpha >> word Entirely alphanumeric per iswalnum, but not nlword > I don't like this categ

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-22 Thread Gregory Stark
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > Alvaro Herrera wrote: >> Tom Lane wrote: >> >>> ISTM that perhaps a more generally useful definition would be >>> >>> lword Only ASCII letters >>> nlword Entirely letters per iswalpha(), but not lword >>> word

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-22 Thread Tatsuo Ishii
> Alvaro Herrera wrote: > > Tom Lane wrote: > > > >> ISTM that perhaps a more generally useful definition would be > >> > >> lword Only ASCII letters > >> nlword Entirely letters per iswalpha(), but not lword > >> word Entirely alphanumeric per iswalnum(), bu

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-22 Thread Heikki Linnakangas
Alvaro Herrera wrote: > Tom Lane wrote: > >> ISTM that perhaps a more generally useful definition would be >> >> lword Only ASCII letters >> nlword Entirely letters per iswalpha(), but not lword >> word Entirely alphanumeric per iswalnum(), but not nlword >>

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-21 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> ISTM that perhaps a more generally useful definition would be >> >> lword Only ASCII letters >> nlword Entirely letters per iswalpha(), but not lword >> word Entirely alphanumeric per iswalnum(), b

Re: [HACKERS] Latin vs non-Latin words in text search parsing

2007-10-21 Thread Alvaro Herrera
Tom Lane wrote: > ISTM that perhaps a more generally useful definition would be > > lword Only ASCII letters > nlword Entirely letters per iswalpha(), but not lword > word Entirely alphanumeric per iswalnum(), but not nlword > (hence, includes at leas