I wrote:
(As an example, foo-beta1 is a numhword, with component tokens
foo an aword and beta1 a numword. This is how it works now
modulo the redefinition of the base categories.)
Argh... need more caffeine. Obviously the component tokens would
be apart_hword and numpart_hword. They'd be
I wrote:
Maybe aword, word, and numword?
Does the lack of response mean people are satisfied with that?
Fleshing the proposal out to include the hyphenated-word categories:
aword All ASCII letters
word All letters according to iswalpha()
numword Mixed letters and
On Oct 23, 2007, at 10:42 , Tom Lane wrote:
apart_hword Part of hyphenated word, all ASCII letters
part_hword Part of hyphenated word, all letters
numpart_hword Part of hyphenated word, mixed letters and digits
Is there a rationale for using these instead of hword_apart,
Michael Glaesemann [EMAIL PROTECTED] writes:
On Oct 23, 2007, at 10:42 , Tom Lane wrote:
apart_hword Part of hyphenated word, all ASCII letters
part_hword Part of hyphenated word, all letters
numpart_hword Part of hyphenated word, mixed letters and digits
Is there a rationale for
Tom Lane [EMAIL PROTECTED] writes:
I wrote:
Maybe aword, word, and numword?
Does the lack of response mean people are satisfied with that?
Sorry, I had a couple responses partially written but never finished.
If we were doing it from scratch I would suggest using longer names. At the
least
Tom Lane wrote:
Michael Glaesemann [EMAIL PROTECTED] writes:
On Oct 23, 2007, at 10:42 , Tom Lane wrote:
apart_hword Part of hyphenated word, all ASCII letters
part_hword Part of hyphenated word, all letters
numpart_hword Part of hyphenated word, mixed letters and digits
Gregory Stark wrote:
Tom Lane [EMAIL PROTECTED] writes:
I wrote:
Maybe aword, word, and numword?
Does the lack of response mean people are satisfied with that?
Sorry, I had a couple responses partially written but never finished.
If we were doing it from scratch I would suggest
Alvaro Herrera [EMAIL PROTECTED] writes:
Gregory Stark wrote:
If we were doing it from scratch I would suggest using longer names. At the
least I would still suggest using ascii or asciiword instead of aword.
+1 for asciiword; aword sounds too much like a word which is not the
meaning I
Tom Lane wrote:
OK, so with that and Michael's suggestion we have
asciiword
word
numword
asciihword
hword
numhword
hword_asciipart
hword_part
hword_numpart
Sold?
Sold here.
--
Alvaro Herrera
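The letter/digit definitions proposed earlier in the thread (all ASCII letters; all letters per iswalpha(); mixed letters and digits) map onto the final names asciiword, word and numword. The following is a minimal sketch of that mapping for a single non-hyphenated token. It is not the wparser_def.c code, which is a state machine that also handles numbers, URLs and more. The enum TokenClass, its TOK_* values and classify_word() are hypothetical names, and returning TOK_OTHER for digit-only tokens is an assumption.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical token classes, named after the categories agreed above. */
typedef enum
{
    TOK_ASCIIWORD,              /* only ASCII letters */
    TOK_WORD,                   /* only letters per iswalpha(), some non-ASCII */
    TOK_NUMWORD,                /* letters mixed with digits */
    TOK_OTHER                   /* empty, digits only, or punctuation: not modeled */
} TokenClass;

/*
 * Classify one non-hyphenated token in a single pass.  Characters that
 * satisfy iswalnum() but not iswalpha() are counted as digits.
 */
static TokenClass
classify_word(const wchar_t *s)
{
    bool        ascii_letters = true;   /* no non-ASCII letter seen yet */
    bool        has_alpha = false;
    bool        has_digit = false;

    for (; *s != L'\0'; s++)
    {
        wint_t      c = (wint_t) *s;

        if (!iswalnum(c))
            return TOK_OTHER;   /* e.g. the '-' of a hyphenated word */
        if (iswalpha(c))
        {
            has_alpha = true;
            if (c >= 0x80)
                ascii_letters = false;
        }
        else
            has_digit = true;
    }

    if (!has_alpha)
        return TOK_OTHER;       /* empty or digits only */
    if (has_digit)
        return TOK_NUMWORD;
    return ascii_letters ? TOK_ASCIIWORD : TOK_WORD;
}
```

Here classify_word(L"foo") gives TOK_ASCIIWORD and classify_word(L"beta1") gives TOK_NUMWORD. Under a UTF-8 locale selected with setlocale(LC_CTYPE, ""), classify_word(L"bär") should give TOK_WORD. In the plain "C" locale, iswalpha() may not recognize non-ASCII letters at all.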
Tom Lane [EMAIL PROTECTED] writes:
hword_asciipart
hword_part
hword_numpart
Out of curiosity would the foo in foo-bär or the foo-beta1 be a
hword_asciipart or a hword_part/hword_numpart?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark [EMAIL PROTECTED] writes:
Out of curiosity would the foo in foo-bär or the foo-beta1 be a
hword_asciipart or a hword_part/hword_numpart?
foo would be hword_asciipart independently of what was in the other
parts of the hword. AFAICS this is what you want for the purpose,
which is
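The rule in the reply above classifies each part of a hyphenated word on its own, without looking at the other parts. The sketch below splits on '-', classifies every part independently, and then derives the whole-token class from the parts. That last step is an assumption chosen because it reproduces the earlier foo-beta1 example (a numhword). PartClass, HwordClass, classify_part(), classify_hword() and hword_class() are hypothetical names, not PostgreSQL functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical per-part classes. */
typedef enum
{
    PART_ASCII,                 /* hword_asciipart */
    PART_ALPHA,                 /* hword_part */
    PART_NUM,                   /* hword_numpart */
    PART_INVALID                /* empty, digits only, or punctuation */
} PartClass;

/* Hypothetical whole-token classes: asciihword, hword, numhword. */
typedef enum { HW_ASCII, HW_ALPHA, HW_NUM } HwordClass;

/* Classify s[0..len) by itself, ignoring the other parts. */
static PartClass
classify_part(const wchar_t *s, size_t len)
{
    bool        ascii = true, has_alpha = false, has_digit = false;

    for (size_t i = 0; i < len; i++)
    {
        wint_t      c = (wint_t) s[i];

        if (!iswalnum(c))
            return PART_INVALID;
        if (iswalpha(c))
        {
            has_alpha = true;
            if (c >= 0x80)
                ascii = false;
        }
        else
            has_digit = true;
    }
    if (!has_alpha)
        return PART_INVALID;
    if (has_digit)
        return PART_NUM;
    return ascii ? PART_ASCII : PART_ALPHA;
}

/*
 * Store the class of each '-'-separated part in parts[].  Returns the
 * number of parts, or 0 if a part is invalid or max_parts is exceeded.
 */
static size_t
classify_hword(const wchar_t *s, PartClass *parts, size_t max_parts)
{
    size_t      n = 0;
    const wchar_t *start = s;

    for (;; s++)
    {
        if (*s == L'-' || *s == L'\0')
        {
            PartClass   pc;

            if (n == max_parts)
                return 0;
            pc = classify_part(start, (size_t) (s - start));
            if (pc == PART_INVALID)
                return 0;
            parts[n++] = pc;
            if (*s == L'\0')
                break;
            start = s + 1;
        }
    }
    return n;
}

/* Whole token: any numeric part wins, then any non-ASCII letter part. */
static HwordClass
hword_class(const PartClass *parts, size_t n)
{
    HwordClass  cls = HW_ASCII;

    for (size_t i = 0; i < n; i++)
    {
        if (parts[i] == PART_NUM)
            return HW_NUM;
        if (parts[i] == PART_ALPHA)
            cls = HW_ALPHA;
    }
    return cls;
}
```

With this rule, foo in foo-beta1 is PART_ASCII (hword_asciipart) while the whole token is HW_NUM (numhword). Under a UTF-8 locale, foo in foo-bär should likewise stay PART_ASCII while the whole token becomes HW_ALPHA (hword).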
On Oct 23, 2007, at 12:09 , Alvaro Herrera wrote:
Tom Lane wrote:
OK, so with that and Michael's suggestion we have
asciiword
word
numword
asciihword
hword
numhword
hword_asciipart
hword_part
hword_numpart
Sold?
Michael Glaesemann [EMAIL PROTECTED] writes:
Tom Lane wrote:
asciiword
word
numword
No huge preference, but I see benefit in what Gregory was saying re:
asciiword, alphaword, alnumword. word itself is pretty general, while
alphaword ties it much closer to its intended meaning. They've
Just for clarification.
Are you going to make these changes in the 8.3 beta test period?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
If I am reading the state machine in wparser_def.c correctly, the
three classifications of words that the default parser knows are
lword Composed entirely of
Tatsuo Ishii [EMAIL PROTECTED] writes:
Just for clarification.
Are you going to make these changes in the 8.3 beta test period?
Yes, I committed them a couple hours ago.
regards, tom lane
Alvaro Herrera wrote:
Tom Lane wrote:
ISTM that perhaps a more generally useful definition would be
lword Only ASCII letters
nlword Entirely letters per iswalpha(), but not lword
word Entirely alphanumeric per iswalnum(), but not nlword
Heikki Linnakangas [EMAIL PROTECTED] writes:
Alvaro Herrera wrote:
Tom Lane wrote:
ISTM that perhaps a more generally useful definition would be
lword Only ASCII letters
nlword Entirely letters per iswalpha(), but not lword
word Entirely
Heikki Linnakangas [EMAIL PROTECTED] writes:
Alvaro Herrera wrote:
lword Entirely letters per iswalpha, with at least one ASCII
nlword Entirely letters per iswalpha
word Entirely alphanumeric per iswalnum, but not nlword
I don't like this categorization
Gregory Stark [EMAIL PROTECTED] writes:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I like the aword name more than lword, BTW. If we change the meaning
of the classes, surely we can change the name as well, right?
I'm not very familiar with the use case here. Is there a good reason to want
If I am reading the state machine in wparser_def.c correctly, the
three classifications of words that the default parser knows are
lword Composed entirely of ASCII letters
nlword Composed entirely of non-ASCII letters
(where letter is defined by iswalpha())
word
Tom Lane wrote:
ISTM that perhaps a more generally useful definition would be
lword Only ASCII letters
nlword Entirely letters per iswalpha(), but not lword
word Entirely alphanumeric per iswalnum(), but not nlword
(hence, includes at least one
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote:
ISTM that perhaps a more generally useful definition would be
lword Only ASCII letters
nlword Entirely letters per iswalpha(), but not lword
word Entirely alphanumeric per iswalnum(), but not