On Sun, 20 Mar 2005, Andrew J. Kopciuch wrote:

> On Thursday 17 March 2005 17:55, you wrote:
> > The short question is why does this:
> >
> > select to_tsvector('default', coalesce(name, '') ||' '||
> > coalesce(description, '') ||' '|| coalesce(keywords,'')) from link_items;
> >
> > give different results than this:
> >
> > update link_items set linksfti=to_tsvector('default', coalesce(name, '')
> >
> > ||' '|| coalesce(description, '') ||' '|| coalesce(keywords,''));
> >
> > select linksfti from link_items;
> >
>
>
> It shouldn't.  I just tested with some of your data on my machine, and the
> results were fine.  (PG 7.4.6).  I can not see why they would be any
> different for 7.3.2.
>
> Your data looks rather strange. (re "'60':1 '000':2").  Is that really all
> that was inserted? Or have you just left some out for your email?

About 95% of the 1900 insertions ended up with empty strings (not NULLs),
the other 5% looked like that above.  Either just numbers, or occasionally
words (defined by consecutive non-whitespace characters separated by
whitespace) that had numbers or symbols in them.  Like: "U.S. Senate" was
transformed in such a way that "Senate" was dropped completely and "U.S."
became lowercased "u.s.".  Another example was a URL that happened to be
in the description column of one was captured, but the rest of the text
was not.  Another had a name of "World T.E.A.M. Sports" and all that was
stored in the vector was "t.e.a.m."

It seems to be selective of only numbers, words with numbers in them,
words with '.' or '/' characters.  It completely ignores any other words
or text in any of the 3 fields.

> I could see this being a configuration issue possibly.  What do your pg_ts
> tables look like?  Have you made modifications there?

This morning, I decided to remove the following trigger from the
link_items table:
CREATE TRIGGER updateprodtbl
  BEFORE INSERT OR UPDATE
  ON link_items
  FOR EACH ROW
  EXECUTE PROCEDURE tsearch2('linksfti', 'name', 'description',
'keywords');

Now the UPDATE command I listed above works, so apparently there is
something about this trigger that is blocking the search vector from being
stored.  This trigger was copied and pasted (with only changes to the
column names) from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
in the INDEXING FIELDS IN A TABLE section.

This does fix the immediate problem of getting the search results for the
live website we are running, but now we have the problem of keeping it up
to date.  I could run a nightly cronjob to update all the rows, but it
seems inefficient, and I would really like to know why the trigger would
keep it from working on 7.3, but not 8.0.

You requested the pg_ts_* tables:
On the Linux-redhat, pg7.3.2

pg_ts_cfg: (4 rows)
oid     ts_name                 prs_name        locale
106407  "default"               "default"       "C"
988004  "default_english"       "default"       "en_US"
106408  "default_russian"       "default"       "ru_RU.KOI8-R"
106409  "simple"                "default"       NULL

pg_ts_dict: (5 rows)
oid     dict_name       dict_initoption dict_init       dict_lexize
106356  en_stem         ""              snb_en_init     snb_lexize
106361  ispell_template NULL            spell_init      spell_lexize
106358  ru_stem         "/usr/local/pgsql/share/contrib/russion.stop"
snb_ru_init     snb_lexize
106353  simple          NULL            dex_init        dex_lexize
106364  synonym         NULL            syn_init        syn_lexize

pg_ts_parser: (1 row)
oid     prs_name        prs_start       prs_nexttoken   prs_end
prs_headline    prs_lextype
106389  "default"       prsd_start      prsd_getlexeme  prsd_end
prsd_headline   prsd_lextype

pg_ts_cfgmap(73 rows)
ts_name         tok_alias       dict_name
"default"       "lword" "{en_stem}"
"default"       "nlword"        "{simple}"
"default"       "word"  "{simple}"
"default"       "email" "{simple}"
"default"       "url"   "{simple}"
"default"       "host"  "{simple}"
"default"       "sfloat"        "{simple}"
"default"       "version"       "{simple}"
"default"       "part_hword"    "{simple}"
"default"       "nlpart_hword"  "{simple}"
"default"       "lpart_hword"   "{en_stem}"
"default"       "hword" "{simple}"
"default"       "lhword"        "{en_stem}"
"default"       "nlhword"       "{simple}"
"default"       "uri"   "{simple}"
"default"       "file"  "{simple}"
"default"       "float" "{simple}"
"default"       "int"   "{simple}"
"default"       "uint"  "{simple}"
"default_russian"       "lword"         "{en_stem}"
"default_russian"       "nlword"        "{ru_stem}"
"default_russian"       "word"  "{ru_stem}"
"default_russian"       "email" "{simple}"
"default_russian"       "url"   "{simple}"
"default_russian"       "host"  "{simple}"
"default_russian"       "sfloat"        "{simple}"
"default_russian"       "version"       "{simple}"
"default_russian"       "part_hword"    "{simple}"
"default_russian"       "nlpart_hword"  "{ru_stem}"
"default_russian"       "lpart_hword"   "{en_stem}"
"default_russian"       "hword" "{ru_stem}"
"default_russian"       "lhword"        "{en_stem}"
"default_russian"       "nlhword"       "{ru_stem}"
"default_russian"       "uri"   "{simple}"
"default_russian"       "file"  "{simple}"
"default_russian"       "float" "{simple}"
"default_russian"       "int"   "{simple}"
"default_russian"       "uint"  "{simple}"
"simple"        "lword" "{simple}"
"simple"        "nlword"        "{simple}"
"simple"        "word"  "{simple}"
"simple"        "email" "{simple}"
"simple"        "url"   "{simple}"
"simple"        "host"  "{simple}"
"simple"        "sfloat"        "{simple}"
"simple"        "version"       "{simple}"
"simple"        "part_hword"    "{simple}"
"simple"        "nlpart_hword"  "{simple}"
"simple"        "lpart_hword"   "{simple}"
"simple"        "hword" "{simple}"
"simple"        "lhword"        "{simple}"
"simple"        "nlhword"       "{simple}"
"simple"        "uri"   "{simple}"
"simple"        "file"  "{simple}"
"simple"        "float" "{simple}"
"simple"        "int"   "{simple}"
"simple"        "uint"  "{simple}"
"default_english"       "url"   "{simple}"
"default_english"       "host"  "{simple}"
"default_english"       "sfloat"        "{simple}"
"default_english"       "uri"   "{simple}"
"default_english"       "int"   "{simple}"
"default_english"       "float" "{simple}"
"default_english"       "email" "{simple}"
"default_english"       "word"  "{simple}"
"default_english"       "hword" "{simple}"
"default_english"       "nlword"        "{simple}"
"default_english"       "nlpart_hword"  "{simple}"
"default_english"       "part_hword"    "{simple}"
"default_english"       "nlhword"       "{simple}"
"default_english"       "file"  "{simple}"
"default_english"       "uint"  "{simple}"
"default_english"       "version"       "{simple}"


On the 8.0.0-beta5 Windows 2000 machine:
pg_ts_cfg: (4 rows, identical to 7.3.2)
oid     ts_name                 prs_name        locale
370162  "default"               "default"       "C"
370165  "default_english"       "default"       "en_US"
370163  "default_russian"       "default"       "ru_RU.KOI8-R"
370164  "simple"                "default"       NULL

pg_ts_cfgmap (73 rows, identical to 7.3.2)
Not listed again, all identical except for OIDs

pg_ts_dict (5 rows, identical to 7.3.2 except for OIDS)

pg_ts_parser (1 row, identical to 7.3.2 except for OIDs)

I have made a single change to it from its default installation.  When I
was working with the rank_cd() function on the 8.0.0 machine, it had
errors due to a non-existant english stop file, so I changed
pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'.  The indexing
system was working fine both before and after the change to the pg_ts_dict
table.  I also propagated the change to the 7.3.2 machine even though it
didn't have the error message (the stop file didn't exist on that computer
either, but it never gave an error message about it).


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to