The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/textsearch-debugging.html
Description:

It would be helpful if there were some documentation on how to query the dictionaries themselves to get a canonical root word, in one of the following ways:

1. Directly, such as:

     SELECT words FROM english_stem WHERE stem = 'chlorin';

   This should return e.g. "chlorine", "chlorination", "chlorinated". There isn't any documentation on how to actually do this.

2. Indirectly, such as:

     SELECT ts_unlexize('english_stem', 'chlorin');

   This is a function which doesn't yet seem to exist: the one-to-many inverse of ts_lexize().

3. Or, the canonical version of (2):

     SELECT ts_canonical('english_stem', 'chlorin');

   This would be a one-to-one function to find the English root word (not the lexeme).

An example of where this is useful: consider a list of documents containing a large amount of English text. For this example, assume that the following words are frequent: "the", "kitten", "kittens", "chlorination", "chlorinated", "temperature" and "something". We wish to display a "tag cloud" of the most common terms, excluding stopwords, by means of ts_stat(). At the moment, it lists:

  "kitten"      -- correctly treating "kitten" and "kittens" as the same.
  "chlorin"     -- correctly merging "chlorination" and "chlorinated", but creating a non-word.
  "temperatur"  -- right stem, not a word.
  "someth"      -- the stemmer has mistakenly removed the -ing suffix.

So, given the array ["kitten", "chlorin", "temperatur", "someth"], we wish to un-stem to find the first valid English word whose stem is in that array, i.e. ["kitten", "chlorine", "temperature", "something"].

Note that it is intentional to retrieve "chlorine" even though the original inputs were "chlorinated" and "chlorination", and did not necessarily contain "chlorine".

There doesn't seem to be any process for doing this. I'm not sure whether this is just something for the documentation, or an RFE for (2).

Thanks very much.
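
For reference, a partial workaround using functions that do exist might look like the sketch below. It assumes a hypothetical table documents(id, body) and the built-in 'english' configuration; neither is part of the request above. It uses ts_debug() to pair each original token with its lexeme, then picks the most frequent original word per stem as the display form. It is only a sketch: it can only return words that actually occur in the documents, so for the example above it would show "chlorination" or "chlorinated" rather than "chlorine", and it re-parses every document, so it is likely to be slower than ts_stat() over a stored tsvector column.

```sql
-- Assumption: a hypothetical table documents(id integer, body text).
WITH tokens AS (
    SELECT lower(d.token) AS word,          -- original token as written
           d.lexemes[1]   AS stem           -- first lexeme produced for it
    FROM documents AS doc,
         LATERAL ts_debug('english', doc.body) AS d
    WHERE d.lexemes IS NOT NULL             -- token was recognized by a dictionary
      AND cardinality(d.lexemes) > 0        -- and was not dropped as a stop word
),
counts AS (
    SELECT stem, word, count(*) AS n
    FROM tokens
    GROUP BY stem, word
)
SELECT DISTINCT ON (stem)
       stem,
       word AS display_word,                -- most frequent surface form of the stem
       sum(n) OVER (PARTITION BY stem) AS total_entries
FROM counts
ORDER BY stem, n DESC, length(word), word;  -- ties: prefer the shorter word
```

Wrapping this query in a subquery and ordering by total_entries DESC would give the tag-cloud ranking. A true ts_unlexize() or ts_canonical() would still be needed to recover a root word that never appears in the text.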