On Tue, 31 May 2016 12:05:39 -0400
Bill Cole wrote:

> On 31 May 2016, at 2:21, Henrik K wrote:
>
> > On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote:
> >> On Mon, 30 May 2016 17:45:52 -0400
> >> "Bill Cole" <sausers-20150...@billmail.scconsult.com> wrote:
> >>
> >>> So you could have 'sex' and 'meds' and 'watches' tallied up in
> >>> into frequency counts that sum up natural (word) and synthetic
> >>> (concept) occurrences, not just as incompatible types of input
> >>> feature but as a conflation of incompatible features.
> >>
> >> That is easy to patch by giving "concepts" a separate namespace.
> >> You could do that by picking a character that can't be in a normal
> >> token and
> >> using something like: concept*meds, concept*sex, etc. as tokens.
> >
> > This is how the put_metadata stuff already works in concepts and
> > other plugins. It sees a "Hx-sa-concepts:foobar" token.
>
> That's less bad than the description Paul Stead originally gave,
> which was to add headers with various simple word tags "which Bayes
> can use as tokens." If the actual implementation is doing something
> else in a separate Bayes DB, I don't see a problem with it (although
> I'd expect it to be less accurate than 1-word Bayes)

It's not in a separate database. Words in headers simply generate
different tokens from the same words in the body, so the two never
share a count.
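
To illustrate the namespacing, here is a rough Python sketch. It is not SpamAssassin's actual tokenizer. The tokenize() function, the word-splitting regex and the sample message are made up for illustration; the only thing taken from the thread is the "Hx-sa-concepts:foobar" token shape.

```python
import re

# Hypothetical word pattern; the real tokenizer's rules are more involved.
WORD = re.compile(r"[A-Za-z0-9']+")

def tokenize(headers, body):
    """Return the set of tokens for one message.

    Body words are used as-is. Header words get an "H<header-name>:"
    prefix (an assumption modelled on the "Hx-sa-concepts:foobar"
    example), so they can never collide with a body word.
    """
    tokens = set()
    for word in WORD.findall(body):
        tokens.add(word.lower())
    for name, value in headers.items():
        prefix = "H" + name.lower() + ":"
        for word in WORD.findall(value):
            tokens.add(prefix + word.lower())
    return tokens

# Made-up sample: the concept tag "meds" and the body word "meds"
# end up as two separate tokens in the same token set.
tokens = tokenize({"X-SA-Concepts": "meds"}, "cheap meds online")
print(sorted(tokens))
```

Because ":" cannot appear in a body word under this sketch's regex, the synthetic concept token and the natural word are tallied separately even though they live in one database.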