On Tue, 22 Jan 2008 16:45:36 -0800, Scott Hess wrote
> MATCH "foo -bar" should return "The set of documents which match foo
> but not bar".  I read MATCH "-bar" as "The set of all documents which
> do not match bar".  MATCH "-foo -bar" would be "The set of all
> documents which do not match foo and do not match bar".

Each match word adds a constraint which filters down the results.  A word
with a leading - disqualifies all documents that contain it, and a word
without one disqualifies all documents that lack it.  Is that the right way
to look at things?  This seems consistent with SQL's design of adding
constraints to queries which default to yielding all rows, and it also seems
to match up with your above explanation.  But it doesn't explain the empty
match query case, in which no constraints are given yet zero rows come back.

Previously I had thought that the result set is initially empty, that words
lacking -'s add to the set, and words with -'s remove from the set.  This is
consistent with the fact that an empty match query returns zero results, and
this reasoning predicts that a match query consisting only of -words will also
give zero results.  But this isn't how you explain things.
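
To make the difference between the two mental models concrete, here is a
minimal sketch over a toy corpus.  Nothing here is fts3 code: DOCS, parse,
match_filtering and match_additive are hypothetical names, and reading "add to
the set" as a set union is my assumption.

```python
# Toy corpus: document id -> set of words (hypothetical data, not fts3).
DOCS = {
    1: {"foo"},
    2: {"foo", "bar"},
    3: {"bar"},
    4: {"baz"},
}


def parse(query):
    """Split a query into (plain words, negated words)."""
    terms = query.split()
    plain = [t for t in terms if not t.startswith("-")]
    negated = [t[1:] for t in terms if t.startswith("-")]
    return plain, negated


def match_filtering(query):
    """Constraint model: start from every document, each term filters."""
    plain, negated = parse(query)
    result = set(DOCS)
    for word in plain:
        result = {d for d in result if word in DOCS[d]}
    for word in negated:
        result = {d for d in result if word not in DOCS[d]}
    return result


def match_additive(query):
    """Additive model: start empty, plain words add, -words remove."""
    plain, negated = parse(query)
    result = set()
    for word in plain:
        result |= {d for d in DOCS if word in DOCS[d]}
    for word in negated:
        result -= {d for d in DOCS if word in DOCS[d]}
    return result
```

Both functions agree on "foo -bar", but they disagree on "" and on "-bar": the
filtering model yields every row that survives the constraints, while the
additive model yields nothing because it never had anything to remove from.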

> I'm not sure how the empty-string results matter, as I don't consider 
> MATCH "-foo" to have an implicit empty term.

Okay, I agree that it's logical for a match query consisting only of negated
words to return all rows lacking those words, and I am fine with this
particular case being unsupported.  (There are other "missing" features in
SQLite that are more important to me, like recursive triggers and foreign key
constraints, so I don't mind waiting on this one.)

However, it's also logical (I think; show me where I'm wrong) for an empty
match query to return all rows, which is an unsupported operation.  Yet rather
than fail with an SQL logic error, fts3 yields zero rows in this case.  I find
this to be inconsistent, and I'd rather have both throw errors or both return
zero results.

> That's what I'm saying.  Calculating the set of all documents which
> match "foo" and the set of all documents which match "bar" and
> removing the latter from the former is conveniently available from 
> the fts index.  Calculating the set of all documents in the fts 
> index would require running a separate query to figure it out.

I take from your discussion that the fts index keeps track of all rows that
contain a given word.  Is it reasonable to add one more entry to the index
that lists all rows?  The indexed "word" could even be the empty string, since
every row contains the empty string. :^)  Then in any match query lacking
non-negated words (i.e. an empty match query or an entirely negative match
query), the match words' index sets are intersected with or subtracted from
this index of everything, as if the match query indeed had an implicit empty
term.
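
A rough sketch of what I mean, using a toy in-memory inverted index.  This is
not how fts3 actually stores its data; ToyIndex, ALL_ROWS and the sample rows
are hypothetical and only illustrate the "implicit empty term" idea.

```python
from collections import defaultdict

# Hypothetical key for the proposed posting list that names every row.
ALL_ROWS = ""


class ToyIndex:
    """Toy inverted index: word -> set of rowids containing that word."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, rowid, text):
        # Every row is also recorded under the empty "word".
        self.postings[ALL_ROWS].add(rowid)
        for word in text.split():
            self.postings[word].add(rowid)

    def match(self, query):
        terms = query.split()
        plain = [t for t in terms if not t.startswith("-")]
        negated = [t[1:] for t in terms if t.startswith("-")]
        # No non-negated words: fall back to the implicit empty term.
        if not plain:
            plain = [ALL_ROWS]
        result = set(self.postings.get(plain[0], set()))
        for word in plain[1:]:
            result &= self.postings.get(word, set())
        for word in negated:
            result -= self.postings.get(word, set())
        return result


index = ToyIndex()
index.add(1, "foo")
index.add(2, "foo bar")
index.add(3, "bar")
index.add(4, "baz")
```

With this in place, index.match("-bar") and index.match("") are answered from
the index alone, with no separate query to enumerate all rows.  The cost is
one extra posting list that grows with every insert.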

-- 
Andy Goth | <[EMAIL PROTECTED]> | http://andy.junkdrome.org/

