RE: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread Linus Upson
The query language that fts understands doesn't need to be the same as the
query language the user types. In fact, it is often advantageous to have the
user visible query language be much more lenient than what fts wants to
consume. Most sqlite apps don't force users to type sql!
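
As a rough illustration of that split, here is a minimal sketch of a lenient front end that rewrites what the user typed into the stricter form discussed in this thread (uppercase OR, NEAR/N). The function name to_fts_query and the leniency rules (any-case "or" or "|" between two words means OR, any-case "near/N" means NEAR/N, everything else is lowercased into a plain term) are assumptions made up for this example, not anything fts defines.

```python
import re

# Hypothetical user-facing spelling of a proximity operator: near/10, Near/10, ...
NEAR_RE = re.compile(r"near/(\d+)", re.IGNORECASE)

def to_fts_query(user_text):
    """Translate a lenient user query into a stricter back-end query string."""
    words = user_text.split()
    out = []
    for i, word in enumerate(words):
        # Only treat a word as an operator when it has a term on both sides.
        inner = 0 < i < len(words) - 1
        near = NEAR_RE.fullmatch(word)
        if inner and word.lower() in ("or", "|"):
            out.append("OR")
        elif inner and near:
            out.append("NEAR/" + near.group(1))
        else:
            # Lowercasing keeps ordinary words from colliding with operators.
            out.append(word.lower())
    return " ".join(out)
```

With these assumed rules, "sqlite or lemon" becomes "sqlite OR lemon", while a bare "near" stays an ordinary word.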

Linus


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 14, 2007 1:50 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Adding additional operators to FTS3

"Samuel R. Neff" <[EMAIL PROTECTED]> wrote:
> The /10 syntax makes sense to programmers but I think users are going to
> forget it pretty quickly.  Same with "OR" and "NEAR" being required to be
> all caps (I didn't know that).  Ideally the UI an application exposes would
> show the user that OR and NEAR were interpreted as keywords and not tokens (of
> course that's up to individual application developers and not an issue for
> sqlite/fts).
> 

Go to http://www.google.com/ and type in "sqlite or lemon".  Press
"Search".  You'll get a nice little reminder that you need to
uppercase "OR" to search for either of two items.  FTS is not
the only full-text search engine that requires you to uppercase
OR...

Most casual users don't know that you can do a phrase search, or
an OR search, or a negated term search in Google or in any other
search engine.  They just type in a bunch of words and expect
it to work right.  So the main objective of the UI is to stay
out of the way of the casual user and not surprise them with
special syntax.  If you want to go beyond casual keyword searching,
I do not think it is too much to require NEAR/10 (how else does
it know you don't mean NEAR/15?) or uppercase OR.

--
D. Richard Hipp <[EMAIL PROTECTED]>



-
To unsubscribe, send email to [EMAIL PROTECTED]

-



Re: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread drh
"Samuel R. Neff" <[EMAIL PROTECTED]> wrote:
> The /10 syntax makes sense to programmers but I think users are going to
> forget it pretty quickly.  Same with "OR" and "NEAR" being required to be all
> caps (I didn't know that).  Ideally the UI an application exposes would show
> the user that OR and NEAR were interpreted as keywords and not tokens (of
> course that's up to individual application developers and not an issue for
> sqlite/fts).
> 

Go to http://www.google.com/ and type in "sqlite or lemon".  Press
"Search".  You'll get a nice little reminder that you need to
uppercase "OR" to search for either of two items.  FTS is not
the only full-text search engine that requires you to uppercase
OR...

Most casual users don't know that you can do a phrase search, or
an OR search, or a negated term search in Google or in any other
search engine.  They just type in a bunch of words and expect
it to work right.  So the main objective of the UI is to stay
out of the way of the casual user and not surprise them with
special syntax.  If you want to go beyond casual keyword searching,
I do not think it is too much to require NEAR/10 (how else does
it know you don't mean NEAR/15?) or uppercase OR.

--
D. Richard Hipp <[EMAIL PROTECTED]>





RE: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread Samuel R. Neff

The /10 syntax makes sense to programmers but I think users are going to
forget it pretty quickly.  Same with "OR" and "NEAR" being required to be all
caps (I didn't know that).  Ideally the UI an application exposes would show
the user that OR and NEAR were interpreted as keywords and not tokens (of
course that's up to individual application developers and not an issue for
sqlite/fts).

Sam

 
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 14, 2007 3:45 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Adding additional operators to FTS3

"Scott Hess" <[EMAIL PROTECTED]> wrote:
> 
> One thing I'll think on in the background as a how-to-integrate
> question is the balance between sophistication for query experts
> versus the approachability for non-experts.  For some systems, having
> things like proximity queries complicates the query language to no
> particular end, while in other systems proximity queries might be
> essential.  Insofar as more sophisticated query forms don't interfere
> with simpler forms, they can just be ignored, but it would be nice if
> they didn't crop up in warts like unexpected results for a search
> 'stoplight near krispy kreme' where you no longer find documents where
> stoplight is more than 10 terms away from krispy.  We've discussed
> having the ability to express both more ad-hoc and more stylized
> queries, maybe this is something to think about along those lines.
> 

The OR operator has to be all caps in order to be recognized as
an operator and not the word "or".  Presumably the NEAR operator
would work the same way.  That would minimize the chance of a
collision with the word "near".  We might also require the "/10"
after the NEAR keyword or else it goes back to just being the
token "near" and not an operator.

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread drh
"Scott Hess" <[EMAIL PROTECTED]> wrote:
> 
> One thing I'll think on in the background as a how-to-integrate
> question is the balance between sophistication for query experts
> versus the approachability for non-experts.  For some systems, having
> things like proximity queries complicates the query language to no
> particular end, while in other systems proximity queries might be
> essential.  Insofar as more sophisticated query forms don't interfere
> with simpler forms, they can just be ignored, but it would be nice if
> they didn't crop up in warts like unexpected results for a search
> 'stoplight near krispy kreme' where you no longer find documents where
> stoplight is more than 10 terms away from krispy.  We've discussed
> having the ability to express both more ad-hoc and more stylized
> queries, maybe this is something to think about along those lines.
> 

The OR operator has to be all caps in order to be recognized as
an operator and not the word "or".  Presumably the NEAR operator
would work the same way.  That would minimize the chance of a
collision with the word "near".  We might also require the "/10"
after the NEAR keyword or else it goes back to just being the
token "near" and not an operator.
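
A minimal sketch of that rule, for illustration only: a word counts as an operator only if it is exactly "OR" or exactly "NEAR/<digits>"; anything else, including lowercase "or", lowercase "near" and a bare "NEAR" with no distance, falls back to being an ordinary token. The names classify and parse are hypothetical and this is not the fts query parser.

```python
import re

# Operator spelling assumed from the discussion: uppercase NEAR plus "/N".
NEAR_RE = re.compile(r"NEAR/(\d+)")

def classify(word):
    """Return (kind, value) for one whitespace-separated query word."""
    if word == "OR":
        return ("OR", None)
    m = NEAR_RE.fullmatch(word)
    if m:
        return ("NEAR", int(m.group(1)))
    # Everything else is a plain search term, folded to lowercase.
    return ("TERM", word.lower())

def parse(query):
    """Classify every word of a query string, in order."""
    return [classify(w) for w in query.split()]
```

Under this rule 'stoplight near krispy kreme' parses as four plain terms, so the casual query keeps its ordinary meaning.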

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread Scott Hess
[Just some background, while I'm thinking about it.]

Google has placeholder syntax: if you search for 'full * search', the
first hit is a Wikipedia article on 'full text search'.  It's not the
same, but it is in a similar ballpark.  For instance, you might have
'full */10 search' (Google doesn't, but it might be a reasonable
extension).  Doing the fixed-distance thing is easier than allowing
for variable distance.

Lucene (*) has '"full search"~10' as a proximity search for "full" and
"search" within 10 of each other.  Without more research, I'm not
clear on whether this has ordering implications, and I'm not entirely
certain how you would compose a sequence of such things.

Xapian (**) has two variants, "full NEAR/10 search" and "full ADJ/10
search"; the latter enforces the order given.

I think that being able to do things like "full NEAR/5 text NEAR/5
search" feels more approachable than "full text search"~5.  I won't
say it's more understandable, because I think any of these kinds of
search may have non-obvious subtleties, but I think the infix version
might make it easier to generate useful results for ad-hoc queries
without having to step back too far.

One thing I'll think on in the background as a how-to-integrate
question is the balance between sophistication for query experts
versus the approachability for non-experts.  For some systems, having
things like proximity queries complicates the query language to no
particular end, while in other systems proximity queries might be
essential.  Insofar as more sophisticated query forms don't interfere
with simpler forms, they can just be ignored, but it would be nice if
they didn't crop up in warts like unexpected results for a search
'stoplight near krispy kreme' where you no longer find documents where
stoplight is more than 10 terms away from krispy.  We've discussed
having the ability to express both more ad-hoc and more stylized
queries, maybe this is something to think about along those lines.

-scott

(*) http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches
(**) http://www.xapian.org/docs/queryparser.html



On 9/14/07, Mike Marshall <[EMAIL PROTECTED]> wrote:
> Thanks for the quick response, much appreciated.  Guess I better go and look
> at the query parser.
>
> Thanks again
>
> Mike
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: 14 September 2007 19:22
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Adding additional operators to FTS3
>
> "Mike Marshall" <[EMAIL PROTECTED]> wrote:
> >
> > 1)   We need to be able to index items such as AT, this seems like
> > it's a case of replacing the default tokeniser with our own implementation
>
> Correct.
>
> >
> > 2)   A NEAR query operator  so we can do things like 'foo NEAR10 bar'
> > which will bring back all documents that have bar within 10 words of foo
> > (either direction).  This is the one that I'm really not sure on and having
> > looked at the code don't really have a clue where to start.
> >
>
> A NEAR operator is just a generalization of a phrase
> search.  A phrase search is when you put two keywords in
> doublequotes:   '"foo bar"'  FTS looks for documents that
> contain the words foo and bar such that bar occurs immediately
> after foo.  FTS records the index of each word in each document,
> so what phrase search is really doing is looking for instances
> of foo and bar where the index of bar is exactly one more than the
> index of foo.  To implement NEAR10 you just have to look for
> instances of bar with an index that is not more than 10 different
> from the index on foo.  Not such a big change, really.  The
> hard part will be parsing out the NEAR10 operator.
>
> --
> D. Richard Hipp <[EMAIL PROTECTED]>
>
>
> 




RE: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread Mike Marshall
Thanks for the quick response, much appreciated.  Guess I better go and look
at the query parser.

Thanks again

Mike

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: 14 September 2007 19:22
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Adding additional operators to FTS3

"Mike Marshall" <[EMAIL PROTECTED]> wrote:
> 
> 1)   We need to be able to index items such as AT, this seems like
> it's a case of replacing the default tokeniser with our own implementation

Correct.

> 
> 2)   A NEAR query operator  so we can do things like 'foo NEAR10 bar'
> which will bring back all documents that have bar within 10 words of foo
> (either direction).  This is the one that I'm really not sure on and having
> looked at the code don't really have a clue where to start.
> 

A NEAR operator is just a generalization of a phrase
search.  A phrase search is when you put two keywords in
doublequotes:   '"foo bar"'  FTS looks for documents that
contain the words foo and bar such that bar occurs immediately
after foo.  FTS records the index of each word in each document,
so what phrase search is really doing is looking for instances
of foo and bar where the index of bar is exactly one more than the
index of foo.  To implement NEAR10 you just have to look for
instances of bar with an index that is not more than 10 different
from the index on foo.  Not such a big change, really.  The
hard part will be parsing out the NEAR10 operator.

--
D. Richard Hipp <[EMAIL PROTECTED]>






Re: [sqlite] Adding additional operators to FTS3

2007-09-14 Thread drh
"Mike Marshall" <[EMAIL PROTECTED]> wrote:
> 
> 1)   We need to be able to index items such as AT, this seems like
> it's a case of replacing the default tokeniser with our own implementation

Correct.

> 
> 2)   A NEAR query operator  so we can do things like 'foo NEAR10 bar'
> which will bring back all documents that have bar within 10 words of foo
> (either direction).  This is the one that I'm really not sure on and having
> looked at the code don't really have a clue where to start.
> 

A NEAR operator is just a generalization of a phrase
search.  A phrase search is when you put two keywords in
doublequotes:   '"foo bar"'  FTS looks for documents that
contain the words foo and bar such that bar occurs immediately
after foo.  FTS records the index of each word in each document,
so what phrase search is really doing is looking for instances
of foo and bar where the index of bar is exactly one more than the
index of foo.  To implement NEAR10 you just have to look for
instances of bar with an index that is not more than 10 different
from the index on foo.  Not such a big change, really.  The
hard part will be parsing out the NEAR10 operator.
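
To make the index arithmetic concrete, here is a small sketch over a toy document held as a list of words. It is not the fts code or its storage format; the helper names (positions, matches, phrase, near) are made up for this example. A phrase match requires the index difference to be exactly 1, and a NEAR match requires it to be at most the given distance in either direction.

```python
def positions(tokens, word):
    """Indexes at which `word` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word]

def matches(tokens, first, second, lo, hi):
    """True if some `second` sits at offset d from some `first`, lo <= d <= hi."""
    return any(a != b and lo <= b - a <= hi
               for a in positions(tokens, first)
               for b in positions(tokens, second))

def phrase(tokens, first, second):
    # "foo bar": bar's index is exactly one more than foo's.
    return matches(tokens, first, second, 1, 1)

def near(tokens, first, second, distance=10):
    # NEAR: indexes differ by no more than `distance`, in either direction.
    return matches(tokens, first, second, -distance, distance)

doc = "foo is a word and bar is another".split()
print(phrase(doc, "foo", "bar"), near(doc, "foo", "bar"))
```

Seen this way, the phrase search is the special case where the allowed offset range shrinks to the single value 1.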

--
D. Richard Hipp <[EMAIL PROTECTED]>

