Re: [sqlite] queries for a fulltext-engine

2009-05-25 Thread Igor Tandetnik
"Lukas Haase"  wrote in
message news:gve1lh$30...@ger.gmane.org
>> SELECT topic_fulltext.topicID FROM topic_fulltext
>> where exists (select 1 from fulltext
>> WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word
>> LIKE 'word1%')
>> and exists (select 1 from fulltext
>> WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word
>> LIKE 'word2%')
>
> Unfortunately this does not work completely :(
>
> In the first WHERE clause I restrict to entried containing only the
> "word1" (the resultset will contain only entries with "word1"). So the
> second WHERE clause will always fail as there are no rows with
> "word2" left.

Right. I was thinking about a third table, topics, that lists all topics 
(and likely additional information about them), so that you have a 
classic many-to-many relationship. I suspect you have one. In this case 
you can do

SELECT topics.topicID FROM topics
where exists (select 1 from topic_fulltext join fulltext on 
(topic_fulltext.fulltextID = fulltext.fulltextID)
WHERE topic_fulltext.topicID = topics.topicID and word LIKE 
'word1%')
and exists (select 1 from topic_fulltext join fulltext on 
(topic_fulltext.fulltextID = fulltext.fulltextID)
WHERE topic_fulltext.topicID = topics.topicID and word LIKE 
'word2%');

If for some strange reason you don't have topics table, then you can do

SELECT distinct tf1.topicID from topic_fulltext tf1
where exists (select 1 from topic_fulltext tf2 join fulltext on 
(tf2.fulltextID = fulltext.fulltextID)
WHERE tf1.topicID = tf2.topicID and word LIKE 'word1%')
and exists (select 1 from topic_fulltext tf2 join fulltext on 
(tf2.fulltextID = fulltext.fulltextID)
WHERE tf1.topicID = tf2.topicID and word LIKE 'word2%')

Basically, (select distinct topicID from topic_fulltext) plays the role 
of topics table.

>> SELECT topic_fulltext.topicID FROM topic_fulltext
>> where fulltextID in (
>> select fulltextID from topic_fulltext where word LIKE 'word1%'
>> intersect
>> select fulltextID from topic_fulltext where word LIKE 'word2%');
>
> Unfortunately this does not work either.

For the same reason. Make it

select topicID from topic_fulltext join fulltext
  on ( topic_fulltext.fulltextID=fulltext.fulltextID)
  where word LIKE 'word1%'
intersect
select topicID from topic_fulltext join fulltext
  on ( topic_fulltext.fulltextID=fulltext.fulltextID)
  where word LIKE 'word2%'

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-25 Thread Lukas Haase
Igor Tandetnik schrieb:
> "Lukas Haase"  wrote in
> message news:gv9fcm$5r...@ger.gmane.org
>> I have a database containing thousands of HTML pages ("topics"). There
>> is a fulltext index for these topics. First there is a table
>> containing all single words. Each word is identified by its
>> "fulltextID":
>>
>> CREATE TABLE fulltext(
>> fulltextID INTEGER PRIMARY KEY,
>> word VARCHAR(100) COLLATE NOCASE
>> );
>>
>> Now there is a linking table between the words and the HTML pages
>> (topics):
>>
>> CREATE TABLE topic_fulltext(
>> topicID INTEGER,
>> fulltextID INTEGER,
>> PRIMARY KEY(topicID, fulltextID)
>> );
>>
>> Finding a topic containing a specific word is not too hard:
>>
>> SELECT topic_fulltext.topicID
>> FROM fulltext
>> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
>> WHERE word LIKE 'Word%';
>>
>> But now I want to be able to search with more complex queries. For
>> example:
>>
>> * List all topics containing (word1 AND word2)
> 
> You could do something like this:

Oh, thank you, this seems more like what I am looking for :-) :-)

> SELECT topic_fulltext.topicID FROM topic_fulltext
> where exists (select 1 from fulltext
> WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
> 'word1%')
> and exists (select 1 from fulltext
> WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
> 'word2%')

Unfortunately this does not work completely :(

In the first WHERE clause I restrict to entried containing only the 
"word1" (the resultset will contain only entries with "word1"). So the 
second WHERE clause will always fail as there are no rows with "word2" left.

This seems to work only for my OR-requirement (topics containg either 
word1 OR word2) when replacing AND with OR.

> SELECT topic_fulltext.topicID FROM topic_fulltext
> where fulltextID in (
> select fulltextID from topic_fulltext where word LIKE 'word1%'
> intersect
> select fulltextID from topic_fulltext where word LIKE 'word2%');

Unfortunately this does not work either.

The set is taken from the fullwords, i.e. the result of the inner SELECT 
clause will contain fulltextIDs. And they will obviosly never intersect.

So I have the same problem as above: Replacing with UNION ALL yields my 
OR-requirement but I can't get working it with AND :-(

> Test it, see which one works faster.

The second one seems to be much faster. Though it's too slow (3s or so), 
but I hope I can tune up the query on the end...

> [...]
>> * List all topics containing (word1 AND word2 AND ... AND word10)
>> * List all topics containing ((word1 OR word2) AND word3 OR word3)
> 
> The approach above should work for any boolean combination.

I have forgotten one third type: the NOT. E.g.:

* List all topics containing (NOT(word1 OR word2) AND word3)

But this is no hard requirement, but boolean AND and OR combinations are.

Thank you for your approaches, I tried to get the AND and OR working 
with it but I still do not figure it out :-(

By the way: If there is a better way to organize the index in the 
database: This would be no problem if the queries will get simpler and 
faster. (As long as the memeory requirement stays approx. the same)

Thank you again Igor,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread John Machin
On 24/05/2009 4:30 AM, Lukas Haase wrote:
> Hi,
> 
> Sorry for the subject - I just do not know for what to search of what to 
> ask - I actually do not know where is exactly my problem :-(
> 
> I have a database containing thousands of HTML pages ("topics"). There 
> is a fulltext index for these topics. First there is a table containing 
> all single words. Each word is identified by its "fulltextID":
> 
> CREATE TABLE fulltext(
>   fulltextID INTEGER PRIMARY KEY,
>   word VARCHAR(100) COLLATE NOCASE
> );
> 
> Now there is a linking table between the words and the HTML pages (topics):
> 
> CREATE TABLE topic_fulltext(
>   topicID INTEGER,
>   fulltextID INTEGER,
>   PRIMARY KEY(topicID, fulltextID)
> );

Call me crazy, but shouldn't the two components of the PK be reversed?

> Finding a topic containing a specific word is not too hard:
> 

Just put EXPLAIN QUERY PLAN in here ...

> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE word LIKE 'Word%';

and you will see:
0|0|TABLE fulltext
1|1|TABLE topic_fulltext

but if you change the order of the columns in the PK you will see:
0|0|TABLE fulltext
1|1|TABLE topic_fulltext WITH INDEX sqlite_autoindex_topic_fulltext_1

which I'd interpret as a significant improvement.

Second suggestion: Although you may be a careful programmer and quite 
sure that you will never have duplicate or null words in your fulltext 
table, let's just go into nanny mode and tack UNIQUE NOT NULL onto the 
definition of the "word" column ... with this result:

0|0|TABLE fulltext WITH INDEX sqlite_autoindex_fulltext_1
1|1|TABLE topic_fulltext WITH INDEX sqlite_autoindex_topic_fulltext_1

Crazy ** 2 :-)

Third suggestion: read up carefully what has been written about when 
LIKE will use an index.

> 
> But now I want to be able to search with more complex queries. For example:
> 
> * List all topics containing (word1 AND word2)
> * List all topics containing (word1 OR word2)
> * List all topics containing (word1 AND word2 AND ... AND word10)
> * List all topics containing ((word1 OR word2) AND word3 OR word3)
> * ...
> 
> But now I have no clue how to accomplish this query. I just can't extend 
> my single-word query from above :-(
> 
> Can anyone give me a hint?

This problem has been addressed more than once over the last 30 or so 
years ... I'd be very surprised to find that any serious implementation 
used SQL queries for 100% of the work.

HTH,
John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread Simon Slavin

On 24 May 2009, at 1:28pm, Lukas Haase wrote:

> Simon Slavin schrieb:
>> On 23 May 2009, at 7:30pm, Lukas Haase wrote:
>>> SELECT topic_fulltext.topicID
>>> FROM fulltext
>>> JOIN topic_fulltext ON topic_fulltext.fulltextID =  
>>> fulltext.fulltextID
>>> WHERE word LIKE 'Word%';
>>
>> WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%')
>> WHERE (word LIKE 'Word1%') OR (word LIKE 'Word2%')
>
> No, this is unfortunately not the case :-( I know the usage of LIKE, %
> and =.
>
> But the problem is here that
>
> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%');
>
> would give me no results.

Oh, I get it.  You have indexed your documents and have a many-to-many  
relationship which is something like

column 1 = word
column 2 = document number

So one word appears in many documents and one document contains many  
words.  That makes sense.

In the case it looks like the best way is to build up a TEMPORARY  
table something like

column 1 = document

using INSERT OR IGNORE, parsing your search requirements (AND, OR,  
NOT, etc.) yourself and turning them into successive INSERT and DELETE  
commands.  So an 'OR' relationship would turn into two INSERT  
commands, whereas an 'AND' relationship would be an INSERT and then a  
DELETE for all cases where the document didn't contain the second  
word.  If you want to allow bracketing in your search strings the  
programming can be complicated.  If you don't, and just work from the  
left to the right of your search strings it should be simple.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread Emil Obermayr
On Sun, May 24, 2009 at 02:28:36PM +0200, Lukas Haase wrote:
> 
> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%');

Obviously this must be empty because a string can't start with "Word1"
and start with "Word2" at the same time.

Maybe you should try % on both ends of the search string.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread Igor Tandetnik
"Lukas Haase"  wrote in
message news:gv9fcm$5r...@ger.gmane.org
> I have a database containing thousands of HTML pages ("topics"). There
> is a fulltext index for these topics. First there is a table
> containing all single words. Each word is identified by its
> "fulltextID":
>
> CREATE TABLE fulltext(
> fulltextID INTEGER PRIMARY KEY,
> word VARCHAR(100) COLLATE NOCASE
> );
>
> Now there is a linking table between the words and the HTML pages
> (topics):
>
> CREATE TABLE topic_fulltext(
> topicID INTEGER,
> fulltextID INTEGER,
> PRIMARY KEY(topicID, fulltextID)
> );
>
> Finding a topic containing a specific word is not too hard:
>
> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE word LIKE 'Word%';
>
> But now I want to be able to search with more complex queries. For
> example:
>
> * List all topics containing (word1 AND word2)

You could do something like this:

SELECT topic_fulltext.topicID FROM topic_fulltext
where exists (select 1 from fulltext
WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
'word1%')
and exists (select 1 from fulltext
WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
'word2%')

-- or

SELECT topic_fulltext.topicID FROM topic_fulltext
where fulltextID in (
select fulltextID from topic_fulltext where word LIKE 'word1%'
intersect
select fulltextID from topic_fulltext where word LIKE 'word2%');

Test it, see which one works faster.

> * List all topics containing (word1 OR word2)

Similar to above, but replace AND with OR, and INTERSECT with UNION ALL.

> * List all topics containing (word1 AND word2 AND ... AND word10)
> * List all topics containing ((word1 OR word2) AND word3 OR word3)

The approach above should work for any boolean combination.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread Lukas Haase
Hi,

Igor Tandetnik schrieb:
> "Lukas Haase"  wrote in
> message news:gv9fcm$5r...@ger.gmane.org
>> I have a database containing thousands of HTML pages ("topics"). There
>> is a fulltext index for these topics. First there is a table
>> containing all single words. Each word is identified by its
>> "fulltextID":
> 
> You seem to be reinventing FTS:
> 
> http://www.sqlite.org/cvstrac/wiki?p=FtsUsage

Thank you, but unfortunately not.

First, there are 2 topics, second, these topics are HTML code 
fragments and third, they are all compressed externally with gzip for 
saving memory.

For that reason I built this index, listing all words for a
specific topic...

Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-24 Thread Lukas Haase
Hi,

At first, thank you for your answer!

Simon Slavin schrieb:
> On 23 May 2009, at 7:30pm, Lukas Haase wrote:
>> SELECT topic_fulltext.topicID
>> FROM fulltext
>> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
>> WHERE word LIKE 'Word%';
>>
>> But now I want to be able to search with more complex queries. For  
>> example:
>>
>> * List all topics containing (word1 AND word2)
>> * List all topics containing (word1 OR word2)
> 
> It works perfectly to do things like
> 
> WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%')
> WHERE (word LIKE 'Word1%') OR (word LIKE 'Word2%')

No, this is unfortunately not the case :-( I know the usage of LIKE, % 
and =.

But the problem is here that

SELECT topic_fulltext.topicID
FROM fulltext
JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%');

would give me no results.

And the one statement with

WHERE (word LIKE 'Word1%') OR (word LIKE 'Word2%');

would work...

And, I have forgotten some other search cirteria:

* List all topics containing (word1 AND NOT word2)

> [...]
>> * List all topics containing (word1 AND word2 AND ... AND word10)
>> * List all topics containing ((word1 OR word2) AND word3 OR word3)
> 
> You could write something to transform those into the format I showed  
> above.  And you could get the results directly using SELECT or use  
> CREATE VIEW to reflect them.

This indeed no problem but the query does not work. The problem is that 
I have a list of words and then a table which topic contains which 
words. So if I want to know which *topics* are linked with

* word1 OR word2
* word1 AND word2
* word1 AND NOT word2

I need a completely different query.

> But there are other ways to do it that might be more efficient (i.e.  
> faster) or simpler to program and debug.  For instance, if you have a  
> chain of conditions you could CREATE TEMPORARY a table to then use it  
> to accumulate (OR) or eliminate (AND) the pages you want using the  
> form of INSERT that takes a SELECT argument.  Depending on the size  
> and shape of your database this may or may not be faster.

Yes, the DB is very huge. There are 20k topics and the fulltext and 
topic_fulltext tables are approx. 50MB.

> [...]
> Simon.

Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-23 Thread Igor Tandetnik
"Lukas Haase"  wrote in
message news:gv9fcm$5r...@ger.gmane.org
> I have a database containing thousands of HTML pages ("topics"). There
> is a fulltext index for these topics. First there is a table
> containing all single words. Each word is identified by its
> "fulltextID":

You seem to be reinventing FTS:

http://www.sqlite.org/cvstrac/wiki?p=FtsUsage

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] queries for a fulltext-engine

2009-05-23 Thread Simon Slavin

On 23 May 2009, at 7:30pm, Lukas Haase wrote:

> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE word LIKE 'Word%';
>
> But now I want to be able to search with more complex queries. For  
> example:
>
> * List all topics containing (word1 AND word2)
> * List all topics containing (word1 OR word2)

It works perfectly to do things like

WHERE (word LIKE 'Word1%') AND (word LIKE 'Word2%')
WHERE (word LIKE 'Word1%') OR (word LIKE 'Word2%')

But chaining too many of these together might make for slow searches.   
I don't know how good sqlite is at doing this stuff.

> * List all topics containing (word1 AND word2 AND ... AND word10)
> * List all topics containing ((word1 OR word2) AND word3 OR word3)

You could write something to transform those into the format I showed  
above.  And you could get the results directly using SELECT or use  
CREATE VIEW to reflect them.

But there are other ways to do it that might be more efficient (i.e.  
faster) or simpler to program and debug.  For instance, if you have a  
chain of conditions you could CREATE TEMPORARY a table to then use it  
to accumulate (OR) or eliminate (AND) the pages you want using the  
form of INSERT that takes a SELECT argument.  Depending on the size  
and shape of your database this may or may not be faster.

What you've hit is the standard problem of flexible search fields and  
doing it well is one reason Google is the success it is.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] queries for a fulltext-engine

2009-05-23 Thread Lukas Haase
Hi,

Sorry for the subject - I just do not know for what to search of what to 
ask - I actually do not know where is exactly my problem :-(

I have a database containing thousands of HTML pages ("topics"). There 
is a fulltext index for these topics. First there is a table containing 
all single words. Each word is identified by its "fulltextID":

CREATE TABLE fulltext(
fulltextID INTEGER PRIMARY KEY,
word VARCHAR(100) COLLATE NOCASE
);

Now there is a linking table between the words and the HTML pages (topics):

CREATE TABLE topic_fulltext(
topicID INTEGER,
fulltextID INTEGER,
PRIMARY KEY(topicID, fulltextID)
);

Finding a topic containing a specific word is not too hard:

SELECT topic_fulltext.topicID
FROM fulltext
JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
WHERE word LIKE 'Word%';

But now I want to be able to search with more complex queries. For example:

* List all topics containing (word1 AND word2)
* List all topics containing (word1 OR word2)
* List all topics containing (word1 AND word2 AND ... AND word10)
* List all topics containing ((word1 OR word2) AND word3 OR word3)
* ...

But now I have no clue how to accomplish this query. I just can't extend 
my single-word query from above :-(

Can anyone give me a hint?

Thank you very much,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users