Philip Bennefall wrote:
> I hate to be cluttering up the list in this fashion, but I have come across 
> an issue that I cannot seem to find a solution for.
> 
> I am using two fts tables, one that uses the normal tokenizer and another 
> that uses the porter stemmer, so that I can search the same dataset with and 
> without porter. For the porter stemmer table, I have set the content option 
> to point to the other fts table. Like this:
> 
> CREATE VIRTUAL TABLE main_brain USING fts4(phrase, response, matchinfo=fts3, 
> tokenize=simple, order=desc);
> 
> CREATE VIRTUAL TABLE porter_brain USING fts4(content="main_brain", phrase, 
> response, matchinfo=fts3, tokenize=porter, order=desc);
> 
> Then I do the following:
> 
> INSERT INTO main_brain(phrase, response) VALUES('hello', 'Hi there!');
> 
> So you'd now think that main_brain should have this content in it, but the 
> porter_brain table should be empty. At least, that is what the documentation 
> on fts4 seems to indicate as it points out that it is my own responsibility 
> to make sure the tables are in sync. But:
> 
> SELECT * FROM porter_brain;
> 
> Produces:
> 
> hello|Hi there!

As porter_brain takes all content from main_brain, full scan result looks
"correct" ("SELECT FROM porter_brain" internally replaced with "SELECT FROM
main_brain"). But before you execute INSERT below, all fts indexes are missing,
so any request that utilize them, like

SELECT * FROM porter_brain WHERE phrase MATCH 'hello';

will fail.

> And:
> 
> SELECT * FROM main_brain;
> 
> Gives the same result. So it seems as though some internal synchronization 
> between these tables is going on. How should I handle this? The documentation
>  suggests a statement like the following in one of its examples:
> 
> INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;

> Where I guess t3 would correspond to porter_brain and t2 would be main_brain 
> in my case. But I don't seem to need to do this at all.

INSERT INTO porter_brain (docid, phrase, response)
                   SELECT docid, phrase, response FROM main_brain;

Before you execute this insert statement, porter_brain state is *inconsistent*
and it cannot be used properly.

[FWIW, most natural place for this insert would be AFTER INSERT trigger ... but
as main_brain is VIRTUAL TABLE, it is impossible; you can consider using
external content table for both virtual tables with trigger:

CREATE TABLE tblContent(phrase TEXT, response TEXT);
CREATE VIRTUAL TABLE main_brain USING fts4(content="tblContent", phrase,
response, matchinfo=fts3, tokenize=simple, order=desc);
CREATE VIRTUAL TABLE porter_brain USING fts4(content="tblContent", phrase,
response, matchinfo=fts3, tokenize=porter, order=desc);
CREATE TRIGGER trgSync AFTER INSERT ON tblContent FOR EACH ROW
 BEGIN
      INSERT INTO main_brain (docid, phrase, response)
           VALUES (NEW.rowid, NEW.phrase, NEW.response);
      INSERT INTO porter_brain (docid, phrase, response)
           VALUES (NEW.rowid, NEW.phrase, NEW.response);
   END;
].

> Can anyone throw some light on this?
> 
> Thanks in advance.

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to