Greetings All, I use the Full Text Search facilities extensively. The feature is intuitive and works beautifully and super fast even with 100GB+ databases. I must applaud the SQLITE dev team on such amazing work.
I have two questions with regards to FTS. 1) I'm going out on a limb here: As some of you may know, there is a undocumented, and not officially supported feature/bug where you can specify your own delimiters using the "simple" tokenizer: http://article.gmane.org/gmane.comp.db.sqlite.general/74199 for example to create an table where tokens are split by an exclamation point and carat, one would execute: create virtual table TEST using FTS4(title, body, tokenize=simple "" "!^"); Note the first argument is ignored, the second argument is where the delimiters should be specified in quotes. This is very, very convenient and I've used it extensively without issue (I hope the feature never gets removed). The question here is, how can one specify non printable characters within the sqlite3 command shell. Often, delimiters are non-printable characters such as NUL (0x00) or SOH (0x01) or even the newline character. If I wanted to specify either one of those in my delimiter string parameter above, how could it successfully be done? I've tried using hex literals (see https://www.sqlite.org/lang_expr.html -- literal values (Constants) section) but it appears the interpretations are not respected and in fact, each character specified became interpreted as a separate delimiter: Below I try to use the character 'a' (but specified as hex) sqlite> CREATE VIRTUAL TABLE ft USING fts4(body, tokenize=simple "" '0x61'); sqlite> insert into ft values('ct6axps0189'); sqlite> --lets check the indexing sqlite> CREATE VIRTUAL TABLE ft_terms using fts4aux(ft); sqlite> select * from ft_terms; 89|*|1|1 89|0|1|1 a|*|1|1 a|0|1|1 ct|*|1|1 ct|0|1|1 ps|*|1|1 ps|0|1|1 And the results above have split my data by tokens 0, x 6, 1 instead of 'a'; If I remove the surrounding quotes, the interpreter throws an error: sqlite> CREATE VIRTUAL TABLE ft2 USING fts4(body, tokenize=simple "" 0x61); Error: unrecognized token: "0x61" Keep in mind, the whole purpose of this question is how to specify non-printable characters at the interpreter. However, I used the character 'a' (specified in hex') to clarify the example since 'a' is a visible character. 2) The colon ':' is used in the FTS engine as a column specifier: select * from FTS_TABLE where COL:foo MATCH "test123". This means show me any rows that have test123 in the COL column only (and not any other column). Is there a way to adjust/change this column delimiter during table setup or is this a hard-coded value? I ask because sometimes, I need to be able to perform searches with the colon character as part of a string. For example, when searching on NETWORK MAC addresses. At the same time, I don't want to lose the ability to utilize column specifiers. Thanks in advance for any help.