Re: [sqlite] FTS5: how to find special character sequence containing '.' and '"'

2019-11-01 Thread mailing lists
Hi sorry,

just forget it. I have overseen a typo.

> Am 2019-11-01 um 15:00 schrieb mailing lists :
> 
> Assume the following statements
> 
> CREATE VIRTUAL TABLE NamesFTS USING FTS5 (LastName);
> INSERT INTO NamesFTS (LastName) VALUES('L.');
> INSERT INTO NamesFTS (LastName) VALUES('"L."');
> 
> SELECT rowid FROM NamesFTS;
>> 1
>> 2
> 
> SELECT rowid FROM NamesFTS WHERE NamesFTS MATCH '"L."';
>> 1
>> 2
> 
> How do I get only the record with rowid 2 using a MATCH filter statement? How 
> to escape '"'?
> 
> Regards,
> Hartwig
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5: how to find special character sequence containing '.' and '"'

2019-11-01 Thread mailing lists
Assume the following statements

CREATE VIRTUAL TABLE NamesFTS USING FTS5 (LastName);
INSERT INTO NamesFTS (LastName) VALUES('L.');
INSERT INTO NamesFTS (LastName) VALUES('"L."');

SELECT rowid FROM NamesFTS;
> 1
> 2

SELECT rowid FROM NamesFTS WHERE NamesFTS MATCH '"L."';
> 1
> 2

How do I get only the record with rowid 2 using a MATCH filter statement? How 
to escape '"'?

Regards,
Hartwig

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [FTS5] Potential table name escape issue with ORDER BY rank

2019-08-20 Thread Dan Kennedy


On 21/8/62 01:06, Matt Haynie wrote:

Hello sqlite-users,

Apologies if this isn’t formatted correctly, I’m not used to submitting bug 
reports via mailing lists.

Although I’m sure some people will be shaking their head, I chose to use 
periods between words in table names. I’ve been careful to escape table names 
everywhere, so this has worked out fine for the most part. However, there is an 
issue when attempting to search an FTS5 table with ORDER BY rank. From the 
below example:

SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY rank;  -- BUG: near 
"Table": syntax error



Thanks for the thorough bug report. Now fixed here:

  https://sqlite.org/src/info/00e9a8f2730eb723

Dan.




As the comment indicates, this produces the message “near ‘Table’: syntax 
error”. My armchair debugging skills are telling me that it seems to be an 
issue with the table name not being properly escaped. It’s my understanding 
that using ORDER BY bm25(`My.Table`) should be functionally identical to ORDER 
BY rank:

SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY bm25(`My.Table`);  
-- Works fine



Full example below (Ctrl+F “BUG” for the line that produces the error)

-- Create table
CREATE VIRTUAL TABLE "My.Table" USING fts5(Text);

-- Insert some data
INSERT INTO "My.Table" VALUES ('hello this is a test');
INSERT INTO "My.Table" VALUES ('of trying to order by');
INSERT INTO "My.Table" VALUES ('rank on an fts5 table');
INSERT INTO "My.Table" VALUES ('that have periods in');
INSERT INTO "My.Table" VALUES ('the table names.');
INSERT INTO "My.Table" VALUES ('table table table');

-- Search FTS table - works fine
SELECT * FROM "My.Table" WHERE Text MATCH 'table';

-- Search FTS table with ordering
SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY bm25(`My.Table`);  
-- Works fine
SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY rank;  -- BUG: near 
"Table": syntax error

-- Change the table name to remove the period
ALTER TABLE "My.Table" RENAME TO "My_Table";

-- Search FTS table - all of these work perfectly now
SELECT * FROM "My_Table" WHERE Text MATCH 'table';
SELECT * FROM "My_Table" WHERE Text MATCH 'table' ORDER BY rank;
SELECT * FROM "My_Table" WHERE Text MATCH 'table' ORDER BY bm25(`My_Table`);

Thanks,
Matt

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] [FTS5] Potential table name escape issue with ORDER BY rank

2019-08-20 Thread Matt Haynie
Hello sqlite-users,

Apologies if this isn’t formatted correctly, I’m not used to submitting bug 
reports via mailing lists.

Although I’m sure some people will be shaking their head, I chose to use 
periods between words in table names. I’ve been careful to escape table names 
everywhere, so this has worked out fine for the most part. However, there is an 
issue when attempting to search an FTS5 table with ORDER BY rank. From the 
below example:

SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY rank;  -- BUG: near 
"Table": syntax error

As the comment indicates, this produces the message “near ‘Table’: syntax 
error”. My armchair debugging skills are telling me that it seems to be an 
issue with the table name not being properly escaped. It’s my understanding 
that using ORDER BY bm25(`My.Table`) should be functionally identical to ORDER 
BY rank:

SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY bm25(`My.Table`);  
-- Works fine



Full example below (Ctrl+F “BUG” for the line that produces the error)

-- Create table
CREATE VIRTUAL TABLE "My.Table" USING fts5(Text);

-- Insert some data
INSERT INTO "My.Table" VALUES ('hello this is a test');
INSERT INTO "My.Table" VALUES ('of trying to order by');
INSERT INTO "My.Table" VALUES ('rank on an fts5 table');
INSERT INTO "My.Table" VALUES ('that have periods in');
INSERT INTO "My.Table" VALUES ('the table names.');
INSERT INTO "My.Table" VALUES ('table table table');

-- Search FTS table - works fine
SELECT * FROM "My.Table" WHERE Text MATCH 'table';

-- Search FTS table with ordering
SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY bm25(`My.Table`);  
-- Works fine
SELECT * FROM "My.Table" WHERE Text MATCH 'table' ORDER BY rank;  -- BUG: near 
"Table": syntax error

-- Change the table name to remove the period
ALTER TABLE "My.Table" RENAME TO "My_Table";

-- Search FTS table - all of these work perfectly now
SELECT * FROM "My_Table" WHERE Text MATCH 'table';
SELECT * FROM "My_Table" WHERE Text MATCH 'table' ORDER BY rank;
SELECT * FROM "My_Table" WHERE Text MATCH 'table' ORDER BY bm25(`My_Table`);

Thanks,
Matt

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 Transaction Leads to NULL Pointer

2019-03-31 Thread Philip Newton
On Tue, 19 Mar 2019 at 07:35, Dan Kennedy  wrote:
>
> Now fixed here:
>
>https://sqlite.org/src/info/45c73deb440496e8

From that diff, it seems that you changed the documentation of the
function's parameters where the function was defined (see fts5_hash.c
line 489 at the right) but not where the prototype was declared (see
fts5Int.h line 588 at the right, which still talks about a "pointer to
doclist").
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 Transaction Leads to OOB Read

2019-03-19 Thread Chu
YES. AFL with ASAN.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 Transaction Leads to OOB Read

2019-03-19 Thread Dan Kennedy


That's the error I got too. Now fixed here:

  https://sqlite.org/src/info/b3fa58dd7403dbd4

Dan.



On 19/3/62 04:22, Dominique Pellé wrote:

Chu  wrote:


The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

BEGIN;
 INSERT INTO t1 (content) VALUES('');
 SELECT * FROM 
t1('*');
END;
```

As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a OOB READ. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
=
==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
READ of size 81 at 0x60d02898 thread T0
 #0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
 #1 0x563324ca4013 in fts5HashEntrySort 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
 #2 0x563324e685f9 in sqlite3Fts5HashScanInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
 #3 0x563324e685f9 in fts5SegIterHashInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321

...snip..

Just to confirm that there is a bug when running your
queries in the SQLite-3.27.2 shell with valgrind.

I did not get a heap overflow, but valgrind complains
about uninitialized memory in the same fts5HashEntrySort
function as in your stack:

$ valgrind --track-origins=yes --num-callers=50 sqlite3_shell
==10856== Memcheck, a memory error detector
==10856== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10856== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==10856== Command: ./Output/Binary/x86_64-Linux-clang/Debug/bin/sqlite3_shell
==10856==
SQLite version 3.27.2 2019-02-25 16:06:06
NDSeV devkit 3.27.2.1 2019-02-26 16:04:39 990c4f90c3340db5
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE t1 USING fts5(content);
sqlite> BEGIN;
sqlite> INSERT INTO t1 (content) VALUES('');
sqlite> SELECT * FROM
t1('*');
==10856== Conditional jump or move depends on uninitialised value(s)
==10856==at 0x4C362B2: __memcmp_sse4_1 (vg_replace_strmem.c:)
==10856==by 0x4ECC86: fts5HashEntrySort (nds_sqlite3.c:207811)
==10856==by 0x4EC55D: sqlite3Fts5HashScanInit (nds_sqlite3.c:207869)
==10856==by 0x4EBB45: fts5SegIterHashInit (nds_sqlite3.c:210370)
==10856==by 0x4EB1BE: fts5MultiIterNew (nds_sqlite3.c:211319)
==10856==by 0x4EB5A8: fts5SetupPrefixIter (nds_sqlite3.c:212995)
==10856==by 0x4EAE4A: sqlite3Fts5IndexQuery (nds_sqlite3.c:213324)
==10856==by 0x4EAA61: fts5ExprNearInitAll (nds_sqlite3.c:205310)
==10856==by 0x4EA6F3: fts5ExprNodeFirst (nds_sqlite3.c:205827)
==10856==by 0x4EA5B4: sqlite3Fts5ExprFirst (nds_sqlite3.c:205885)
==10856==by 0x4E9D1A: fts5CursorFirst (nds_sqlite3.c:215420)
==10856==by 0x4E2DD3: fts5FilterMethod (nds_sqlite3.c:215702)
==10856==by 0x4672A7: sqlite3VdbeExec (nds_sqlite3.c:90382)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: process_input (shell.c:16236)
==10856==by 0x40752D: main (shell.c:16995)
==10856==  Uninitialised value was created by a heap allocation
==10856==at 0x4C2FE56: malloc (vg_replace_malloc.c:299)
==10856==by 0x4E0098: sqlite3MemMalloc (nds_sqlite3.c:22886)
==10856==by 0x441ED0: mallocWithAlarm (nds_sqlite3.c:26718)
==10856==by 0x425E0B: sqlite3Malloc (nds_sqlite3.c:26748)
==10856==by 0x425E7F: sqlite3_malloc64 (nds_sqlite3.c:26772)
==10856==by 0x4FE464: sqlite3Fts5HashWrite (nds_sqlite3.c:207636)
==10856==by 0x4FE1F0: sqlite3Fts5IndexWrite (nds_sqlite3.c:213247)
==10856==by 0x4FACB9: fts5StorageInsertCallback (nds_sqlite3.c:217629)
==10856==by 0x505B04: fts5UnicodeTokenize (nds_sqlite3.c:218923)
==10856==by 0x4F623D: sqlite3Fts5Tokenize (nds_sqlite3.c:204268)
==10856==by 0x4FA0DD: sqlite3Fts5StorageIndexInsert (nds_sqlite3.c:217984)
==10856==by 0x4F9E44: fts5StorageInsert (nds_sqlite3.c:215929)
==10856==by 0x4E359D: fts5UpdateMethod (nds_sqlite3.c:216036)
==10856==by 0x4677E6: sqlite3VdbeExec (nds_sqlite3.c:90593)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: 

Re: [sqlite] FTS5 Transaction Leads to NULL Pointer

2019-03-19 Thread Dan Kennedy


On 18/3/62 15:48, Chu wrote:

The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

INSERT INTO t1 VALUES('');

BEGIN ;
DELETE FROM t1 WHERE rowid = 1;
SELECT * FROM t1 WHERE content MATCH '';
INSERT INTO t1 VALUES('');
SELECT * FROM t1 WHERE content MATCH '';
END;
``



Thanks very much for isolating and reporting this problem, and the other 
one. Now fixed here:


  https://sqlite.org/src/info/45c73deb440496e8

Dan.




As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a crash because of null pointer. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 1-null-pointer.sql
AddressSanitizer:DEADLYSIGNAL
=
==20822==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
0x55df5393c60a bp 0x0001 sp 0x706021b0 T0)
==20822==The signal is caused by a READ memory access.
==20822==Hint: address points to the zero page.
 #0 0x55df5393c609 in fts5ChunkIterate 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934
 #1 0x55df5393ca5e in fts5SegiterPoslist 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210970
 #2 0x55df5393d65d in fts5IterSetOutputs_Full 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211177
 #3 0x55df5393f17e in fts5MultiIterNext 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210732
 #4 0x55df539444e9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211309
 #5 0x55df5394702f in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213266
 #6 0x55df5398a566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
 #7 0x55df5398a566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
 #8 0x55df5398ad3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
 #9 0x55df5398af0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
 #10 0x55df5398cc9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
 #11 0x55df538a973a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
 #12 0x55df538c5439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
 #13 0x55df538c5439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
 #14 0x55df536f9662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
 #15 0x55df536f9662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
 #16 0x55df536fbdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
 #17 0x55df5370b466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
 #18 0x55df536d6c98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
 #19 0x7f5c4f52809a in __libc_start_main ../csu/libc-start.c:308
 #20 0x55df536d8599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934 in fts5ChunkIterate
==20822==ABORTING
```

View detail In gdb:

```
(gdb) r < 1-null-pointer.sql
The program being debugged has been started already.
Start it from the beginning? (y or n) Y
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
1-null-pointer.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00,
 xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
210934pData = fts5LeafRead(p, FTS5_SEGMENT_ROWID(pSeg->pSeg->iSegid, 
pgno));
(gdb) bt
#0  0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00,
 xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
#1  0x557fea5f in fts5SegiterPoslist (p=0x60d00ad8, 
pSeg=0x61300b28, pColset=pColset@entry=0x602014b8, 
pBuf=pBuf@entry=0x61300ae8)
 at sqlite3.c:210970
#2  0x557ff65e in fts5IterSetOutputs_Full (pIter=0x61300ac8, 
pSeg=) at sqlite3.c:211177
#3  0x5580117f in fts5MultiIterNext (p=p@entry=0x60d00ad8, 
pIter=pIter@entry=0x61300ac8, bFrom=bFrom@entry=0, iFrom=iFrom@entry=0)
 at sqlite3.c:210732
#4  0x558064ea in fts5MultiIterNew (p=p@entry=0x60d00ad8, 
pStruct=pStruct@entry=0x60402458, flags=flags@entry=16,
 pColset=pColset@entry=0x602014b8, pTerm=, nTerm=nTerm@entry=5, 
iLevel=, nSegment=, ppOut=)
 at sqlite3.c:211309
#5  0x55809030 in sqlite3Fts5IndexQuery (p=0x60d00ad8, 
pToken=pToken@entry=0x60201498 "", nToken=4, 

Re: [sqlite] FTS5 Transaction Leads to OOB Read

2019-03-18 Thread Dominique Pellé
Chu  wrote:

> The code:
>
> ```
> CREATE VIRTUAL TABLE t1 USING fts5(content);
>
> BEGIN;
> INSERT INTO t1 (content) VALUES('');
> SELECT * FROM 
> t1('*');
> END;
> ```
>
> As you can see, it creates a virtual table with fts5, and run a transaction 
> on it, this will leads to a OOB READ. The ASAN report:
>
> ```
> ➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
> =
> ==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
> READ of size 81 at 0x60d02898 thread T0
> #0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
> #1 0x563324ca4013 in fts5HashEntrySort 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
> #2 0x563324e685f9 in sqlite3Fts5HashScanInit 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
> #3 0x563324e685f9 in fts5SegIterHashInit 
> /root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321

...snip..

Just to confirm that there is a bug when running your
queries in the SQLite-3.27.2 shell with valgrind.

I did not get a heap overflow, but valgrind complains
about uninitialized memory in the same fts5HashEntrySort
function as in your stack:

$ valgrind --track-origins=yes --num-callers=50 sqlite3_shell
==10856== Memcheck, a memory error detector
==10856== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10856== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==10856== Command: ./Output/Binary/x86_64-Linux-clang/Debug/bin/sqlite3_shell
==10856==
SQLite version 3.27.2 2019-02-25 16:06:06
NDSeV devkit 3.27.2.1 2019-02-26 16:04:39 990c4f90c3340db5
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE t1 USING fts5(content);
sqlite> BEGIN;
sqlite> INSERT INTO t1 (content) VALUES('');
sqlite> SELECT * FROM
t1('*');
==10856== Conditional jump or move depends on uninitialised value(s)
==10856==at 0x4C362B2: __memcmp_sse4_1 (vg_replace_strmem.c:)
==10856==by 0x4ECC86: fts5HashEntrySort (nds_sqlite3.c:207811)
==10856==by 0x4EC55D: sqlite3Fts5HashScanInit (nds_sqlite3.c:207869)
==10856==by 0x4EBB45: fts5SegIterHashInit (nds_sqlite3.c:210370)
==10856==by 0x4EB1BE: fts5MultiIterNew (nds_sqlite3.c:211319)
==10856==by 0x4EB5A8: fts5SetupPrefixIter (nds_sqlite3.c:212995)
==10856==by 0x4EAE4A: sqlite3Fts5IndexQuery (nds_sqlite3.c:213324)
==10856==by 0x4EAA61: fts5ExprNearInitAll (nds_sqlite3.c:205310)
==10856==by 0x4EA6F3: fts5ExprNodeFirst (nds_sqlite3.c:205827)
==10856==by 0x4EA5B4: sqlite3Fts5ExprFirst (nds_sqlite3.c:205885)
==10856==by 0x4E9D1A: fts5CursorFirst (nds_sqlite3.c:215420)
==10856==by 0x4E2DD3: fts5FilterMethod (nds_sqlite3.c:215702)
==10856==by 0x4672A7: sqlite3VdbeExec (nds_sqlite3.c:90382)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: process_input (shell.c:16236)
==10856==by 0x40752D: main (shell.c:16995)
==10856==  Uninitialised value was created by a heap allocation
==10856==at 0x4C2FE56: malloc (vg_replace_malloc.c:299)
==10856==by 0x4E0098: sqlite3MemMalloc (nds_sqlite3.c:22886)
==10856==by 0x441ED0: mallocWithAlarm (nds_sqlite3.c:26718)
==10856==by 0x425E0B: sqlite3Malloc (nds_sqlite3.c:26748)
==10856==by 0x425E7F: sqlite3_malloc64 (nds_sqlite3.c:26772)
==10856==by 0x4FE464: sqlite3Fts5HashWrite (nds_sqlite3.c:207636)
==10856==by 0x4FE1F0: sqlite3Fts5IndexWrite (nds_sqlite3.c:213247)
==10856==by 0x4FACB9: fts5StorageInsertCallback (nds_sqlite3.c:217629)
==10856==by 0x505B04: fts5UnicodeTokenize (nds_sqlite3.c:218923)
==10856==by 0x4F623D: sqlite3Fts5Tokenize (nds_sqlite3.c:204268)
==10856==by 0x4FA0DD: sqlite3Fts5StorageIndexInsert (nds_sqlite3.c:217984)
==10856==by 0x4F9E44: fts5StorageInsert (nds_sqlite3.c:215929)
==10856==by 0x4E359D: fts5UpdateMethod (nds_sqlite3.c:216036)
==10856==by 0x4677E6: sqlite3VdbeExec (nds_sqlite3.c:90593)
==10856==by 0x42F876: sqlite3Step (nds_sqlite3.c:81765)
==10856==by 0x42F51C: sqlite3_step (nds_sqlite3.c:81830)
==10856==by 0x4236B8: exec_prepared_stmt (shell.c:10469)
==10856==by 0x4104EB: shell_exec (shell.c:10776)
==10856==by 0x42414E: runOneSqlLine (shell.c:16136)
==10856==by 0x410C5A: process_input (shell.c:16236)
==10856==by 0x40752D: main (shell.c:16995)
==10856==
sqlite> END;

Out 

[sqlite] FTS5 Transaction Leads to NULL Pointer

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

INSERT INTO t1 VALUES('');

BEGIN ;
DELETE FROM t1 WHERE rowid = 1;
SELECT * FROM t1 WHERE content MATCH '';
INSERT INTO t1 VALUES('');
SELECT * FROM t1 WHERE content MATCH '';
END;
``

As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a crash because of null pointer. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 1-null-pointer.sql
AddressSanitizer:DEADLYSIGNAL
=
==20822==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
0x55df5393c60a bp 0x0001 sp 0x706021b0 T0)
==20822==The signal is caused by a READ memory access.
==20822==Hint: address points to the zero page.
#0 0x55df5393c609 in fts5ChunkIterate 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934
#1 0x55df5393ca5e in fts5SegiterPoslist 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210970
#2 0x55df5393d65d in fts5IterSetOutputs_Full 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211177
#3 0x55df5393f17e in fts5MultiIterNext 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210732
#4 0x55df539444e9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211309
#5 0x55df5394702f in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213266
#6 0x55df5398a566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#7 0x55df5398a566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#8 0x55df5398ad3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#9 0x55df5398af0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#10 0x55df5398cc9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#11 0x55df538a973a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#12 0x55df538c5439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#13 0x55df538c5439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#14 0x55df536f9662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#15 0x55df536f9662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#16 0x55df536fbdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#17 0x55df5370b466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#18 0x55df536d6c98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#19 0x7f5c4f52809a in __libc_start_main ../csu/libc-start.c:308
#20 0x55df536d8599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934 in fts5ChunkIterate
==20822==ABORTING
```

View detail In gdb:

```
(gdb) r < 1-null-pointer.sql 
The program being debugged has been started already.
Start it from the beginning? (y or n) Y
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
1-null-pointer.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
210934pData = fts5LeafRead(p, FTS5_SEGMENT_ROWID(pSeg->pSeg->iSegid, 
pgno));
(gdb) bt
#0  0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
#1  0x557fea5f in fts5SegiterPoslist (p=0x60d00ad8, 
pSeg=0x61300b28, pColset=pColset@entry=0x602014b8, 
pBuf=pBuf@entry=0x61300ae8)
at sqlite3.c:210970
#2  0x557ff65e in fts5IterSetOutputs_Full (pIter=0x61300ac8, 
pSeg=) at sqlite3.c:211177
#3  0x5580117f in fts5MultiIterNext (p=p@entry=0x60d00ad8, 
pIter=pIter@entry=0x61300ac8, bFrom=bFrom@entry=0, iFrom=iFrom@entry=0)
at sqlite3.c:210732
#4  0x558064ea in fts5MultiIterNew (p=p@entry=0x60d00ad8, 
pStruct=pStruct@entry=0x60402458, flags=flags@entry=16, 
pColset=pColset@entry=0x602014b8, pTerm=, 
nTerm=nTerm@entry=5, iLevel=, nSegment=, 
ppOut=)
at sqlite3.c:211309
#5  0x55809030 in sqlite3Fts5IndexQuery (p=0x60d00ad8, 
pToken=pToken@entry=0x60201498 "", nToken=4, flags=flags@entry=0, 
pColset=pColset@entry=0x602014b8, ppIter=ppIter@entry=0x61300938) 
at sqlite3.c:213266
#6  0x5584c567 in fts5ExprNearInitAll (pExpr=0x60402598, 
pExpr=0x60402598, 

[sqlite] FTS5 Transaction Leads to OOB Read

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

BEGIN;
INSERT INTO t1 (content) VALUES('');
SELECT * FROM 
t1('*');
END;
```

As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a OOB READ. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
=
==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
READ of size 81 at 0x60d02898 thread T0
#0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
#1 0x563324ca4013 in fts5HashEntrySort 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
#2 0x563324e685f9 in sqlite3Fts5HashScanInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
#3 0x563324e685f9 in fts5SegIterHashInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321
#4 0x563324e685f9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211270
#5 0x563324e6b380 in fts5SetupPrefixIter 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:212946
#6 0x563324e6b380 in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213275
#7 0x563324eae566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#8 0x563324eae566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#9 0x563324eaed3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#10 0x563324eaef0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#11 0x563324eb0c9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#12 0x563324dcd73a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#13 0x563324de9439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#14 0x563324de9439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#15 0x563324c1d662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#16 0x563324c1d662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#17 0x563324c1fdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#18 0x563324c2f466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#19 0x563324bfac98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#20 0x7f0cacd7009a in __libc_start_main ../csu/libc-start.c:308
#21 0x563324bfc599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

0x60d02898 is located 0 bytes to the right of 136-byte region 
[0x60d02810,0x60d02898)
allocated by thread T0 here:
#0 0x7f0cad19f350 in __interceptor_malloc 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
#1 0x563324cdccf3 in sqlite3MemMalloc 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:22837

SUMMARY: AddressSanitizer: heap-buffer-overflow 
(/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2) 
Shadow bytes around the buggy address:
  0x0c1a7fff84c0: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fd fd
  0x0c1a7fff84d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1a7fff84e0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c1a7fff84f0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x0c1a7fff8500: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c1a7fff8510: 00 00 00[fa]fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1a7fff8520: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
  0x0c1a7fff8530: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8540: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
  0x0c1a7fff8550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8560: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==21007==ABORTING
```

View detail in gdb:

```
(gdb) r < 2-oob-read.sql 
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
2-oob-read.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x5564200f in fts5HashEntrySort (pHash=0x60402018, 

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-23 Thread 邱朗
Hi Hideaki,

Thanks for your reply which made me figure out why I said icu version does 
"not" support Chinese: b/c in Chinese '中文' can be tokenize as either '中文' or 
'中' or '文' so when query '中文' or '中*' I can get the result but no result when 
query '文'. The same goes to '为什么', which can be be tokenize as either '为什么' or 
'为' or '什么' so no result for when query '什么'

And sadly fts5+unicode 61 definitely does not support Chinese.



BTW, it also helps me realize that I had answered this question myself at 2014 
here, https://stackoverflow.com/a/31396975/301513. So basically icu does the 
same as iOS CFStringTokenizer 


Qiulang 

在 2018-09-22 22:49:24,"Hideaki Takahashi"  写道:
>Hello,
>
>full text search index can be used to see how the text is tokenized for
>both FTS4 and FTS5.
>for FTS4, fts3tokenize can be used too.
>
>sqlite> CREATE VIRTUAL TABLE icu_zh_cn USING fts3tokenize(icu, zh_CN);
>sqlite> SELECT token, start, end, position FROM icu_zh_cn WHERE
>INPUT='为什么不支持中文 fts5 does not seem to work for chinese';
>为什么|0|9|0
>不|9|12|1
>支持|12|18|2
>中文|18|24|3
>fts5|25|29|4
>does|30|34|5
>not|35|38|6
>seem|39|43|7
>to|44|46|8
>work|47|51|9
>for|52|55|10
>chinese|56|63|11
>
>based on the output above, the query below works and makes sense to me.
>sqlite> select * from zh_text where text match '中文';
>为什么不支持中文 icu does not seem to work for chinese
>
>
>FTS5 + unicode61
>sqlite> CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize =
>'porter unicode61 remove_diacritics 1');
>sqlite> INSERT INTO ft5_test values('为什么不支持中文 fts5 does not seem to work
>for chinese');
>sqlite> CREATE VIRTUAL TABLE ft5_test_vocab_i USING fts5vocab(ft5_test,
>'instance');
>sqlite> SELECT term, doc, col, offset FROM ft5_test_vocab_i;
>(snip non-Chinese portion)
>为什么不支持中文|1|content|0
>
>FTS4 + ICU(zh_CN)
>sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
>sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for
>chinese');
>sqlite> CREATE VIRTUAL TABLE zh_terms USING fts4aux(zh_text);
>sqlite> SELECT term, col, documents FROM zh_terms;
>(snip non-Chinese portion)
>不|*|1
>不|0|1
>中文|*|1
>中文|0|1
>为什么|*|1
>为什么|0|1
>支持|*|1
>支持|0|1
>
>Thanks,
>Hideaki
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-22 Thread Maziar
Hi
Then I can do nothing about it.

Sent from my iPhone

> On Shahrivar 31, 1397 AP, at 7:50 PM, Dan Kennedy  
> wrote:
> 
>> On 09/22/2018 02:11 PM, Maziar Parsijani wrote:
>> Hi,Dan
>> I have trimmed it here you can download it:
>> https://drive.google.com/file/d/1D0xKF3DaOEbvfe9yFf0aFQvpo2nxWYqF/view?usp=sharing
>> And the problem is searching with MATCH 'أعلم*' it must find 5 rows.
>> it could be the same as I use LIKE "%أعلم%"
> 
> Thaks for the db.
> 
> As far as I can tell it's working correctly.
> 
> The default FTS5 tokenizer splits text on whitespace and punctuation. And 
> 'token*' only matches tokens that begin with "token", not all tokens that 
> contain the string "token".
> 
> In utf-8, your search string is X'D8A3D8B9D984D985'.
> 
> So, for example, the LIKE pattern matches the text that, rendered in utf-8, 
> is:
> 
>  X'20D988D8A3D8B9D984D985D988D8A7'
> 
> because it contains the search string starting at the third byte. But the 
> character before that is U+0648 - "ARABIC LETTER WAW", which is not 
> whitespace or punctuation. So the search string is not at the start of the 
> token and the query does not match.
> 
> Dan.
> 
> 
> 
> 
> 
>> 
>> 
>>> On Tue, Sep 11, 2018 at 8:35 PM Dan Kennedy  wrote:
>>> 
 On 09/11/2018 01:26 PM, Maziar Parsijani wrote:
 Thanks for your answer Eric,Sandu
 About the first idea of Eric I have to say its correct for me when I do a
 test with another fts5 table with no huge rows but my database has that
 problem.And about the other possibility I tested that before to set "*"at
 the first and end of the match but I have got an error.Here if there is
>>> no
 problem I will attach an example Arabic text database  if you search for
 'أعلم*' it must find 83 rows but it will find 49 which is the same with
 "أعلم" results.and if you put "*"at the both sides you will get error.I
 don't know if I did something wrong during creation but this is all I
>>> have
 got.
>>> 
>>> The "*" may only occur following a string in an fts5 expression, not
>>> before:
>>> 
>>>   https://sqlite.org/fts5.html#fts5_prefix_queries
>>> 
>>> If possible, can you trim the database down to just two rows that
>>> demonstrate the problem and upload it somewhere? i.e. so that it
>>> contains two rows that should both be matched by 'أعلم*', but for which
>>> the current FTS5 only matches one.
>>> 
>>> Even if the complete database makes the problem obvious to Arabic
>>> readers, it will be quite difficult for non-readers to deal with.
>>> 
>>> Thanks,
>>> Dan.
>>> 
>>> 
>>> 
>>> 
>>> 
 
 On Tue, Sep 11, 2018 at 10:20 AM Sandu Buraga 
 wrote:
 
> In Arabic you have a different direction of the text. Might be that the
> reason?
> 
> Sandu
> 
> Eric Minbiole  schrieb am Di., 11. Sep. 2018,
>>> 00:23:
> 
>>> SELECT rowid,text
>>> FROM table
>>> WHERE table MATCH 'أعلم*';
>>> 
>>> And I have to add this that my data is Arabic text.
>>> 
>>> This method must find words that contains 'أعلم' but it doesn't. What
>>> should I do now.
>> I just tried this in SQLite version 3.24.0 on a Mac, and it seems to
> work.
>> Please see my example below. In particular, the first select doesn't
> have a
>> *, so it doesn't find the word. The second example includes a trailing
>>> *,
>> so it performs a "starts with" search, and does find the match. (Note
> that
>> I simply doubled the letters of your original arabic text to make a
> longer
>> word.)
>> 
>> MacBook-II:Programs eric$ ./sqlite3
>>> SQLite version 3.24.0 2018-06-04 19:24:41
>>> Enter ".help" for usage hints.
>>> Connected to a transient in-memory database.
>>> Use ".open FILENAME" to reopen on a persistent database.
>>> sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
>>> sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
>>> sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
>>> sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
>>> sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
>>> 2
>>> sqlite>
>> 
>> One possible explanation: You mentioned that you want a "contains"
> search.
>> However, the trailing asterisk in your example only results in a
>>> "begins
>> with" search. If you really want a "contains" search, you'll need to
>>> put
>> the * at both the beginning and end of the match word. E.g., "WHERE fts
>> MATCH '*asdf*';
>> 
>> Hope this is helpful.
>> 
>> ~Eric
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> 

Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-22 Thread Dan Kennedy

On 09/22/2018 02:11 PM, Maziar Parsijani wrote:

Hi,Dan
I have trimmed it here you can download it:
https://drive.google.com/file/d/1D0xKF3DaOEbvfe9yFf0aFQvpo2nxWYqF/view?usp=sharing
And the problem is searching with MATCH 'أعلم*' it must find 5 rows.
it could be the same as I use LIKE "%أعلم%"


Thaks for the db.

As far as I can tell it's working correctly.

The default FTS5 tokenizer splits text on whitespace and punctuation. 
And 'token*' only matches tokens that begin with "token", not all tokens 
that contain the string "token".


In utf-8, your search string is X'D8A3D8B9D984D985'.

So, for example, the LIKE pattern matches the text that, rendered in 
utf-8, is:


  X'20D988D8A3D8B9D984D985D988D8A7'

because it contains the search string starting at the third byte. But 
the character before that is U+0648 - "ARABIC LETTER WAW", which is not 
whitespace or punctuation. So the search string is not at the start of 
the token and the query does not match.


Dan.








On Tue, Sep 11, 2018 at 8:35 PM Dan Kennedy  wrote:


On 09/11/2018 01:26 PM, Maziar Parsijani wrote:

Thanks for your answer Eric,Sandu
About the first idea of Eric I have to say its correct for me when I do a
test with another fts5 table with no huge rows but my database has that
problem.And about the other possibility I tested that before to set "*"at
the first and end of the match but I have got an error.Here if there is

no

problem I will attach an example Arabic text database  if you search for
'أعلم*' it must find 83 rows but it will find 49 which is the same with
"أعلم" results.and if you put "*"at the both sides you will get error.I
don't know if I did something wrong during creation but this is all I

have

got.


The "*" may only occur following a string in an fts5 expression, not
before:

   https://sqlite.org/fts5.html#fts5_prefix_queries

If possible, can you trim the database down to just two rows that
demonstrate the problem and upload it somewhere? i.e. so that it
contains two rows that should both be matched by 'أعلم*', but for which
the current FTS5 only matches one.

Even if the complete database makes the problem obvious to Arabic
readers, it will be quite difficult for non-readers to deal with.

Thanks,
Dan.







On Tue, Sep 11, 2018 at 10:20 AM Sandu Buraga 
wrote:


In Arabic you have a different direction of the text. Might be that the
reason?

Sandu

Eric Minbiole  schrieb am Di., 11. Sep. 2018,

00:23:



SELECT rowid,text
FROM table
WHERE table MATCH 'أعلم*';

And I have to add this that my data is Arabic text.

This method must find words that contains 'أعلم' but it doesn't. What
should I do now.



I just tried this in SQLite version 3.24.0 on a Mac, and it seems to

work.

Please see my example below. In particular, the first select doesn't

have a

*, so it doesn't find the word. The second example includes a trailing

*,

so it performs a "starts with" search, and does find the match. (Note

that

I simply doubled the letters of your original arabic text to make a

longer

word.)

MacBook-II:Programs eric$ ./sqlite3

SQLite version 3.24.0 2018-06-04 19:24:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
2
sqlite>



One possible explanation: You mentioned that you want a "contains"

search.

However, the trailing asterisk in your example only results in a

"begins

with" search. If you really want a "contains" search, you'll need to

put

the * at both the beginning and end of the match word. E.g., "WHERE fts
MATCH '*asdf*';

Hope this is helpful.

~Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-22 Thread Hideaki Takahashi
Hello,

full text search index can be used to see how the text is tokenized for
both FTS4 and FTS5.
for FTS4, fts3tokenize can be used too.

sqlite> CREATE VIRTUAL TABLE icu_zh_cn USING fts3tokenize(icu, zh_CN);
sqlite> SELECT token, start, end, position FROM icu_zh_cn WHERE
INPUT='为什么不支持中文 fts5 does not seem to work for chinese';
为什么|0|9|0
不|9|12|1
支持|12|18|2
中文|18|24|3
fts5|25|29|4
does|30|34|5
not|35|38|6
seem|39|43|7
to|44|46|8
work|47|51|9
for|52|55|10
chinese|56|63|11

based on the output above, the query below works and makes sense to me.
sqlite> select * from zh_text where text match '中文';
为什么不支持中文 icu does not seem to work for chinese


FTS5 + unicode61
sqlite> CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize =
'porter unicode61 remove_diacritics 1');
sqlite> INSERT INTO ft5_test values('为什么不支持中文 fts5 does not seem to work
for chinese');
sqlite> CREATE VIRTUAL TABLE ft5_test_vocab_i USING fts5vocab(ft5_test,
'instance');
sqlite> SELECT term, doc, col, offset FROM ft5_test_vocab_i;
(snip non-Chinese portion)
为什么不支持中文|1|content|0

FTS4 + ICU(zh_CN)
sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for
chinese');
sqlite> CREATE VIRTUAL TABLE zh_terms USING fts4aux(zh_text);
sqlite> SELECT term, col, documents FROM zh_terms;
(snip non-Chinese portion)
不|*|1
不|0|1
中文|*|1
中文|0|1
为什么|*|1
为什么|0|1
支持|*|1
支持|0|1

Thanks,
Hideaki

On Sat, Sep 22, 2018 at 12:44 AM 邱朗  wrote:

> Hi,
>
>
> It was exactly like you said, my bad, so now I have built an icu version.
> BUT unfortunately it still does not support CJK, why is that ?
>
>
> qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3
> SQLite version 3.25.1 2018-09-18 20:20:44
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
> sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for
> chinese');
> sqlite> select * from zh_text where text match 'work';
> 为什么不支持中文 icu does not seem to work for chinese
> sqlite> select * from zh_text where text match '中';
> sqlite>
>
>
> BTW, whoever hit the icu4c error it may be because you make the same
> mistake as I did. So I first run brew link icu4c, but brew refused,
> "Warning: Refusing to link macOS-provided software: icu4c", then I forgot
> to add it to my path :$
>
>
> If you run brew info icu4c, it will tell you that but actually I didn't
> set them and compiler still can find them
>
>
> For compilers to find icu4c you may need to set:
>   export LDFLAGS="-L/usr/local/opt/icu4c/lib"
>   export CPPFLAGS="-I/usr/local/opt/icu4c/include"
>
>
> Thanks,
> Qiulang
> At 2018-09-21 23:43:01, "Dan Kennedy"  wrote:
> >On 09/21/2018 09:44 PM, 邱朗 wrote:
> >> I actually first used  ./configure CFLAGS="-DSQLITE_ENABLE_ICU
> `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`"  But I got the
> error
> >
> >When you ran this configure command, is the first line out output
> >something like the following?
> >
> >   bash: icu-config: command not found
> >
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-22 Thread Maziar Parsijani
Hi,Dan
I have trimmed it here you can download it:
https://drive.google.com/file/d/1D0xKF3DaOEbvfe9yFf0aFQvpo2nxWYqF/view?usp=sharing
And the problem is searching with MATCH 'أعلم*' it must find 5 rows.
it could be the same as I use LIKE "%أعلم%"


On Tue, Sep 11, 2018 at 8:35 PM Dan Kennedy  wrote:

> On 09/11/2018 01:26 PM, Maziar Parsijani wrote:
> > Thanks for your answer Eric,Sandu
> > About the first idea of Eric I have to say its correct for me when I do a
> > test with another fts5 table with no huge rows but my database has that
> > problem.And about the other possibility I tested that before to set "*"at
> > the first and end of the match but I have got an error.Here if there is
> no
> > problem I will attach an example Arabic text database  if you search for
> > 'أعلم*' it must find 83 rows but it will find 49 which is the same with
> > "أعلم" results.and if you put "*"at the both sides you will get error.I
> > don't know if I did something wrong during creation but this is all I
> have
> > got.
>
> The "*" may only occur following a string in an fts5 expression, not
> before:
>
>https://sqlite.org/fts5.html#fts5_prefix_queries
>
> If possible, can you trim the database down to just two rows that
> demonstrate the problem and upload it somewhere? i.e. so that it
> contains two rows that should both be matched by 'أعلم*', but for which
> the current FTS5 only matches one.
>
> Even if the complete database makes the problem obvious to Arabic
> readers, it will be quite difficult for non-readers to deal with.
>
> Thanks,
> Dan.
>
>
>
>
>
> >
> > On Tue, Sep 11, 2018 at 10:20 AM Sandu Buraga 
> > wrote:
> >
> >> In Arabic you have a different direction of the text. Might be that the
> >> reason?
> >>
> >> Sandu
> >>
> >> Eric Minbiole  schrieb am Di., 11. Sep. 2018,
> 00:23:
> >>
>  SELECT rowid,text
>  FROM table
>  WHERE table MATCH 'أعلم*';
> 
>  And I have to add this that my data is Arabic text.
> 
>  This method must find words that contains 'أعلم' but it doesn't. What
>  should I do now.
> 
> 
> >>> I just tried this in SQLite version 3.24.0 on a Mac, and it seems to
> >> work.
> >>> Please see my example below. In particular, the first select doesn't
> >> have a
> >>> *, so it doesn't find the word. The second example includes a trailing
> *,
> >>> so it performs a "starts with" search, and does find the match. (Note
> >> that
> >>> I simply doubled the letters of your original arabic text to make a
> >> longer
> >>> word.)
> >>>
> >>> MacBook-II:Programs eric$ ./sqlite3
>  SQLite version 3.24.0 2018-06-04 19:24:41
>  Enter ".help" for usage hints.
>  Connected to a transient in-memory database.
>  Use ".open FILENAME" to reopen on a persistent database.
>  sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
>  sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
>  sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
>  sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
>  sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
>  2
>  sqlite>
> 
> >>>
> >>> One possible explanation: You mentioned that you want a "contains"
> >> search.
> >>> However, the trailing asterisk in your example only results in a
> "begins
> >>> with" search. If you really want a "contains" search, you'll need to
> put
> >>> the * at both the beginning and end of the match word. E.g., "WHERE fts
> >>> MATCH '*asdf*';
> >>>
> >>> Hope this is helpful.
> >>>
> >>> ~Eric
> >>> ___
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>>
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi,


It was exactly like you said, my bad, so now I have built an icu version. BUT 
unfortunately it still does not support CJK, why is that ?


qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3
SQLite version 3.25.1 2018-09-18 20:20:44
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE zh_text USING fts4(text, tokenize=icu zh_CN);
sqlite> INSERT INTO zh_text values('为什么不支持中文 icu does not seem to work for 
chinese');
sqlite> select * from zh_text where text match 'work';
为什么不支持中文 icu does not seem to work for chinese
sqlite> select * from zh_text where text match '中';
sqlite>


BTW, whoever hit the icu4c error it may be because you make the same mistake as 
I did. So I first run brew link icu4c, but brew refused, "Warning: Refusing to 
link macOS-provided software: icu4c", then I forgot to add it to my path :$


If you run brew info icu4c, it will tell you that but actually I didn't set 
them and compiler still can find them


For compilers to find icu4c you may need to set:
  export LDFLAGS="-L/usr/local/opt/icu4c/lib"
  export CPPFLAGS="-I/usr/local/opt/icu4c/include"


Thanks,
Qiulang
At 2018-09-21 23:43:01, "Dan Kennedy"  wrote:
>On 09/21/2018 09:44 PM, 邱朗 wrote:
>> I actually first used  ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config 
>> --cppflags`" LDFLAGS="`icu-config --ldflags`"  But I got the error
>
>When you ran this configure command, is the first line out output 
>something like the following?
>
>   bash: icu-config: command not found
>

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Jens Alfke


> On Sep 20, 2018, at 11:01 PM, 邱朗  wrote:
> 
> https://www.sqlite.org/fts5.html  said " 
> The unicode tokenizer classifies all unicode characters as either "separator" 
> or "token" characters. By default all space and punctuation characters, as 
> defined by Unicode 6.1, are considered separators, and all other characters 
> as token characters... "  I really doubt unicode tokenizer requires white 
> space, that is ascii tokenizer.

Detecting word breaks in many East Asian languages (not just CJK; Thai is 
another) is a rather difficult task and requires having a non-small database of 
character sequences to match. I’m sure the SQLite maintainers considered it too 
large to build into their Unicode tokenizer.

IIRC, ICU can do this, as can special libraries like Mecab.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 minimum number of characters to index ?

2018-09-21 Thread Jens Alfke


> On Sep 21, 2018, at 3:26 AM, Domingo Alvarez Duarte  
> wrote:
> 
> looking at some fts5 tables it seems that an option to limit the minimum 
> number of characters to at least 2 or 3 would be a good shot as stopwords,

A real stop-word list is valuable, but I don’t think a simple minimum-length 
rule would be as useful. Maybe in a few contexts, but not in general. (It’s not 
useful even for English text; for example, I’m very glad that Google indexes 
the word “C” so I can look up questions about C programming!)

> another interest option would be a regex like black/white list of sequence of 
> characters to be indexed.

You can do all this and more with a custom tokenizer :)

(Most real-world uses of FTS for natural language text will end up needing a 
custom tokenizer anyway, because IIRC the default tokenizer is very stupid and 
only breaks at whitespace. At a minimum you need one that can ignore inter-word 
punctuation like periods and commas, and recognize some non-ASCII characters 
like curly quotes and en-dashes.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy

On 09/21/2018 09:44 PM, 邱朗 wrote:

I actually first used  ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" 
LDFLAGS="`icu-config --ldflags`"  But I got the error


When you ran this configure command, is the first line out output 
something like the following?


  bash: icu-config: command not found

Is [icu-config] actually in your path? And if so, what does the 
[icu-config --ldflags] command return?


Dan.







sqlite3.c:184184:10: fatal error: 'unicode/utypes.h' file not found
#include 


Then I added -I -L switches and if I remembered correct I used brew to install 
icu4c. The compiler command are  these


qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ make
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" -DPACKAGE_VERSION=\"3.25.1\" 
-DPACKAGE_STRING=\"sqlite\ 3.25.1\" -DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" -DPACKAGE=\"sqlite\" 
-DVERSION=\"3.25.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 -DHAVE_LOCALTIME_R=1 
-DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 -DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 -DHAVE_READLINE_READLINE_H=1 -DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I.
-D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB  
-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU  -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c -o sqlite3.lo sqlite3.c
libtool: compile:  gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" -DPACKAGE_VERSION=\"3.25.1\" "-DPACKAGE_STRING=\"sqlite 
3.25.1\"" -DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" -DPACKAGE=\"sqlite\" -DVERSION=\"3.25.1\" 
-DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
-DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 -DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 
-DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 -DHAVE_READLINE_READLINE_H=1 -DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I. -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 
-DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB -I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo 
-c sqlite3.c  -fno-common -DPIC -o .libs/sqlite3.o
libtool: compile:  gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" -DPACKAGE_VERSION=\"3.25.1\" "-DPACKAGE_STRING=\"sqlite 
3.25.1\"" -DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" -DPACKAGE=\"sqlite\" -DVERSION=\"3.25.1\" -DSTDC_HEADERS=1 
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 -DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 -DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 
-DHAVE_READLINE_READLINE_H=1 -DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I. -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE 
-DSQLITE_HAVE_ZLIB -I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c sqlite3.c -o sqlite3.o >/dev/null 2>&1
mv -f .deps/sqlite3.Tpo .deps/sqlite3.Plo
/bin/sh ./libtool  --tag=CC   --mode=link gcc -D_REENTRANT=1 
-DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 
-DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB  
-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU  -no-undefined -version-info 
8:6:8 -L/usr/local/opt/icu4c/lib  -o libsqlite3.la -rpath /usr/local/lib 
sqlite3.lo  -lz
libtool: link: gcc -dynamiclib  -o .libs/libsqlite3.0.dylib  .libs/sqlite3.o   
-L/usr/local/opt/icu4c/lib -lz-install_name  
/usr/local/lib/libsqlite3.0.dylib -compatibility_version 9 -current_version 9.6 
-Wl,-single_module
Undefined symbols for architecture x86_64:
  "_u_errorName_62", referenced from:
  _icuFunctionError in sqlite3.o
  "_u_foldCase_62", referenced from:
...





At 2018-09-21 21:52:30, "Dan Kennedy"  wrote:

On 09/21/2018 05:21 PM, 邱朗 wrote:

Hi,

Thanks for replying my question. Following are the error I got when compiling 
sqlite-autoconf-3250100.tar.gz . The error looks similar to this old discussion
http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html


I am using macOS 10.13 & xcode 10


The text below is just the error. If you post the compiler commands that
appear before it in the build log somebody might be able to spot the
problem.

From the error message, it may 

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
I actually first used  ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config 
--cppflags`" LDFLAGS="`icu-config --ldflags`"  But I got the error


sqlite3.c:184184:10: fatal error: 'unicode/utypes.h' file not found
#include 


Then I added -I -L switches and if I remembered correct I used brew to install 
icu4c. The compiler command are  these 


qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ make
/bin/sh ./libtool  --tag=CC   --mode=compile gcc -DPACKAGE_NAME=\"sqlite\" 
-DPACKAGE_TARNAME=\"sqlite\" -DPACKAGE_VERSION=\"3.25.1\" 
-DPACKAGE_STRING=\"sqlite\ 3.25.1\" 
-DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" 
-DPACKAGE=\"sqlite\" -DVERSION=\"3.25.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 
-DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 
-DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 -DHAVE_READLINE_READLINE_H=1 
-DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I.-D_REENTRANT=1 -DSQLITE_THREADSAFE=1 
-DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 
-DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB  -I/usr/local/opt/icu4c/include 
-DSQLITE_ENABLE_ICU  -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c -o 
sqlite3.lo sqlite3.c
libtool: compile:  gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" 
-DPACKAGE_VERSION=\"3.25.1\" "-DPACKAGE_STRING=\"sqlite 3.25.1\"" 
-DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" 
-DPACKAGE=\"sqlite\" -DVERSION=\"3.25.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 
-DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 
-DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 -DHAVE_READLINE_READLINE_H=1 
-DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I. -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 
-DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 
-DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB -I/usr/local/opt/icu4c/include 
-DSQLITE_ENABLE_ICU -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c sqlite3.c  
-fno-common -DPIC -o .libs/sqlite3.o
libtool: compile:  gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" 
-DPACKAGE_VERSION=\"3.25.1\" "-DPACKAGE_STRING=\"sqlite 3.25.1\"" 
-DPACKAGE_BUGREPORT=\"http://www.sqlite.org\; -DPACKAGE_URL=\"\" 
-DPACKAGE=\"sqlite\" -DVERSION=\"3.25.1\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
-DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
-DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 
-DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 
-DHAVE_STRERROR_R=1 -DHAVE_EDITLINE_READLINE_H=1 -DHAVE_READLINE_READLINE_H=1 
-DHAVE_READLINE=1 -DHAVE_ZLIB_H=1 -I. -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 
-DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 
-DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB -I/usr/local/opt/icu4c/include 
-DSQLITE_ENABLE_ICU -MT sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c sqlite3.c 
-o sqlite3.o >/dev/null 2>&1
mv -f .deps/sqlite3.Tpo .deps/sqlite3.Plo
/bin/sh ./libtool  --tag=CC   --mode=link gcc -D_REENTRANT=1 
-DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 
-DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_HAVE_ZLIB  
-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU  -no-undefined -version-info 
8:6:8 -L/usr/local/opt/icu4c/lib  -o libsqlite3.la -rpath /usr/local/lib 
sqlite3.lo  -lz
libtool: link: gcc -dynamiclib  -o .libs/libsqlite3.0.dylib  .libs/sqlite3.o   
-L/usr/local/opt/icu4c/lib -lz-install_name  
/usr/local/lib/libsqlite3.0.dylib -compatibility_version 9 -current_version 9.6 
-Wl,-single_module
Undefined symbols for architecture x86_64:
  "_u_errorName_62", referenced from:
  _icuFunctionError in sqlite3.o
  "_u_foldCase_62", referenced from:
...





At 2018-09-21 21:52:30, "Dan Kennedy"  wrote:
>On 09/21/2018 05:21 PM, 邱朗 wrote:
>> Hi,
>>
>> Thanks for replying my question. Following are the error I got when 
>> compiling sqlite-autoconf-3250100.tar.gz . The error looks similar to this 
>> old discussion
>> http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html
>>
>>
>> I am using macOS 10.13 & xcode 10
>
>The text below is just the error. If you post the compiler commands that 
>appear before it in the build log somebody might be able to spot the 
>problem.
>
> From the error message, it may be that you have mismatched ICU header 
>and library files, or it may be that not all required ICU libraries are 
>being linked. If you remove the -I... and -L... switches from your 
>command line does it make any difference?
>
>Dan.
>

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy

On 09/21/2018 05:21 PM, 邱朗 wrote:

Hi,

Thanks for replying my question. Following are the error I got when compiling 
sqlite-autoconf-3250100.tar.gz . The error looks similar to this old discussion
http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html


I am using macOS 10.13 & xcode 10


The text below is just the error. If you post the compiler commands that 
appear before it in the build log somebody might be able to spot the 
problem.


From the error message, it may be that you have mismatched ICU header 
and library files, or it may be that not all required ICU libraries are 
being linked. If you remove the -I... and -L... switches from your 
command line does it make any difference?


Dan.






Undefined symbols for architecture x86_64:
  "_u_errorName_62", referenced from:
  _icuFunctionError in sqlite3.o
  "_u_foldCase_62", referenced from:
  _icuOpen in sqlite3.o
  _icuLikeCompare in sqlite3.o
  "_u_isspace_62", referenced from:
  _icuNext in sqlite3.o
  "_u_strToLower_62", referenced from:
  _icuCaseFunc16 in sqlite3.o
  "_u_strToUTF8_62", referenced from:
  _icuNext in sqlite3.o
  "_u_strToUpper_62", referenced from:
  _icuCaseFunc16 in sqlite3.o
  "_ubrk_close_62", referenced from:
  _icuClose in sqlite3.o
  "_ubrk_current_62", referenced from:
  _icuNext in sqlite3.o
  "_ubrk_first_62", referenced from:
  _icuOpen in sqlite3.o
  "_ubrk_next_62", referenced from:
  _icuNext in sqlite3.o
  "_ubrk_open_62", referenced from:
  _icuOpen in sqlite3.o
  "_ucol_close_62", referenced from:
  _icuLoadCollation in sqlite3.o
  _icuCollationDel in sqlite3.o
  "_ucol_open_62", referenced from:
  _icuLoadCollation in sqlite3.o
  "_ucol_strcoll_62", referenced from:
  _icuCollationColl in sqlite3.o
  "_uregex_close_62", referenced from:
  _icuRegexpDelete in sqlite3.o
  "_uregex_matches_62", referenced from:
  _icuRegexpFunc in sqlite3.o
  "_uregex_open_62", referenced from:
  _icuRegexpFunc in sqlite3.o
  "_uregex_setText_62", referenced from:
  _icuRegexpFunc in sqlite3.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [libsqlite3.la] Error 1




在 2018-09-21 17:32:38,"Dan Kennedy"  写道:

On 09/21/2018 01:38 PM, 邱朗 wrote:


I think it could be made to work, or at least, I have experience
making it work with CJK based on functionality exposed via ICU. I
don't know if the unicode tokenizer uses ICU or if the functionality
in ICU that I used is available in the unicode tables. Not
understanding any of the languages represented by CJK, I can't say
with any confidence how good my solution was, but it seemed to be good
enough for the use case of my management and customers in the impacted
regions.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


I am Chinese and I know a little bit of Korean, I can help to test your product 
:D  All Jokes aside I also tried to build an ICU SQlite macOS version but I 
failed. All the document I googled seem outdated. e.g. I used this (and other 
solutions) but I just can not build a macOS version. Do you have any experience 
for that ?


./configure CFLAGS="-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU `icu-config 
--cppflags`" LDFLAGS="-L/usr/local/opt/icu4c/lib `icu-config --ldflags`"


Can you post the complete output of the failed build attempt? Thanks.

Dan.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 min_word_size patch small error

2018-09-21 Thread Domingo Alvarez Duarte

Hello !

On my last post about a patch to fts5 to add an option "min_word_size" 
there is a small mistake on the comparison:


Original with mistake:

if(p->nMinWordSize && p->nMinWordSize >= wsz) continue;

New with mistake fixed (it should be ">" instead of ">="):

if(p->nMinWordSize && p->nMinWordSize > wsz) continue;


Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 min_word_size patch

2018-09-21 Thread Domingo Alvarez Duarte

Hello !

After reporting here previously about this issue I've got a working 
implementation of "min_word_size" option to Unicode61Tokenizer see patch 
bellow.


With it here is the result of a simple test:



./sqlite3
SQLite version 3.26.0 2018-09-20 20:43:28
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table tfts using fts5(data, tokenize = 'unicode61 
min_word_size 3');
sqlite> create virtual table if not exists tfts_vocab_row USING 
fts5vocab('tfts', 'row');
sqlite> insert into tfts(data) values('A new way to tokenize using fts5 
from sqlite, we can discard n letters word');

sqlite> select * from tfts_vocab_row;
discard|1|1
from|1|1
fts5|1|1
letters|1|1
sqlite|1|1
tokenize|1|1
using|1|1
word|1|1





fossil diff fts5_tokenize.c
Index: ext/fts5/fts5_tokenize.c
==
--- ext/fts5/fts5_tokenize.c
+++ ext/fts5/fts5_tokenize.c
@@ -233,10 +233,11 @@
 struct Unicode61Tokenizer {
   unsigned char aTokenChar[128];  /* ASCII range token characters */
   char *aFold;    /* Buffer to fold text into */
   int nFold;  /* Size of aFold[] in bytes */
   int bRemoveDiacritic;   /* True if remove_diacritics=1 is set */
+  int nMinWordSize;   /* Min size of a word to be indexed */
   int nException;
   int *aiException;

   unsigned char aCategory[32];    /* True for token char categories */
 };
@@ -360,10 +361,11 @@
   const char *zCat = "L* N* Co";
   int i;
   memset(p, 0, sizeof(Unicode61Tokenizer));

   p->bRemoveDiacritic = 1;
+  p->nMinWordSize = 0;
   p->nFold = 64;
   p->aFold = sqlite3_malloc(p->nFold * sizeof(char));
   if( p->aFold==0 ){
 rc = SQLITE_NOMEM;
   }
@@ -393,10 +395,14 @@
 if( 0==sqlite3_stricmp(azArg[i], "separators") ){
   rc = fts5UnicodeAddExceptions(p, zArg, 0);
 }else
 if( 0==sqlite3_stricmp(azArg[i], "categories") ){
   /* no-op */
+    }else
+    if( 0==sqlite3_stricmp(azArg[i], "min_word_size") ){
+  int mwsz;
+  if( sqlite3GetInt32(zArg, ) ) p->nMinWordSize = mwsz;
 }else{
   rc = SQLITE_ERROR;
 }
   }

@@ -450,10 +456,11 @@
   while( rc==SQLITE_OK ){
 int iCode;    /* non-ASCII codepoint read from 
input */

 char *zOut = aFold;
 int is;
 int ie;
+    int wsz;

 /* Skip any separator characters. */
 while( 1 ){
   if( zCsr>=zTerm ) goto tokenize_done;
   if( *zCsr & 0x80 ) {
@@ -517,12 +524,15 @@
 zCsr++;
   }
   ie = zCsr - (unsigned char*)pText;
 }

+    wsz = zOut-aFold;
+    /* Check min word size */
+    if(p->nMinWordSize && p->nMinWordSize >= wsz) continue;
 /* Invoke the token callback */
-    rc = xToken(pCtx, 0, aFold, zOut-aFold, is, ie);
+    rc = xToken(pCtx, 0, aFold, wsz, is, ie);
   }

  tokenize_done:
   if( rc==SQLITE_DONE ) rc = SQLITE_OK;
   return rc;



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 minimum number of characters to index ?

2018-09-21 Thread Domingo Alvarez Duarte

Hello !

I'm looking in the documentation and it doesn't seem to mention any 
option to specify a minimum number of characters to index, looking at 
some fts5 tables it seems that an option to limit the minimum number of 
characters to at least 2 or 3 would be a good shot as stopwords, another 
interest option would be a regex like black/white list of sequence of 
characters to be indexed.


Something like:

create virtual table if not exists pdfs_fts using fts5(pdf_name 
UNINDEXED, data,


    tokenize = 'unicode61 remove_diacritics 1 min_word_size 3 
word_black_list [\d\.\d\d\w \a\d\d\d] word_white_list [\(\d+\) 
\d\d\.\d\d\d\.\d\d\a]');


The idea is to allow/disallow some specific domain sequences to be 
included/excluded from indexing.


Any idea on how to obtain that ?

Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi,

Thanks for replying my question. Following are the error I got when compiling 
sqlite-autoconf-3250100.tar.gz . The error looks similar to this old discussion 
 
http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html


I am using macOS 10.13 & xcode 10


Undefined symbols for architecture x86_64:
  "_u_errorName_62", referenced from:
  _icuFunctionError in sqlite3.o
  "_u_foldCase_62", referenced from:
  _icuOpen in sqlite3.o
  _icuLikeCompare in sqlite3.o
  "_u_isspace_62", referenced from:
  _icuNext in sqlite3.o
  "_u_strToLower_62", referenced from:
  _icuCaseFunc16 in sqlite3.o
  "_u_strToUTF8_62", referenced from:
  _icuNext in sqlite3.o
  "_u_strToUpper_62", referenced from:
  _icuCaseFunc16 in sqlite3.o
  "_ubrk_close_62", referenced from:
  _icuClose in sqlite3.o
  "_ubrk_current_62", referenced from:
  _icuNext in sqlite3.o
  "_ubrk_first_62", referenced from:
  _icuOpen in sqlite3.o
  "_ubrk_next_62", referenced from:
  _icuNext in sqlite3.o
  "_ubrk_open_62", referenced from:
  _icuOpen in sqlite3.o
  "_ucol_close_62", referenced from:
  _icuLoadCollation in sqlite3.o
  _icuCollationDel in sqlite3.o
  "_ucol_open_62", referenced from:
  _icuLoadCollation in sqlite3.o
  "_ucol_strcoll_62", referenced from:
  _icuCollationColl in sqlite3.o
  "_uregex_close_62", referenced from:
  _icuRegexpDelete in sqlite3.o
  "_uregex_matches_62", referenced from:
  _icuRegexpFunc in sqlite3.o
  "_uregex_open_62", referenced from:
  _icuRegexpFunc in sqlite3.o
  "_uregex_setText_62", referenced from:
  _icuRegexpFunc in sqlite3.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [libsqlite3.la] Error 1




在 2018-09-21 17:32:38,"Dan Kennedy"  写道:
>On 09/21/2018 01:38 PM, 邱朗 wrote:
>>>
>>> I think it could be made to work, or at least, I have experience
>>> making it work with CJK based on functionality exposed via ICU. I
>>> don't know if the unicode tokenizer uses ICU or if the functionality
>>> in ICU that I used is available in the unicode tables. Not
>>> understanding any of the languages represented by CJK, I can't say
>>> with any confidence how good my solution was, but it seemed to be good
>>> enough for the use case of my management and customers in the impacted
>>> regions.
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> I am Chinese and I know a little bit of Korean, I can help to test your 
>> product :D  All Jokes aside I also tried to build an ICU SQlite macOS 
>> version but I failed. All the document I googled seem outdated. e.g. I used 
>> this (and other solutions) but I just can not build a macOS version. Do you 
>> have any experience for that ?
>>
>>
>> ./configure CFLAGS="-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU 
>> `icu-config --cppflags`" LDFLAGS="-L/usr/local/opt/icu4c/lib `icu-config 
>> --ldflags`"
>
>Can you post the complete output of the failed build attempt? Thanks.
>
>Dan.
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy

On 09/21/2018 01:38 PM, 邱朗 wrote:


I think it could be made to work, or at least, I have experience
making it work with CJK based on functionality exposed via ICU. I
don't know if the unicode tokenizer uses ICU or if the functionality
in ICU that I used is available in the unicode tables. Not
understanding any of the languages represented by CJK, I can't say
with any confidence how good my solution was, but it seemed to be good
enough for the use case of my management and customers in the impacted
regions.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


I am Chinese and I know a little bit of Korean, I can help to test your product 
:D  All Jokes aside I also tried to build an ICU SQlite macOS version but I 
failed. All the document I googled seem outdated. e.g. I used this (and other 
solutions) but I just can not build a macOS version. Do you have any experience 
for that ?


./configure CFLAGS="-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU `icu-config 
--cppflags`" LDFLAGS="-L/usr/local/opt/icu4c/lib `icu-config --ldflags`"





Can you post the complete output of the failed build attempt? Thanks.

Dan.






Thanks,
Qiulang
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:39 AM 邱朗  wrote:
>
> >I think it could be made to work, or at least, I have experience
> >making it work with CJK based on functionality exposed via ICU. I
> >don't know if the unicode tokenizer uses ICU or if the functionality
> >in ICU that I used is available in the unicode tables. Not
> >understanding any of the languages represented by CJK, I can't say
> >with any confidence how good my solution was, but it seemed to be good
> >enough for the use case of my management and customers in the impacted
> >regions.
>
> I am Chinese and I know a little bit of Korean, I can help to test your 
> product :D  All Jokes aside I also tried to build an ICU SQlite macOS version 
> but I failed. All the document I googled seem outdated. e.g. I used this (and 
> other solutions) but I just can not build a macOS version. Do you have any 
> experience for that ?
>
>
> ./configure CFLAGS="-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU 
> `icu-config --cppflags`" LDFLAGS="-L/usr/local/opt/icu4c/lib `icu-config 
> --ldflags`"

I do not have macOS experience, and I don't have a lot of free time
either. Mainly I was trying to explain what I had done in the past and
what I used. Hopefully someone else will chime in with more details
about the Unicode61 tokenizer and whether it is standalone or depends
on ICU.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
>
>I think it could be made to work, or at least, I have experience
>making it work with CJK based on functionality exposed via ICU. I
>don't know if the unicode tokenizer uses ICU or if the functionality
>in ICU that I used is available in the unicode tables. Not
>understanding any of the languages represented by CJK, I can't say
>with any confidence how good my solution was, but it seemed to be good
>enough for the use case of my management and customers in the impacted
>regions.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

I am Chinese and I know a little bit of Korean, I can help to test your product 
:D  All Jokes aside I also tried to build an ICU SQlite macOS version but I 
failed. All the document I googled seem outdated. e.g. I used this (and other 
solutions) but I just can not build a macOS version. Do you have any experience 
for that ?


./configure CFLAGS="-I/usr/local/opt/icu4c/include -DSQLITE_ENABLE_ICU 
`icu-config --cppflags`" LDFLAGS="-L/usr/local/opt/icu4c/lib `icu-config 
--ldflags`"


Thanks,
Qiulang
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:02 AM 邱朗  wrote:
>
> https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all 
> unicode characters as either "separator" or "token" characters. By default 
> all space and punctuation characters, as defined by Unicode 6.1, are 
> considered separators, and all other characters as token characters... "  I 
> really doubt unicode tokenizer requires white space, that is ascii tokenizer.

Forgive my imprecise use of language. I should have said separators
instead of whitespace. Regardless, CJK uses implicit separation
between words, and that description seems to indicate that the unicode
tokenizer expects explicit separators (be they whitespace or
punctuation or something else) between tokens.

> That was why I thought it might work for CJK.

I think it could be made to work, or at least, I have experience
making it work with CJK based on functionality exposed via ICU. I
don't know if the unicode tokenizer uses ICU or if the functionality
in ICU that I used is available in the unicode tables. Not
understanding any of the languages represented by CJK, I can't say
with any confidence how good my solution was, but it seemed to be good
enough for the use case of my management and customers in the impacted
regions.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all 
unicode characters as either "separator" or "token" characters. By default all 
space and punctuation characters, as defined by Unicode 6.1, are considered 
separators, and all other characters as token characters... "  I really doubt 
unicode tokenizer requires white space, that is ascii tokenizer.


That was why I thought it might work for CJK. 


Qiulang


At 2018-09-21 13:03:54, "Scott Robison"  wrote:
>On Thu, Sep 20, 2018, 8:21 PM 邱朗  wrote:
>
>> Hi,
>> I had thought Unicode61 Tokenizer can support CJK -- Chinese Japanese
>> Korean I verify my sqlite supports fts5
>>
>> {snipped}
>>
>> But to my surprise it can't find any CJK word at all. Why is that ?
>
>
>Based on my experience with such things, I suspect that the tokenizer
>requires whitespace between adjacent words, which is not the case with CJK.
>Word breaks are implicit, not explicit.
>
>Is the Unicode61 tokenizer based on ICU? I had to implement an algorithm
>for software at work that used functionality from ICU to find CJK word
>boundaries, so I believe it is possible, just not as straightforward as
>whitespace delimited words.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-20 Thread Scott Robison
On Thu, Sep 20, 2018, 8:21 PM 邱朗  wrote:

> Hi,
> I had thought Unicode61 Tokenizer can support CJK -- Chinese Japanese
> Korean I verify my sqlite supports fts5
>
> {snipped}
>
> But to my surprise it can't find any CJK word at all. Why is that ?


Based on my experience with such things, I suspect that the tokenizer
requires whitespace between adjacent words, which is not the case with CJK.
Word breaks are implicit, not explicit.

Is the Unicode61 tokenizer based on ICU? I had to implement an algorithm
for software at work that used functionality from ICU to find CJK word
boundaries, so I believe it is possible, just not as straightforward as
whitespace delimited words.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-20 Thread 邱朗
Hi,
I had thought Unicode61 Tokenizer can support CJK -- Chinese Japanese Korean I 
verify my sqlite supports fts5


sqlite> pragma compile_options;
BUG_COMPATIBLE_20160819
COMPILER=clang-9.0.0
DEFAULT_CACHE_SIZE=2000
DEFAULT_CKPTFULLFSYNC
DEFAULT_JOURNAL_SIZE_LIMIT=32768
DEFAULT_PAGE_SIZE=4096
DEFAULT_SYNCHRONOUS=2
DEFAULT_WAL_SYNCHRONOUS=1
ENABLE_API_ARMOR
ENABLE_COLUMN_METADATA
ENABLE_DBSTAT_VTAB
ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS3_TOKENIZER
ENABLE_FTS4
ENABLE_FTS5


sqlite> CREATE VIRTUAL TABLE ft5_test USING fts5(content, tokenize = 'porter 
unicode61 remove_diacritics 1');
sqlite> INSERT INTO ft5_test values('为什么不支持中文 fts5 does not seem to work for 
chinese');
sqlite> select * from ft5_test where ft5_test = '中文';
sqlite>
sqlite> select * from ft5_test where ft5_test = 'Chinese';
为什么不支持中文 fts5 does not seem to work for chinese


But to my surprise it can't find any CJK word at all. Why is that ? 
Thanks,
Qiulang
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-11 Thread Dan Kennedy

On 09/11/2018 01:26 PM, Maziar Parsijani wrote:

Thanks for your answer Eric,Sandu
About the first idea of Eric I have to say its correct for me when I do a
test with another fts5 table with no huge rows but my database has that
problem.And about the other possibility I tested that before to set "*"at
the first and end of the match but I have got an error.Here if there is no
problem I will attach an example Arabic text database  if you search for
'أعلم*' it must find 83 rows but it will find 49 which is the same with
"أعلم" results.and if you put "*"at the both sides you will get error.I
don't know if I did something wrong during creation but this is all I have
got.


The "*" may only occur following a string in an fts5 expression, not before:

  https://sqlite.org/fts5.html#fts5_prefix_queries

If possible, can you trim the database down to just two rows that 
demonstrate the problem and upload it somewhere? i.e. so that it 
contains two rows that should both be matched by 'أعلم*', but for which 
the current FTS5 only matches one.


Even if the complete database makes the problem obvious to Arabic 
readers, it will be quite difficult for non-readers to deal with.


Thanks,
Dan.







On Tue, Sep 11, 2018 at 10:20 AM Sandu Buraga 
wrote:


In Arabic you have a different direction of the text. Might be that the
reason?

Sandu

Eric Minbiole  schrieb am Di., 11. Sep. 2018, 00:23:


SELECT rowid,text
FROM table
WHERE table MATCH 'أعلم*';

And I have to add this that my data is Arabic text.

This method must find words that contains 'أعلم' but it doesn't. What
should I do now.



I just tried this in SQLite version 3.24.0 on a Mac, and it seems to

work.

Please see my example below. In particular, the first select doesn't

have a

*, so it doesn't find the word. The second example includes a trailing *,
so it performs a "starts with" search, and does find the match. (Note

that

I simply doubled the letters of your original arabic text to make a

longer

word.)

MacBook-II:Programs eric$ ./sqlite3

SQLite version 3.24.0 2018-06-04 19:24:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
2
sqlite>



One possible explanation: You mentioned that you want a "contains"

search.

However, the trailing asterisk in your example only results in a "begins
with" search. If you really want a "contains" search, you'll need to put
the * at both the beginning and end of the match word. E.g., "WHERE fts
MATCH '*asdf*';

Hope this is helpful.

~Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-11 Thread Maziar Parsijani
Thanks for your answer Eric,Sandu
About the first idea of Eric I have to say its correct for me when I do a
test with another fts5 table with no huge rows but my database has that
problem.And about the other possibility I tested that before to set "*"at
the first and end of the match but I have got an error.Here if there is no
problem I will attach an example Arabic text database  if you search for
'أعلم*' it must find 83 rows but it will find 49 which is the same with
"أعلم" results.and if you put "*"at the both sides you will get error.I
don't know if I did something wrong during creation but this is all I have
got.

On Tue, Sep 11, 2018 at 10:20 AM Sandu Buraga 
wrote:

> In Arabic you have a different direction of the text. Might be that the
> reason?
>
> Sandu
>
> Eric Minbiole  schrieb am Di., 11. Sep. 2018, 00:23:
>
> > > SELECT rowid,text
> > > FROM table
> > > WHERE table MATCH 'أعلم*';
> > >
> > > And I have to add this that my data is Arabic text.
> > >
> > > This method must find words that contains 'أعلم' but it doesn't. What
> > > should I do now.
> > >
> > >
> > I just tried this in SQLite version 3.24.0 on a Mac, and it seems to
> work.
> > Please see my example below. In particular, the first select doesn't
> have a
> > *, so it doesn't find the word. The second example includes a trailing *,
> > so it performs a "starts with" search, and does find the match. (Note
> that
> > I simply doubled the letters of your original arabic text to make a
> longer
> > word.)
> >
> > MacBook-II:Programs eric$ ./sqlite3
> > > SQLite version 3.24.0 2018-06-04 19:24:41
> > > Enter ".help" for usage hints.
> > > Connected to a transient in-memory database.
> > > Use ".open FILENAME" to reopen on a persistent database.
> > > sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
> > > sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
> > > sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
> > > sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
> > > sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
> > > 2
> > > sqlite>
> > >
> >
> > One possible explanation: You mentioned that you want a "contains"
> search.
> > However, the trailing asterisk in your example only results in a "begins
> > with" search. If you really want a "contains" search, you'll need to put
> > the * at both the beginning and end of the match word. E.g., "WHERE fts
> > MATCH '*asdf*';
> >
> > Hope this is helpful.
> >
> > ~Eric
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-10 Thread Sandu Buraga
In Arabic you have a different direction of the text. Might be that the
reason?

Sandu

Eric Minbiole  schrieb am Di., 11. Sep. 2018, 00:23:

> > SELECT rowid,text
> > FROM table
> > WHERE table MATCH 'أعلم*';
> >
> > And I have to add this that my data is Arabic text.
> >
> > This method must find words that contains 'أعلم' but it doesn't. What
> > should I do now.
> >
> >
> I just tried this in SQLite version 3.24.0 on a Mac, and it seems to work.
> Please see my example below. In particular, the first select doesn't have a
> *, so it doesn't find the word. The second example includes a trailing *,
> so it performs a "starts with" search, and does find the match. (Note that
> I simply doubled the letters of your original arabic text to make a longer
> word.)
>
> MacBook-II:Programs eric$ ./sqlite3
> > SQLite version 3.24.0 2018-06-04 19:24:41
> > Enter ".help" for usage hints.
> > Connected to a transient in-memory database.
> > Use ".open FILENAME" to reopen on a persistent database.
> > sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
> > sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
> > sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
> > sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
> > sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
> > 2
> > sqlite>
> >
>
> One possible explanation: You mentioned that you want a "contains" search.
> However, the trailing asterisk in your example only results in a "begins
> with" search. If you really want a "contains" search, you'll need to put
> the * at both the beginning and end of the match word. E.g., "WHERE fts
> MATCH '*asdf*';
>
> Hope this is helpful.
>
> ~Eric
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-10 Thread Eric Minbiole
> SELECT rowid,text
> FROM table
> WHERE table MATCH 'أعلم*';
>
> And I have to add this that my data is Arabic text.
>
> This method must find words that contains 'أعلم' but it doesn't. What
> should I do now.
>
>
I just tried this in SQLite version 3.24.0 on a Mac, and it seems to work.
Please see my example below. In particular, the first select doesn't have a
*, so it doesn't find the word. The second example includes a trailing *,
so it performs a "starts with" search, and does find the match. (Note that
I simply doubled the letters of your original arabic text to make a longer
word.)

MacBook-II:Programs eric$ ./sqlite3
> SQLite version 3.24.0 2018-06-04 19:24:41
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> CREATE VIRTUAL TABLE fts USING fts5(doc);
> sqlite> INSERT INTO fts (rowid, doc) VALUES (1, 'english text');
> sqlite> INSERT INTO fts (rowid, doc) VALUES (2, 'arabic أعلمأعلم');
> sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم';
> sqlite> SELECT rowid FROM fts WHERE fts MATCH 'أعلم*';
> 2
> sqlite>
>

One possible explanation: You mentioned that you want a "contains" search.
However, the trailing asterisk in your example only results in a "begins
with" search. If you really want a "contains" search, you'll need to put
the * at both the beginning and end of the match word. E.g., "WHERE fts
MATCH '*asdf*';

Hope this is helpful.

~Eric
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 MATCH using "string*" matches just the exact string

2018-09-10 Thread Maziar Parsijani
Hi
I am new here and I am not an expert on sqlite my question is about match
option that I have read from https://sqlite.org/fts5.html but the problem
is when I want to not find the exact word so I use :

SELECT rowid,text

FROM table

WHERE table MATCH 'أعلم*';

And I have to add this that my data is Arabic text.

This method must find words that contains 'أعلم' but it doesn't. What
should I do now.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] fts5 giving results for substring searches for Hindi content.

2018-02-05 Thread Dan Kennedy

On 02/04/2018 11:39 AM, raj Singla wrote:

Hi,

-- create fts4 and fts5 tables
create virtual table idx4 using "fts4" (content);
create virtual table idx5 using "fts5" (content);
-- insert 1 sample rows into eachinsert into idx4 (content) values
('नीरजा भनोट के कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई
आए?');insert into idx5 (content) values ('नीरजा भनोट के कातिल
पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?');
-- query index using complete and partial stringsselect * from idx4
where idx4 match 'पाकिस्तान';-- returns नीरजा भनोट के कातिल पाकिस्तान
की जेल में थे, फिर वे एफबीआई आए?
select * from idx4 where idx4 match 'पाकि';-- no results returned
select * from idx5 where idx5 match 'पाकिस्तान';-- returns नीरजा भनोट
के कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?
select * from idx5 where idx5 match 'पाकि';-- returns नीरजा भनोट के
कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?


fts5 giving results for substring searches for Hindi content.
Is this expected behavior.
Please if you can provide more insights on this. Maybe this is just an
experimental feature.


By default, FTS5 uses a unicode tokenizer based on data extracted from 
reference file "UnicodeData.txt":


http://www.unicode.org/Public/6.1.0/ucd/UnicodeData.txt

Which divides the characters into categories:

  http://www.fileformat.info/info/unicode/category/index.htm

FTS5 considers categories "Co", "L*" and "N*" to be token characters and 
all others to be separator characters (handled in the same way as spaces).


The string "पाकिस्तान" contains 9 characters, 3 of which are from the 
"Mn" and "Mc" categories, specifically 0x93E, 0x93F, 0x94D and 0x93E. 
According to UnicodeData.txt, these characters are:


  093E;DEVANAGARI VOWEL SIGN AA;Mc;
  093F;DEVANAGARI VOWEL SIGN I;Mc;
  094D;DEVANAGARI SIGN VIRAMA;Mn;

And so the string is being split into several (actually 5 - as there are 
two instances of 0x93E) different words. Given your report, I'm guessing 
that is not what people expect. Can you, or any other Hindi speaker, 
confirm that "पाकिस्तान" should be treated as a single word by FTS5? And 
not broken into several different words?


Dan.














Thank You,
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] fts5 giving results for substring searches for Hindi content.

2018-02-04 Thread Clemens Ladisch
raj Singla wrote:
> create virtual table idx4 using "fts4" (content);
> create virtual table idx5 using "fts5" (content);
> ...
> select * from idx4 where idx4 match 'पाकि';-- no results returned
> select * from idx5 where idx5 match 'पाकि';-- returns नीरजा भनोट के

FTS4 and FTS5 have different defaults for the tokenizer:
http://www.sqlite.org/fts3.html#tokenizer
http://www.sqlite.org/fts5.html#tokenizers


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] fts5 giving results for substring searches for Hindi content.

2018-02-04 Thread raj Singla
Hi,

-- create fts4 and fts5 tables
create virtual table idx4 using "fts4" (content);
create virtual table idx5 using "fts5" (content);
-- insert 1 sample rows into eachinsert into idx4 (content) values
('नीरजा भनोट के कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई
आए?');insert into idx5 (content) values ('नीरजा भनोट के कातिल
पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?');
-- query index using complete and partial stringsselect * from idx4
where idx4 match 'पाकिस्तान';-- returns नीरजा भनोट के कातिल पाकिस्तान
की जेल में थे, फिर वे एफबीआई आए?
select * from idx4 where idx4 match 'पाकि';-- no results returned
select * from idx5 where idx5 match 'पाकिस्तान';-- returns नीरजा भनोट
के कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?
select * from idx5 where idx5 match 'पाकि';-- returns नीरजा भनोट के
कातिल पाकिस्तान की जेल में थे, फिर वे एफबीआई आए?


fts5 giving results for substring searches for Hindi content.
Is this expected behavior.
Please if you can provide more insights on this. Maybe this is just an
experimental feature.

Thank You,
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 crash with NULL in snippet() function's 5th parameter

2017-12-26 Thread Dan Kennedy

On 12/26/2017 03:16 AM, Tony Papadimitriou wrote:

(SQLite v3.21.0 with FTS5 enabled)

If instead of ‘’ (empty string) one uses NULL for the 5th parameter in the 
snippet() function it consistently crashes.


Thanks for reporting this. Now fixed (so that NULL is equivalent to an 
empty string) here:


  http://www.sqlite.org/src/info/553a3ad32498ddda

Dan.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 crash with NULL in snippet() function's 5th parameter

2017-12-25 Thread Tony Papadimitriou
(SQLite v3.21.0 with FTS5 enabled)

If instead of ‘’ (empty string) one uses NULL for the 5th parameter in the 
snippet() function it consistently crashes.

Thanks.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 performance problem.

2017-05-28 Thread Clemens Ladisch
John Found wrote:
> Pseudocode: ?1 = keyword + ' ThreadID:' + num2str(threadID);
>
> select count() from PostFTS where PostFTS match ?1;

That works.

> ?1 = keyword
> ?2 = threadID
> select count() from PostFTS where PostFTS match ?1 ThreadID:?2;

That is not valid SQL.  You have to create a single SQL string:

  select count(*) from PostFTS where PostFTS match ?1 || ' ThreadID:' || ?2;


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 performance problem.

2017-05-28 Thread John Found
On Sun, 28 May 2017 14:23:47 +0200
Clemens Ladisch  wrote:

> It might be better to use a single FTS lookup for both words:
> 
>   select count() from PostFTS where PostFTS match 'innermost ThreadID:6';
> 

Thanks! This is indeed much faster. But I am passing the search query as a 
parameters. How to proceed then? 

Should I build one composite string query, like this:

Pseudocode: ?1 = keyword + ' ThreadID:' + num2str(threadID);

select count() from PostFTS where PostFTS match ?1;

Or I can make it another way:

?1 = keyword
?2 = threadID
select count() from PostFTS where PostFTS match ?1 ThreadID:?2;

-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 performance problem.

2017-05-28 Thread Clemens Ladisch
John Found wrote:
> the execution time increases by a factor of 100
>
> select count() from PostFTS where PostFTS match 'innermost' and ThreadID = 6;

This is the same as:

  select count() from PostFTS where PostFTS match 'innermost' and PostFTS match 
'ThreadID:6';

It might be better to use a single FTS lookup for both words:

  select count() from PostFTS where PostFTS match 'innermost ThreadID:6';


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 performance problem.

2017-05-28 Thread John Found
I have a FTS5 virtual external content table, defined following way:

CREATE VIRTUAL TABLE PostFTS using fts5( ThreadID, Content, content=Posts, 
content_rowid=id, tokenize='porter unicode61 remove_diacritics 1');

The following query executes for 4..5ms:

select count() from PostFTS where PostFTS match 'innermost';

But if I include in the WHERE clause any other condition, the execution time 
increases by a factor of 100 - up to 500..700ms:

select count() from PostFTS where PostFTS match 'innermost' and ThreadID = 6;

Both queries returns a value around 15000.

What I am missing? Or such behavior is normal and expected. 


-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 bug in SQLite 3.18.0

2017-05-19 Thread Krzysztof Małysa
When SQLIte is compiled with sanitizers they report invalid readouts and
undefined-behavior:
```

sqlite3.c:188900:5: runtime error: index 11 out of bounds for type 'char
[8]'
SUMMARY: AddressSanitizer: undefined-behavior sqlite3.c:188900:5 in
sqlite3.c:189142:30: runtime error: index 12 out of bounds for type 'char
[8]'
SUMMARY: AddressSanitizer: undefined-behavior sqlite3.c:189142:30 in
sqlite3.c:184801:14: runtime error: null pointer passed as argument 1,
which is declared to never be null
/usr/include/string.h:62:62: note: nonnull attribute specified here
SUMMARY: AddressSanitizer: undefined-behavior sqlite3.c:184801:14 in
```
Compilation command:
```
$ clang 4.c sqlite3.c -o 4 -pthread -ldl -lm -DSQLITE_ENABLE_FTS5
-fsanitize=address -fsanitize=undefined
```


Additional information:
```
$ clang -v
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/5.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/6.0.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.0.0
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/5.4.0
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/6.0.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64


$ uname -a
Linux quasarHP 4.4.0-77-generic #98-Ubuntu SMP Wed Apr 26 08:34:02 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
```


This behaviour appeared in the earlier versions of SQLite, for sure in


Hope this helps,
Krzysztof Małysa
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 query results different before/after MERGE command.

2017-04-21 Thread Dan Kennedy

On 04/21/2017 02:36 AM, Jonathan Gaillard wrote:

Here is the db https://ufile.io/q0314


Thanks for reporting this. It was a bug in the fts5 prefix query code. 
Now fixed here:


  http://www.sqlite.org/src/info/840042cb2bed2924

Dan.








If you do:

select rowid, name from filesfts where filesfts match '"upload 
file漢_5"*';

you get back:


141| upload file漢_5
142| upload file漢_6
152| upload file漢_5
153| upload file漢_6
163| upload file漢_5
164| upload file漢_6
174| upload file漢_5
175| upload file漢_6
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


and then if you run:

INSERT INTO FilesFTS(FilesFTS, rank) VALUES('merge', -1);

and the select again, you get:


141| upload file漢_5
152| upload file漢_5
163| upload file漢_5
174| upload file漢_5
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


I'm using:
sqlite> .version
SQLite 3.15.2 2016-11-28 19:13:37 bbd85d235f7037c6a033a9690534391ffeacecc8

But i've confirmed it also happens on 3.17.0 and 3.18.0



From: Jonathan Gaillard 
Sent: Monday, April 17, 2017 12:00:09 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: FTS5 query results different before/after MERGE command.


I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?

Thanks!!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 query results different before/after MERGE command.

2017-04-20 Thread Jonathan Gaillard
Here is the db https://ufile.io/q0314

If you do:

select rowid, name from filesfts where filesfts match '"upload 
file漢_5"*';

you get back:


141| upload file漢_5
142| upload file漢_6
152| upload file漢_5
153| upload file漢_6
163| upload file漢_5
164| upload file漢_6
174| upload file漢_5
175| upload file漢_6
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


and then if you run:

INSERT INTO FilesFTS(FilesFTS, rank) VALUES('merge', -1);

and the select again, you get:


141| upload file漢_5
152| upload file漢_5
163| upload file漢_5
174| upload file漢_5
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


I'm using:
sqlite> .version
SQLite 3.15.2 2016-11-28 19:13:37 bbd85d235f7037c6a033a9690534391ffeacecc8

But i've confirmed it also happens on 3.17.0 and 3.18.0



From: Jonathan Gaillard 
Sent: Monday, April 17, 2017 12:00:09 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: FTS5 query results different before/after MERGE command.


I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?

Thanks!!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 query results different before/after MERGE command.

2017-04-20 Thread Dan Kennedy

On 04/20/2017 07:11 AM, Jonathan Gaillard wrote:


If you do:

select rowid, name from filesfts where filesfts match '"upload 
file漢_5"*';


This certainly seems like a bug. Are you able to share the database that 
exhibits the problem?


Thanks,
Dan.






you get back:


141| upload file漢_5
142| upload file漢_6
152| upload file漢_5
153| upload file漢_6
163| upload file漢_5
164| upload file漢_6
174| upload file漢_5
175| upload file漢_6
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


and then if you run:

INSERT INTO FilesFTS(FilesFTS, rank) VALUES('merge', -1);

and the select again, you get:


141| upload file漢_5
152| upload file漢_5
163| upload file漢_5
174| upload file漢_5
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


I'm using:
sqlite> .version
SQLite 3.15.2 2016-11-28 19:13:37 bbd85d235f7037c6a033a9690534391ffeacecc8

But i've confirmed it also happens on 3.17.0 and 3.18.0



From: Jonathan Gaillard 
Sent: Monday, April 17, 2017 12:00:09 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: FTS5 query results different before/after MERGE command.


I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?

Thanks!!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 query results different before/after MERGE command.

2017-04-20 Thread Jonathan Gaillard


If you do:

select rowid, name from filesfts where filesfts match '"upload 
file漢_5"*';

you get back:


141| upload file漢_5
142| upload file漢_6
152| upload file漢_5
153| upload file漢_6
163| upload file漢_5
164| upload file漢_6
174| upload file漢_5
175| upload file漢_6
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


and then if you run:

INSERT INTO FilesFTS(FilesFTS, rank) VALUES('merge', -1);

and the select again, you get:


141| upload file漢_5
152| upload file漢_5
163| upload file漢_5
174| upload file漢_5
185| upload file漢_5
196| upload file漢_5
207| upload file漢_5
218| upload file漢_5
229| upload file漢_5
240| upload file漢_5


I'm using:
sqlite> .version
SQLite 3.15.2 2016-11-28 19:13:37 bbd85d235f7037c6a033a9690534391ffeacecc8

But i've confirmed it also happens on 3.17.0 and 3.18.0



From: Jonathan Gaillard 
Sent: Monday, April 17, 2017 12:00:09 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: FTS5 query results different before/after MERGE command.


I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?

Thanks!!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 query results different before/after MERGE command.

2017-04-17 Thread Dan Kennedy

On 04/18/2017 02:00 AM, Jonathan Gaillard wrote:

I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?


Doesn't sound right. Can you share a database that demonstrates the problem?

Dan.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 query results different before/after MERGE command.

2017-04-17 Thread Jonathan Gaillard
I have a db where I have a query returning results from something I believe is 
deleted already from a FTS5 table. This is a normal FTS5 table, not a 
contentless one so I used the normal DELETE.


Only after I run:

INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('automerge', 0);


INSERT INTO MyFTSTable(MyFTSTable, rank) VALUES('merge', 500);

Does the query then start returning the results without the deleted items.

Is this normal fts behavior?

Thanks!!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 not working

2017-03-15 Thread Dan Kennedy

On 03/15/2017 06:00 AM, Domingo Alvarez Duarte wrote:

Hello !

I just downloaded the 
http://www.sqlite.org/snapshot/sqlite-snapshot-201703062044.tar.gz 
compiled it with fts5 enabled and then tested it with this:


===

CREATE VIRTUAL TABLE email USING fts5(body);
insert into email(body) values('hello over there');
select rowid, body from email where body match 'over';


With FTS5, the left hand side of the match operator must be the table 
name, not a column. So you can do:


  SELECT rowid, body FROM email WHERE email MATCH ('body:over');

or

  SELECT rowid, body FROM email('body:over');

Dan.





===

And get this:

===

sqlite3 < test-fts.sql
Error: near line 3: unable to use function MATCH in the requested context

===

Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 not working MY MISTAKE FORGET

2017-03-14 Thread Domingo Alvarez Duarte

Hello !

Sorry by my previous message, it was my mistake when quering the fts5.

I was quering like normal sql referring to a specific column in the 
where clause instead of using the table name.


Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 not working

2017-03-14 Thread Domingo Alvarez Duarte

Hello !

I just downloaded the 
http://www.sqlite.org/snapshot/sqlite-snapshot-201703062044.tar.gz 
compiled it with fts5 enabled and then tested it with this:


===

CREATE VIRTUAL TABLE email USING fts5(body);
insert into email(body) values('hello over there');
select rowid, body from email where body match 'over';

===

And get this:

===

sqlite3 < test-fts.sql
Error: near line 3: unable to use function MATCH in the requested context

===

Cheers !

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 pagination

2016-12-30 Thread Jens Alfke
Well, the obvious answer is to use OFFSET and LIMIT … but of course that might 
slow down too much as the page number increases.

Or you could avoid the need for pagination by leaving the statement active 
between page-loads, and just reading another N rows out of it to jump to the 
next page. (Doesn’t help you when jumping backwards, though. You could cache a 
list of rowids to regenerate previous pages from.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 pagination

2016-12-30 Thread Jonathan Gaillard
Hi, How can I paginate fts5 queries when ordered by rank?

Normailly I use an index on an expression that gives me something to put in a 
WHERE exp > X and then also ORDER BY exp. Exp always has to be deterministic so 
it might be a text field (name or whatever) concatenated to a fixed length text 
id field to resolve matching names.

But how to do this same technique with FTS5? I can't use an expression index 
that includes rank as far as I know. Also ordering on anything but just rank 
seems to build a temp b tree in the explain plan which is too slow.

Is the order by rank already secondary sorted by rowid?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 ranking across multiple databases

2016-04-28 Thread Matt Hamilton
Hi All,
  I'm trying to work out if there is a way to use/abuse/extend SQLite and FTS5 
to allow me to do a query across two (or more) separate databases and then sort 
the results in a meaningful manner. In order to do this I really need the 
'global' elements of the ranking algorithm (e.g. number of times the term 
appears in the database) to be looked up in both databases first, then 
combined, and then fed into the ranking algorithm of each database.

I see that there is the fts5vocab table that would give me this, but there is 
the issue of stemming and splitting. E.g. if the user supplied the query 'apple 
banana carrot' then I'd need to be able to split and stem that phrase into 
terms before I looked them up in the fts5vocab table. If I could do that, I 
could then use the results from the fts5vocab table and feed them into a 
modified bm25 function.

Does this make sense? Or anyone else see any other way to achieve what I want 
to do?

thanks

-Matt

? 
Matt Hamilton
Quernus
matt at quernus.co.uk 
+44 117 325 3025
64 Easton Business Centre
Felix Road, Easton
Bristol, BS5 0HE

Quernus Ltd is a company registered in England and Wales. Registered number: 
09076246





[sqlite] FTS5 query that to match all rows.

2016-04-14 Thread Dan Kennedy
On 04/13/2016 11:24 PM, John Found wrote:
> What FTS5 query should I use in order to match all rows in the table?
>

Can you use "SELECT * FROM fts_table;"?

Dan.




[sqlite] FTS5 query that to match all rows.

2016-04-13 Thread John Found
On Wed, 13 Apr 2016 19:41:34 +0200
R Smith  wrote:

> Or did you mean you publish an interface of sorts in your application 
> that allows users to run their own queries, but it /has/ to be an "fts5" 
> query in some way, and now sometimes they want to list all rows?  It is 
> quite baffling - but I might simply not understand the situation, 
> perhaps some more information on the setup would help us see the light 
> and help us to offer better suggestions than this :)
> 

Well, OK, but the explanation is not very short. :)

I am using SQLite as a storage back-end for my web forum engine (in assembly 
language):
http://board.asm32.info

I want to use the fts5 in order to provide the search feature for the forum. It 
is actually
implemented and you can test it on the above link.

The searching SQL is the following:

select
  U.nick as UserName,
  U.id as UserID,
  U.avatar as avatar,
  T.slug,
  strftime('%d.%m.%Y %H:%M:%S', P.postTime, 'unixepoch') as PostTime,
  P.ReadCount,
  PostFTS.rowid,
  snippet(PostFTS, 0, '', '', '...', 16) as Content,
  T.Caption,
  (select count() from UnreadPosts UP where UP.UserID = ?4 and UP.PostID = 
PostFTS.rowid) as Unread
from
  PostFTS
left join
  Posts P on P.id = PostFTS.rowid
left join
  Threads T on T.id = P.threadID
left join
  ThreadTags TT on TT.ThreadID = T.id
left join
  Users U on P.userID = U.id
where
  PostFTS match ?1 and ( ?4 is null or T.slug = ?4) and (?5 is null or TT.tag = 
?5)
order by rank
limit ?2
offset ?3

As you can see, besides the MATCH condition, in the WHERE clause, there are two 
more conditions -
the first is for searching in particular thread and the second is for searching 
the posts 
in particular sub-forum (tag).

So, I think it is very natural if the user submit empty query or query 
containing for example "*" (parameter ?1) 
the search engine to return all posts, belonging to the current thread (or tag).

And vice versa, I think it is very unnatural, submitting empty query to get 
nothing as a result.

So, I am searching for the way to make it without changing the whole SQL query. 

P.S. BTW another issue that I can't solve with fts5 is returning posts that 
does not contain some word.
"NOT keyword" does not work at all, neither "* NOT keyword";

-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 


[sqlite] FTS5 query that to match all rows.

2016-04-13 Thread John Found
On Wed, 13 Apr 2016 23:38:04 +0700
Dan Kennedy  wrote:

> On 04/13/2016 11:24 PM, John Found wrote:
> > What FTS5 query should I use in order to match all rows in the table?
> >
> 
> Can you use "SELECT * FROM fts_table;"?
> 

*I* can. But the users of the program where I use SQLite can't simply run 
arbitrary SQL statements.

> Dan.
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 


[sqlite] FTS5 query that to match all rows.

2016-04-13 Thread R Smith


On 2016/04/13 6:44 PM, John Found wrote:
> On Wed, 13 Apr 2016 23:38:04 +0700
> Dan Kennedy  wrote:
>
>> On 04/13/2016 11:24 PM, John Found wrote:
>>> What FTS5 query should I use in order to match all rows in the table?
>>>
>> Can you use "SELECT * FROM fts_table;"?
>>
> *I* can. But the users of the program where I use SQLite can't simply run 
> arbitrary SQL statements.

Ok... but then what good would knowing a "query that will return all 
rows" be to you if they cannot run arbitrary queries? The query Dan 
listed is the one that will do it. Either you program it in your 
application, or you teach the users to do it, what other way would there 
be to run any query of any kind?

Or did you mean you publish an interface of sorts in your application 
that allows users to run their own queries, but it /has/ to be an "fts5" 
query in some way, and now sometimes they want to list all rows?  It is 
quite baffling - but I might simply not understand the situation, 
perhaps some more information on the setup would help us see the light 
and help us to offer better suggestions than this :)

Cheers!
Ryan



[sqlite] FTS5 problem ? more than 3,000 queries !

2016-04-04 Thread Dan Kennedy
On 04/04/2016 05:43 AM, Domingo Alvarez Duarte wrote:
> Hello !
>
> I'm testing fts5 on a database with data from hacker news (around 13M
> records) and it took more than 2 hours to index, I have this application to
> log all queries executed for debugging and although the search isn't slow
> it's making a lot of queries.
>
> Is this normal ? More than 3,000 queries ?

Probably. Your query does "ORDER BY rank" with "rank" set to the built 
in bm25() function. One of the things bm25() needs is the size of each 
document that it is ranking. And obtaining this requires a separate 
query on the %_sz table for each row. So if the MATCH expression matches 
3000 rows, FTS5 makes 3000 SQL queries to obtain the size data.

Dan.





>
> Cheers !
>
> 
>
> CREATE TABLE "items" (
>  'id' integer PRIMARY KEY,
>  'parent' INTEGER,
>  'by' text COLLATE NOCASE,
>  'score' integer DEFAULT 0,
>  'title' text  COLLATE NOCASE,
>  'type' text  COLLATE NOCASE,
>  'url' text  COLLATE NOCASE,
>  'deleted' BOOLEAN DEFAULT 0,
>  'dead' BOOLEAN DEFAULT 0,
>  'comment' TEXT COLLATE NOCASE,
>  'time' integer NOT NULL,
>  descendants integer default 0
> );
>
> CREATE VIRTUAL TABLE fts_idx_items USING fts5(title, comment, content=items,
> content_rowid=id, prefix=3);
>
>
> The query:
>
> SQL:  SELECT a.id, a.parent, a.by, a.score,
>  snippet(fts_idx_items, 0, '', '', '...',
> 12) as title,
>  a.type, a.url,
>  --snippet(fts_idx_items, 1, '', '',
> '...', 12) as comment,
>  a.descendants,
>  datetime(a."time", 'unixepoch') AS time,
>  (julianday('now') - julianday(datetime(a."time",
> 'unixepoch'))) as elapsed_time
>  FROM items a, fts_idx_items b
>  where fts_idx_items match 'title:sql'
>  and a.id = b.rowid
>  --and a.type != 'comment'
>  ORDER BY rank --bm25(fts_idx_items)
>  LIMIT 30 OFFSET 0
>
> Generates:
>
> SQL:-- SELECT rowid, rank FROM 'main'.'fts_idx_items' ORDER BY
> bm25(fts_idx_items) ASC
>
> Followed by 28 of this:
>
> SQL:-- SELECT pgno FROM 'main'.'fts_idx_items_idx' WHERE
> segid=? AND term<=? ORDER BY term DESC LIMIT 1
>
> Followed by around 3,000 of:
>
> SQL:-- SELECT sz FROM 'main'.'fts_idx_items_docsize' WHERE id=?
>   
>
> Followed by 18 of:
>
> SQL:-- SELECT T.'id', T.'title', T.'comment' FROM
> 'main'.'items' T WHERE T.'id'=?
>
> SQL:-- SELECT sz FROM 'main'.'fts_idx_items_docsize' WHERE
> id=?
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 problem ? more than 3,000 queries !

2016-04-04 Thread Domingo Alvarez Duarte
And the indexes are not supposed to be used ?  

Shouldn't a index cover the ranking too ?  

It seems a mistake for me.  

Cheers !  
>  Mon Apr 04 2016 11:07:51 AM CEST from "Dan Kennedy"
>  Subject: Re: [sqlite] FTS5 problem ? more than 
>3,000
>queries !
>
>  On 04/04/2016 05:43 AM, Domingo Alvarez Duarte wrote:
>  
>>Hello !
>> 
>> I'm testing fts5 on a database with data from hacker news (around 13M
>> records) and it took more than 2 hours to index, I have this application
>>to
>> log all queries executed for debugging and although the search isn't slow
>> it's making a lot of queries
>> 
>> Is this normal ? More than 3,000 queries ?
>> 

>  Probably. Your query does "ORDER BY rank" with "rank" set to the built 
> in bm25() function. One of the things bm25() needs is the size of each 
> document that it is ranking. And obtaining this requires a separate 
> query on the %_sz table for each row. So if the MATCH expression matches 
> 3000 rows, FTS5 makes 3000 SQL queries to obtain the size data.
> 
> Dan.
> 
> 
> 
> 
> 
> 
>  
>>Cheers !
>> 
>> 
>> 
>> CREATE TABLE "items" (
>> 'id' integer PRIMARY KEY,
>> 'parent' INTEGER,
>> 'by' text COLLATE NOCASE,
>> 'score' integer DEFAULT 0,
>> 'title' text COLLATE NOCASE,
>> 'type' text COLLATE NOCASE,
>> 'url' text COLLATE NOCASE,
>> 'deleted' BOOLEAN DEFAULT 0,
>> 'dead' BOOLEAN DEFAULT 0,
>> 'comment' TEXT COLLATE NOCASE,
>> 'time' integer NOT NULL,
>> descendants integer default 0
>> );
>> 
>> CREATE VIRTUAL TABLE fts_idx_items USING fts5(title, comment,
>>content=items,
>> content_rowid=id, prefix=3);
>> 
>> 
>> The query:
>> 
>> SQL : SELECT a.id, a.parent, a.by, a.score,
>> snippet(fts_idx_items, 0, '', '', '...',
>> 12) as title,
>> a.type, a.url,
>> --snippet(fts_idx_items, 1, '', '',
>> '...', 12) as comment,
>> a.descendants,
>> datetime(a."time", 'unixepoch') AS time,
>> (julianday('now') - julianday(datetime(a."time",
>> 'unixepoch'))) as elapsed_time
>> FROM items a, fts_idx_items b
>> where fts_idx_items match 'title:sql'
>> and a.id = b.rowid
>> --and a.type != 'comment'
>> ORDER BY rank --bm25(fts_idx_items)
>> LIMIT 30 OFFSET 0
>> 
>> Generates:
>> 
>> SQL : -- SELECT rowid, rank FROM 'main'.'fts_idx_items' ORDER BY
>> bm25(fts_idx_items) ASC
>> 
>> Followed by 28 of this:
>> 
>> SQL : -- SELECT pgno FROM 'main'.'fts_idx_items_idx' WHERE
>> segid=? AND term<=? ORDER BY term DESC LIMIT 1
>> 
>> Followed by around 3,000 of:
>> 
>> SQL : -- SELECT sz FROM 'main''fts_idx_items_docsize' WHERE id=?
>> 
>> 
>> Followed by 18 of:
>> 
>> SQL : -- SELECT T.'id', T.'title', T.'comment' FROM
>> 'main'.'items' T WHERE T.'id'=?
>> 
>> SQL : -- SELECT sz FROM 'main'.'fts_idx_items_docsize' WHERE
>> id=?
>> 
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> 

>  ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
>
>  



?



[sqlite] FTS5 problem ? more than 3,000 queries !

2016-04-04 Thread Domingo Alvarez Duarte
Hello !  

I'm testing fts5 on a database with data from hacker news (around 13M
records) and it took more than 2 hours to index, I have this application to
log all queries executed for debugging and although the search isn't slow
it's making a lot of queries.  

Is this normal ? More than 3,000 queries ?  

Cheers !  

?  

CREATE TABLE "items" (
??? 'id' integer PRIMARY KEY,
??? 'parent' INTEGER,
??? 'by' text COLLATE NOCASE,
??? 'score' integer DEFAULT 0,
??? 'title' text? COLLATE NOCASE,
??? 'type' text? COLLATE NOCASE,
??? 'url' text? COLLATE NOCASE,
??? 'deleted' BOOLEAN DEFAULT 0,
??? 'dead' BOOLEAN DEFAULT 0,
??? 'comment' TEXT COLLATE NOCASE,
??? 'time' integer NOT NULL,
??? descendants integer default 0
);  

CREATE VIRTUAL TABLE fts_idx_items USING fts5(title, comment, content=items,
content_rowid=id, prefix=3);  


The query:  

SQL??? :? SELECT ??? a.id, a.parent, a.by, a.score,
??? ??? ??? ??? snippet(fts_idx_items, 0, '', '', '...',
12) as title,
??? ??? ??? ??? a.type, a.url,
??? ??? ??? ??? --snippet(fts_idx_items, 1, '', '',
'...', 12) as comment,
??? ??? ??? ??? a.descendants,
??? ??? ??? ??? datetime(a."time", 'unixepoch') AS time,
??? ??? ??? ??? (julianday('now') - julianday(datetime(a."time",
'unixepoch'))) as elapsed_time
??? ??? ??? FROM items a, fts_idx_items b
??? ??? ??? where fts_idx_items match 'title:sql'
??? ??? ??? and a.id = b.rowid
??? ??? ??? --and a.type != 'comment'
??? ??? ??? ORDER BY rank --bm25(fts_idx_items)
??? ??? ??? LIMIT 30 OFFSET 0  

Generates:  

SQL??? :??? -- SELECT rowid, rank FROM 'main'.'fts_idx_items' ORDER BY
bm25(fts_idx_items) ASC??? 

Followed by 28 of this: 

SQL??? :??? -- SELECT pgno FROM 'main'.'fts_idx_items_idx' WHERE
segid=? AND term<=? ORDER BY term DESC LIMIT 1  

Followed by around 3,000 of:  

SQL??? :??? -- SELECT sz FROM 'main'.'fts_idx_items_docsize' WHERE id=?


Followed by 18 of:  

SQL??? :??? -- SELECT T.'id', T.'title', T.'comment' FROM
'main'.'items' T WHERE T.'id'= 

SQL??? :??? -- SELECT sz FROM 'main'.'fts_idx_items_docsize' WHERE
id=



[sqlite] FTS5 problem ? more than 3,000 queries !

2016-04-04 Thread Simon Slavin

On 3 Apr 2016, at 11:43pm, Domingo Alvarez Duarte  wrote:
> 
> Is this normal ? More than 3,000 queries ?  

It looks to me as if your SELECT is returning 3000 results.  If you do a search 
for a rarer term does it execute fewer queries ?  For instance you might search 
for '' instead of ''.

Simon.


[sqlite] FTS5 Prefix Indexes

2016-03-28 Thread Domingo Alvarez Duarte
Hello !  

Is it possible to expose commands to create/remove prefix indexes ?  

Something like (or any other friendly way to do it without reindex the whole
database):  INSERT INTO ft(ft) VALUES('remove prefix 2 3');  INSERT INTO
ft(ft) VALUES('remove prefix 2');  INSERT INTO ft(ft) VALUES('create prefix
2');

Cheers !  

?



[sqlite] FTS5 "constraint failed"

2016-03-23 Thread Dan Kennedy
On 03/23/2016 12:06 AM, Domingo Alvarez Duarte wrote:
> Hello !
>
> After seeing several times work/commits on fts5 I decided to try it on a
> table shown bellow, and when trying to populate it I get this error message:
>
>
> sqlite> INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');
> Error: constraint failed
>
> The table has 12,000,000 records and it show the error message after 10
> seconds working, any clue on what can be happening ?

Thanks for testing this. What does "SELECT sqlite_version();" return if 
you run it in the same shell?

Thanks,
Dan.



[sqlite] FTS5 "constraint failed"

2016-03-22 Thread Domingo Alvarez Duarte
Hello !  

It's a sqlite repository clone that follows trunk.  

SQLite version 3.12.0 2016-03-22 15:26:03
Enter ".help" for usage hints
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.  

Cheers !  

?  
>  Tue Mar 22 2016 06:23:53 PM CET from "Dan Kennedy"
>  Subject: Re: [sqlite] FTS5 "constraint failed"
>
>  On 03/23/2016 12:06 AM, Domingo Alvarez Duarte wrote:
>  
>>Hello !
>> 
>> After seeing several times work/commits on fts5 I decided to try it on a
>> table shown bellow, and when trying to populate it I get this error
>>message:
>> 
>> 
>> sqlite> INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');
>> Error: constraint failed
>> 
>> The table has 12,000,000 records and it show the error message after 10
>> seconds working, any clue on what can be happening ?
>> 

>  Thanks for testing this. What does "SELECT sqlite_version();" return if 
> you run it in the same shell?
> 
> Thanks,
> Dan.
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
>
>  



?



[sqlite] FTS5 "constraint failed"

2016-03-22 Thread Domingo Alvarez Duarte
In this case sqlite is compiled with the following flags:  

gcc -g -O2 -DSQLITE_OS_UNIX=1 -I. -I/third-party/sqlite3/src
-I/third-party/sqlite3/ext/rtree -I/third-party/sqlite3/ext/fts3
-D_HAVE_SQLITE_CONFIG_H -DBUILD_sqlite -DSQLITE_HAS_CODEC=1
-DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_RTREE=1
-DSQLITE_ENABLE_COLUMN_METADATA=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS=1
-DSQLITE_DEFAULT_FOREIGN_KEYS=1 -DSQLITE_USE_URI=1
-DSQLITE_ENABLE_UNLOCK_NOTIFY=1 -DSQLITE_USE_DECIMAL2=1
-DSQLITE_ENABLE_STAT4=1 -DCODEC_TYPE=CODEC_TYPE_AES2562 -DSQLITE_SOUNDEX=1
-DSQLITE_ENABLE_EXTENSION_FUNCTIONS=1 -DSQLITE_OMIT_PREPARED=1 -DNDEBUG
-DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1?
-DSQLITE_SMALL_STACK=1 -DHAVE_READLINE=0 -DHAVE_EDITLINE=1
-DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_EXPLAIN_COMMENTS
-o sqlite3 /third-party/sqlite3/src/shell.c sqlite3.c? -ledit -ldl -lpthread
-lm -Wl,-rpath -Wl,/usr/local/lib  

?  
>  Tue Mar 22 2016 06:06:55 PM CET from "Domingo Alvarez Duarte"
>  Subject: [sqlite] FTS5 "constraint failed"
>
>  Hello ! 
> 
> After seeing several times work/commits on fts5 I decided to try it on a
> table shown bellow, and when trying to populate it I get this error
>message: 
> 
> 
> sqlite> INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');
> Error: constraint failed 
> 
> The table has 12,000,000 records and it show the error message after 10
> seconds working, any clue on what can be happening ? 
> 
> Cheers ! 
> 
> CREATE TABLE "items" (
> ??? 'id' integer PRIMARY KEY,
> ??? 'parent' INTEGER,
> ??? 'by' text COLLATE NOCASE,
> ??? 'score' integer DEFAULT 0,
> ??? 'title' text? COLLATE NOCASE,
> ??? 'type' text? COLLATE NOCASE,
> ??? 'url' text? COLLATE NOCASE,
> ??? 'deleted' BOOLEAN DEFAULT 0,
> ??? 'dead' BOOLEAN DEFAULT 0,
> ??? 'comment' TEXT COLLATE NOCASE,
> ??? 'time' integer NOT NULL,
> ??? descendants integer default 0
> ); 
> 
> CREATE VIRTUAL TABLE fts_idx_items USING fts5(title, comment,
>content=items,
> content_rowid=id); 
> 
> INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');
> 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
>
>  



?



[sqlite] FTS5 "constraint failed"

2016-03-22 Thread Domingo Alvarez Duarte
Hello !  

After seeing several times work/commits on fts5 I decided to try it on a
table shown bellow, and when trying to populate it I get this error message: 


sqlite> INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');
Error: constraint failed  

The table has 12,000,000 records and it show the error message after 10
seconds working, any clue on what can be happening ?  

Cheers !  

CREATE TABLE "items" (
??? 'id' integer PRIMARY KEY,
??? 'parent' INTEGER,
??? 'by' text COLLATE NOCASE,
??? 'score' integer DEFAULT 0,
??? 'title' text? COLLATE NOCASE,
??? 'type' text? COLLATE NOCASE,
??? 'url' text? COLLATE NOCASE,
??? 'deleted' BOOLEAN DEFAULT 0,
??? 'dead' BOOLEAN DEFAULT 0,
??? 'comment' TEXT COLLATE NOCASE,
??? 'time' integer NOT NULL,
??? descendants integer default 0
);  

CREATE VIRTUAL TABLE fts_idx_items USING fts5(title, comment, content=items,
content_rowid=id);  

INSERT INTO fts_idx_items(fts_idx_items) VALUES('rebuild');



[sqlite] FTS5 prefix index documentation may be incorrect

2015-11-25 Thread Dan Kennedy
On 11/25/2015 05:11 AM, Charles Leifer wrote:
> The FTS5 prefix index documentation[1] seems to not be working. I've tried
> with SQLite 3.9.0 and 3.10.0 (2015-11-06) and both show the same error
> messages.

Thanks for reporting this. Now fixed here:

   http://sqlite.org/src/info/11eb8e877e2ba859

Dan.


>
> Examples:
>
> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2 3');
> Error: malformed prefix=... directive
> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix=2, prefix=3);
> Error: multiple prefix=... directives
>
> What does appear to work is using a comma:
>
> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');
> sqlite> .schema ft
> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');
>
> 1: https://sqlite.org/fts5.html#section_4_2
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 prefix index documentation may be incorrect

2015-11-25 Thread Charles Leifer
Thank you for the quick fix, and thank you so much for SQLite!
On Nov 25, 2015 5:57 AM, "Dan Kennedy"  wrote:

> On 11/25/2015 05:11 AM, Charles Leifer wrote:
>
>> The FTS5 prefix index documentation[1] seems to not be working. I've tried
>> with SQLite 3.9.0 and 3.10.0 (2015-11-06) and both show the same error
>> messages.
>>
>
> Thanks for reporting this. Now fixed here:
>
>   http://sqlite.org/src/info/11eb8e877e2ba859
>
> Dan.
>
>
>
>> Examples:
>>
>> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2 3');
>> Error: malformed prefix=... directive
>> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix=2, prefix=3);
>> Error: multiple prefix=... directives
>>
>> What does appear to work is using a comma:
>>
>> sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');
>> sqlite> .schema ft
>> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');
>>
>> 1: https://sqlite.org/fts5.html#section_4_2
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] FTS5 prefix index documentation may be incorrect

2015-11-24 Thread Charles Leifer
The FTS5 prefix index documentation[1] seems to not be working. I've tried
with SQLite 3.9.0 and 3.10.0 (2015-11-06) and both show the same error
messages.

Examples:

sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2 3');
Error: malformed prefix=... directive
sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix=2, prefix=3);
Error: multiple prefix=... directives

What does appear to work is using a comma:

sqlite> CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');
sqlite> .schema ft
CREATE VIRTUAL TABLE ft USING fts5(a, b, prefix='2,3');

1: https://sqlite.org/fts5.html#section_4_2


[sqlite] FTS5 rank configuration option

2015-11-06 Thread Dan Kennedy
On 11/06/2015 02:26 PM, chromedout64 at yahoo.com wrote:
> The documentation says that a persistent rank can be set with
> INSERT INTO ft(ft, rank) VALUES('rank', 'bm25(10.0, 5.0)');
> However, I can only get it to work on the same database connection as the 
> query. Subsequent database connections seem to use the standard default 
> ranking. Is this by design?

Thanks for reporting this. Might be fixed here:

   http://sqlite.org/src/info/33e6606f5e497e81

Dan.




> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-11-06 Thread Dan Kennedy
On 11/06/2015 12:36 PM, chromedout64 at yahoo.com wrote:
> I noticed that this update is not included in the official 3.9.2 release. How 
> is it determined when an update such as this one gets included in a release? 
> Thanks for any info.

Most likely as part of 3.10.0.

You can always download the SQLite trunk, either via fossil or via a 
link like:

   http://www.sqlite.org/src/tarball/SQLite-trunk.tgz?uuid=trunk

Then run [./configure && make amalgamation-tarball] to build an autoconf 
package.

Regards,
Dan.






>   
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 rank configuration option

2015-11-06 Thread chromedou...@yahoo.com
The documentation says that a persistent rank can be set with
INSERT INTO ft(ft, rank) VALUES('rank', 'bm25(10.0, 5.0)');
However, I can only get it to work on the same database connection as the 
query. Subsequent database connections seem to use the standard default 
ranking. Is this by design?


[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-11-06 Thread chromedou...@yahoo.com
Great, that's exactly what I need, I'll give it a try.



[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-11-06 Thread chromedou...@yahoo.com
I noticed that this update is not included in the official 3.9.2 release. How 
is it determined when an update such as this one gets included in a release? 
Thanks for any info.



[sqlite] FTS5 explicitly set delimiter

2015-11-02 Thread Dan Kennedy
On 11/02/2015 01:55 AM, chromedout64 at yahoo.com wrote:
> Thanks, I figured that this might be the case. What is the best way to 
> specify all characters except whitespace as part of a CREATE VIRTUAL TABLE 
> statement? Should you simply list each of the literal ascii characters such 
> as tokenchars '!@#$%' and so on.

I think you'll have to do the above, yes.

Dan.



[sqlite] FTS5 explicitly set delimiter

2015-11-01 Thread Dan Kennedy
On 11/01/2015 06:39 PM, chromedout64 at yahoo.com wrote:
>   Is there an easy way to set an FTS5 table so that the only 
> delimiter/separator is a space and everything else, including all 
> punctuation, is a token? Some searching revealed that there is an 
> undocumented feature as a part of FTS3/4 that allows the actual delimiter to 
> be specified as part of table creation, but I'm not sure if this exists for 
> FTS5.

You could use the tokenchars option of the unicode61 or ascii tokenizers 
to specify that all characters except whitespace are token characters:

   https://www.sqlite.org/fts5.html#section_4_3_1

Dan.




[sqlite] FTS5 explicitly set delimiter

2015-11-01 Thread chromedou...@yahoo.com
Thanks, I figured that this might be the case. What is the best way to specify 
all characters except whitespace as part of a CREATE VIRTUAL TABLE statement? 
Should you simply list each of the literal ascii characters such as tokenchars 
'!@#$%' and so on. Or is it possible or would it be better to use some sort of 
hex representation of all of them understood by SQLite or another method?


[sqlite] FTS5 explicitly set delimiter

2015-11-01 Thread chromedou...@yahoo.com
 Is there an easy way to set an FTS5 table so that the only delimiter/separator 
is a space and everything else, including all punctuation,?is a token? Some 
searching revealed that there is an undocumented?feature as a part of FTS3/4 
that allows the actual delimiter to be specified as part of table creation, 
but?I'm not?sure if this exists for FTS5.?


[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-10-28 Thread Dan Kennedy
On 10/27/2015 05:06 PM, chromedout64 at yahoo.com wrote:
> Using the example from the documentation at https://www.sqlite.org/fts5.html
> I thought this worked in a previous version.
>
> CREATE TABLE tbl(a INTEGER PRIMARY KEY, b, c);
> CREATE VIRTUAL TABLE fts_idx USING fts5(b, c, content='tbl', 
> content_rowid='a');
> CREATE TRIGGER tbl_ai AFTER INSERT ON tbl BEGIN
>INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, new.b, new.c);
> END;
> CREATE TRIGGER tbl_ad AFTER DELETE ON tbl BEGIN
>INSERT INTO fts_idx(fts_idx, rowid, b, c) VALUES('delete', old.a, old.b, 
> old.c);
> END;
> CREATE TRIGGER tbl_au AFTER UPDATE ON tbl BEGIN
>INSERT INTO fts_idx(fts_idx, rowid, b, c) VALUES('delete', old.a, old.b, 
> old.c);
>INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, new.b, new.c);
> END;
>
> REPLACE INTO tbl VALUES(1,'foo','bar');

I think this may have worked with an older version of FTS5 that did not 
support REPLACE.

The trouble is that because the top-level statement uses OR REPLACE 
conflict handling, so do the INSERT statements fired by the triggers. 
And OR REPLACE conflict handling was broken for external content tables.

Fts5 is now updated on trunk so that OR REPLACE is ignored for 
contentless and external content tables:

   http://sqlite.org/src/info/a85c2a4758c27e8d

So this sort of thing should work again.

Dan.




[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-10-27 Thread chromedou...@yahoo.com


Thanks for fixing it, Dan. Do you know when this will be rolled into an 
official or preliminary autoconf file? Thanks again.


[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-10-27 Thread chromedou...@yahoo.com
This is on v3.9.1. I'm not sure exactly when it stopped working. I was using 
the external content functionality with triggers to keep the virtual table 
updated as part of a larger program, but found that the referenced example from 
the docs exhibits the same issue. INSERT's work fine, but REPLACE gives this 
error now.


[sqlite] FTS5 External Content Table - REPLACE - SQL logic error or missing database

2015-10-27 Thread chromedou...@yahoo.com
Using the example from the documentation at https://www.sqlite.org/fts5.html
I thought this worked in a previous version.

CREATE TABLE tbl(a INTEGER PRIMARY KEY, b, c);
CREATE VIRTUAL TABLE fts_idx USING fts5(b, c, content='tbl', content_rowid='a');
CREATE TRIGGER tbl_ai AFTER INSERT ON tbl BEGIN
? INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, new.b, new.c);
END;
CREATE TRIGGER tbl_ad AFTER DELETE ON tbl BEGIN
? INSERT INTO fts_idx(fts_idx, rowid, b, c) VALUES('delete', old.a, old.b, 
old.c);
END;
CREATE TRIGGER tbl_au AFTER UPDATE ON tbl BEGIN
? INSERT INTO fts_idx(fts_idx, rowid, b, c) VALUES('delete', old.a, old.b, 
old.c);
? INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, new.b, new.c);
END;

REPLACE INTO tbl VALUES(1,'foo','bar');
Error: near line 14: SQL logic error or missing database


[sqlite] FTS5 issue on OS X

2015-10-22 Thread Gergely Lukacsy (glukacsy)
Oh yeah, bingo :)

Thanks?

Gergely

On 22/10/2015 17:05, "Scott Hess"  wrote:

>Dollars to donuts you're compiling SQLite but then linking against the
>system version.
>
>-scott
>
>
>On Thu, Oct 22, 2015 at 7:51 AM, Gergely Lukacsy (glukacsy) <
>glukacsy at cisco.com> wrote:
>
>> Hi Simon,
>>
>> Thanks for coming back to me.
>>
>> I ran sqlite3_compileoption_get in a loop to enumerate all options (I
>> assume this is the same as the pragma query suggested below) and I got
>>the
>> following:
>>
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomENABLE_FTS3
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomENABLE_FTS3_PARENTHESIS
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomENABLE_LOCKING_STYLE=1
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomENABLE_RTREE
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomOMIT_AUTORESET
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomOMIT_BUILTIN_TEST
>> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomOMIT_LOAD_EXTENSION
>> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomSYSTEM_MALLOC
>> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
>> DBInit:OptiomTHREADSAFE=2
>>
>>
>> I am really puzzled by this as its not that FTS is set to version 3, but
>> THREADSAFE=2 which is not what I am giving clang as an option. If I omit
>> -DSQLITE_ENABLE_FTS5 I see I can put garbage in the FTS5 specific parts
>>of
>> the amalgamation and it still compiles, with -DSQLITE_ENABLE_FTS5 it
>>fails
>> - indicating that the FTS5 specific parts are compiled and the
>> preprocessof flags honoured.
>>
>>
>> Regards,
>> Gergely
>>
>>
>> On 22/10/2015 14:51, "Simon Slavin"  wrote:
>>
>> >
>> >On 22 Oct 2015, at 2:38pm, Gergely Lukacsy (glukacsy)
>> > wrote:
>> >
>> >> ... I get an "error: no module: fts5" error message. This indicates
>> >>that I didn't compile SQLite with FTS5 support, but I am pretty sure
>> >>that I am using the right preprocessor flags
>> >
>> >Using that version of SQLite, can you execute the command
>> >
>> >PRAGMA compile_options;
>> >
>> >and tell us what it reports back ?  It will reply with a series of rows
>> >just like a SELECT command would.
>> >
>> >See  for
>> >further details.
>> >
>> >Simon.
>> >___
>> >sqlite-users mailing list
>> >sqlite-users at mailinglists.sqlite.org
>> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>___
>sqlite-users mailing list
>sqlite-users at mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 issue on OS X

2015-10-22 Thread Gergely Lukacsy (glukacsy)
Hi Simon, 

Thanks for coming back to me.

I ran sqlite3_compileoption_get in a loop to enumerate all options (I
assume this is the same as the pragma query suggested below) and I got the
following:

2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomENABLE_FTS3
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomENABLE_FTS3_PARENTHESIS
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomENABLE_LOCKING_STYLE=1
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomENABLE_RTREE
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomOMIT_AUTORESET
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomOMIT_BUILTIN_TEST
2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomOMIT_LOAD_EXTENSION
2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomSYSTEM_MALLOC
2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
DBInit:OptiomTHREADSAFE=2


I am really puzzled by this as its not that FTS is set to version 3, but
THREADSAFE=2 which is not what I am giving clang as an option. If I omit
-DSQLITE_ENABLE_FTS5 I see I can put garbage in the FTS5 specific parts of
the amalgamation and it still compiles, with -DSQLITE_ENABLE_FTS5 it fails
- indicating that the FTS5 specific parts are compiled and the
preprocessof flags honoured.


Regards,
Gergely


On 22/10/2015 14:51, "Simon Slavin"  wrote:

>
>On 22 Oct 2015, at 2:38pm, Gergely Lukacsy (glukacsy)
> wrote:
>
>> ... I get an "error: no module: fts5" error message. This indicates
>>that I didn't compile SQLite with FTS5 support, but I am pretty sure
>>that I am using the right preprocessor flags
>
>Using that version of SQLite, can you execute the command
>
>PRAGMA compile_options;
>
>and tell us what it reports back ?  It will reply with a series of rows
>just like a SELECT command would.
>
>See  for
>further details.
>
>Simon.
>___
>sqlite-users mailing list
>sqlite-users at mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 issue on OS X

2015-10-22 Thread Simon Slavin

On 22 Oct 2015, at 2:38pm, Gergely Lukacsy (glukacsy)  
wrote:

> ... I get an "error: no module: fts5" error message. This indicates that I 
> didn't compile SQLite with FTS5 support, but I am pretty sure that I am using 
> the right preprocessor flags 

Using that version of SQLite, can you execute the command

PRAGMA compile_options;

and tell us what it reports back ?  It will reply with a series of rows just 
like a SELECT command would.

See  for further 
details.

Simon.


[sqlite] FTS5 issue on OS X

2015-10-22 Thread Gergely Lukacsy (glukacsy)
Hi,

I am using Sqlite in a codebase shared between Windows and OS X and 
specifically I am using FTS4 for searching our DB for free text strings. I have 
just upgraded to the latest 3.9.1 and thought to give a try to FTS5, but 
encountered an issue. It works fine on Windows, but on OS X the moment I try to 
create the FTS5 virtual table...

"CREATE VIRTUAL TABLE IF NOT EXISTS MessageSearch USING fts5(MESSAGE_ID 
UNINDEXED,CONVERSATION_ID UNINDEXED,MESSAGE_PUBLISHED_TIME 
UNINDEXED,MESSAGE_DATA, prefix='3,4,5')"

... I get an "error: no module: fts5" error message. This indicates that I 
didn't compile SQLite with FTS5 support, but I am pretty sure that I am using 
the right preprocessor flags (I am using the single C file amalgamation). For 
reference here is the entire build script I am using:

clang -c -O2 -DSQLITE_ENABLE_FTS5 -DSQLITE_THREADSAFE=0 -o sqlite3.o sqlite3.c

ar rcs darwin/libsqlite3.a sqlite3.o


I confirmed that the fts5.c part and the fts5 initialisation functions are all 
compiled into the .o.


Any idea what am I doing wrong? The main thing why I am puzzled is that it 
works without issues on Windows.


Regards,

Gergely



[sqlite] FTS5 issue on OS X

2015-10-22 Thread Scott Hess
Dollars to donuts you're compiling SQLite but then linking against the
system version.

-scott


On Thu, Oct 22, 2015 at 7:51 AM, Gergely Lukacsy (glukacsy) <
glukacsy at cisco.com> wrote:

> Hi Simon,
>
> Thanks for coming back to me.
>
> I ran sqlite3_compileoption_get in a loop to enumerate all options (I
> assume this is the same as the pragma query suggested below) and I got the
> following:
>
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_FTS3
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_FTS3_PARENTHESIS
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_LOCKING_STYLE=1
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_RTREE
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_AUTORESET
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_BUILTIN_TEST
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_LOAD_EXTENSION
> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomSYSTEM_MALLOC
> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomTHREADSAFE=2
>
>
> I am really puzzled by this as its not that FTS is set to version 3, but
> THREADSAFE=2 which is not what I am giving clang as an option. If I omit
> -DSQLITE_ENABLE_FTS5 I see I can put garbage in the FTS5 specific parts of
> the amalgamation and it still compiles, with -DSQLITE_ENABLE_FTS5 it fails
> - indicating that the FTS5 specific parts are compiled and the
> preprocessof flags honoured.
>
>
> Regards,
> Gergely
>
>
> On 22/10/2015 14:51, "Simon Slavin"  wrote:
>
> >
> >On 22 Oct 2015, at 2:38pm, Gergely Lukacsy (glukacsy)
> > wrote:
> >
> >> ... I get an "error: no module: fts5" error message. This indicates
> >>that I didn't compile SQLite with FTS5 support, but I am pretty sure
> >>that I am using the right preprocessor flags
> >
> >Using that version of SQLite, can you execute the command
> >
> >PRAGMA compile_options;
> >
> >and tell us what it reports back ?  It will reply with a series of rows
> >just like a SELECT command would.
> >
> >See  for
> >further details.
> >
> >Simon.
> >___
> >sqlite-users mailing list
> >sqlite-users at mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] FTS5 returns "corrupt" plus trailing zero

2015-09-29 Thread Dan Kennedy
On 09/21/2015 05:14 PM, Ralf Junker wrote:
> My implementation of
>
>   http://www.sqlite.org/src/artifact/400384798349d658?ln=94-96
>
> returns "corrupt" plus a trailing zero, that is 8 characters in total.
>
> Maybe this line
>
>   http://www.sqlite.org/src/artifact/4fdbc0a321e3a1d7?ln=5364
>
> should be corrected to
>
>   sqlite3Fts5BufferSet(, , 7, (const u8*)"corrupt");
>
> so that the number of characters passed matches the length of "corrupt".

Finally noticed this. Thanks.

   http://www.sqlite.org/src/info/3a9f076250d9559d

Dan.




[sqlite] FTS5 returns "corrupt" plus trailing zero

2015-09-21 Thread Ralf Junker
My implementation of

   http://www.sqlite.org/src/artifact/400384798349d658?ln=94-96

returns "corrupt" plus a trailing zero, that is 8 characters in total.

Maybe this line

   http://www.sqlite.org/src/artifact/4fdbc0a321e3a1d7?ln=5364

should be corrected to

   sqlite3Fts5BufferSet(, , 7, (const u8*)"corrupt");

so that the number of characters passed matches the length of "corrupt".

Ralf


[sqlite] FTS5 stopwords

2015-09-14 Thread Dan Kennedy
On 09/14/2015 09:13 PM, Abilio Marques wrote:
> ?Hi,
>
> I know I'm a newcomer into the SQLite project, but I'm excited about what
> FTS5 has to offer. To me it seems simple and powerful, and has some really
> nice ideas.
>
> Is it possible for me to contribute on the module, or is it too late for
> that?
>
> I would like to mention two new ideas I would offer to introduce. First, a
> customizable list of stopwords:
>
> https://en.wikipedia.org/wiki/Stop_words
> ?
> (I didn't find anything similar to that in the documentation, am I missing
> something?)
>
> I know I can add it via a custom tokenizer, but wouldn't it be useful to
> have it straight out of the box?

Hi,

I think such a thing would be implemented using the custom tokenizer API 
even if it were shipped as part of FTS5. As a "wrapper tokenizer" 
similar to the built-in porter tokenizer perhaps.

If we had code for a stop-words implementation that seemed like it would 
work for everybody and any licensing issues could be worked out then 
there's no reason something like that couldn't be made part of FTS5.

Dan.




[sqlite] FTS5 stopwords

2015-09-14 Thread Dan Kennedy
On 09/14/2015 09:13 PM, Abilio Marques wrote:
> ?Hi,
>
> I know I'm a newcomer into the SQLite project, but I'm excited about what
> FTS5 has to offer. To me it seems simple and powerful, and has some really
> nice ideas.
>
> Is it possible for me to contribute on the module, or is it too late for
> that?
>
> I would like to mention two new ideas I would offer to introduce. First, a
> customizable list of stopwords:
>
> https://en.wikipedia.org/wiki/Stop_words
> ?
> (I didn't find anything similar to that in the documentation, am I missing
> something?)
>
> I know I can add it via a custom tokenizer, but wouldn't it be useful to
> have it straight out of the box?
>
>
> Also, I would like to mention the usefulness of some statistics to create
> more advanced ranking formulas. Things like: the Longest Common Subsequence
> between query and document, number of unique matched keywords, etc. These
> and other values are really useful in applications where bm25 is not
> suitable or enough.

Hi,

 From an FTS5 custom auxiliary function, there are two ways to find the 
token offset of every phrase match in the current document:

The xInstCount()/xInst() allows random access to an array of matches - 
i.e. give me the phrase number, column and token offset of the Nth match:

   https://www.sqlite.org/draft/fts5.html#xInstCount

And xPhraseFirst()/xPhraseNext() allow the user to iterate through the 
matches for a specific query phrase within the current document:

   https://www.sqlite.org/draft/fts5.html#xPhraseFirst

xPhraseFirst/xPhraseNext is faster, but xInstCount/xInst can be easier 
to use.

It should be possible to build the sorts of things you're talking about 
on top of one of those, no? The example matchinfo() code contains code 
to determine the longest common subsequence here:

   http://www.sqlite.org/src/artifact/e96be827aa8f5?ln=259-281

Feedback from anyone who actually tries to use this API much appreciated.

Dan.



>
> I come from using an engine called Sphinx Search (used on huge things like
> Craigslist), which offers such factors. Using them, they have defined
> rankers that mix bm25 with proximity, and some other they call
> SPH_RANK_SPH04, which includes a weighting boost for the result appearing
> at the beginning of the text field, and a bigger boost if its an exact
> match:
>
> http://sphinxsearch.com/docs/latest/builtin-rankers.html
>
> The formulas (in sphinx higher is better) for them are:
> http://sphinxsearch.com/docs/latest/formulas-for-builtin-rankers.html
>
> And the list of supported factor is:
> http://sphinxsearch.com/docs/latest/ranking-factors.html.
>
> Of course having all of them would be overkill, but if you find them
> interesting, we can get the most useful ones, allowing people to build
> rankers to their own needs.
>
>
> ?Once again, you people are the experts and know if such ideas are feasible
> and where is the right place to include them, so please tell me your
> opinions.
>
>   ?
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS5 documentation typo

2015-09-14 Thread Dan Kennedy
On 09/14/2015 06:31 PM, Abilio Marques wrote:
> While reading the documentation draft for FTS5, I spotted this on
> section 4.3.3 .
> I believe the first example is missing quotation marks around the word 
> 'porter':
>
> *-- Two ways to create an FTS5 table that uses the porter tokenizer to
> -- stem the output of the default tokenizer (unicode61). *
> CREATE VIRTUAL TABLE t1 USING fts5(x, tokenize = porter);
> CREATE VIRTUAL TABLE t1 USING fts5(x, tokenize = 'porter unicode61');

Hi,

I think it's Ok as is. An "option value" can be eitehr an FTS5 bareword 
or string literal. See the 4th paragraph here:

   https://www.sqlite.org/fts5.html#section_4

Dan.




[sqlite] FTS5 stopwords

2015-09-14 Thread Stadin, Benjamin
I?ve implemented a custom ranker in SQLite that is similar to
SPH_RANK_SPH04 using FTS4 (BM25 + word distance and distance to beginning
of text). The only thing that wasn?t possible out of the box using FTS4
was to get the distance between found matches as distance between them
(how many words are between matches). FTS4 callback allows currently only
to get this distance as byte offset, but not word distance.

As far as I remember, there are internal data structures in FTS4 which
would allow this. But these structures aren?t available to the callback.

Anyways, it will be nice if FTS5 would have a feature to get the distance
between matched words expressed as word / token distance.

Cheers
Ben

Am 14.09.15 16:13 schrieb "sqlite-users-bounces at mailinglists.sqlite.org on
behalf of Abilio Marques" unter
:

>SPH_RANK_SPH04



  1   2   >