Re: [sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread boscowitch
Yeah, ID 4 was just a desperate experiment: a hack with longer data in
the search table, to see whether it would make the snippet function
start grabbing the text from the beginning (or at least one
"word"/char more).

The offsets being wrong, and therefore the ..., was more or less what I
expected, but had it worked I would have subtracted the offset manually
and put the markers in myself. So it wasn't part of the bug report,
just a test of how it behaves in that case.

Good to know for the future that it's already fixed. Thanks for taking
care of it so fast!

boscowitch

On Wednesday, 28.01.2015, at 02:04 +0700, Dan Kennedy wrote:
> On 01/27/2015 06:48 PM, boscowitch wrote:
> >
> >
> > and then in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
> > I get the following for a select with snippet:
> >
> > EXAMPLE OUTPUT:
> > sqlite> select docid,*,snippet(test) from test where german match "a";
> > 1|[1] a b c|1] a b c
> > 2|[{[_.,:;[1] a b c|1] a b c
> > 3|1[1] a b c|1[1] a b c
> > 4|[1] a b c|1] a b c
> > 5|​[1] a b c|​[1] a b c
> >
> >
> >
> > -As you can see, for IDs 1 and 2 the match is at the right position,
> > but all leading non-alphanumeric characters ([, {, etc.) are simply
> > left out of the snippet.
> >
> >
> > -ID 4 does not help and breaks the offsets, so it is even worse.
> >
> 
> Thanks for reporting this. The issue with (1) and (2) is now fixed here:
> 
>http://www.sqlite.org/src/info/adc9283dd9b
> 
> I think it is a bug in the input data causing the problem in (4). The 
> values inserted into "test" and "testdata" are just slightly different.
> 
> Dan.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread Dan Kennedy

On 01/27/2015 06:48 PM, boscowitch wrote:



> and then in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
> I get the following for a select with snippet:
>
> EXAMPLE OUTPUT:
> sqlite> select docid,*,snippet(test) from test where german match "a";
> 1|[1] a b c|1] a b c
> 2|[{[_.,:;[1] a b c|1] a b c
> 3|1[1] a b c|1[1] a b c
> 4|[1] a b c|1] a b c
> 5|​[1] a b c|​[1] a b c
>
> -As you can see, for IDs 1 and 2 the match is at the right position,
> but all leading non-alphanumeric characters ([, {, etc.) are simply
> left out of the snippet.
>
> -ID 4 does not help and breaks the offsets, so it is even worse.



Thanks for reporting this. The issue with (1) and (2) is now fixed here:

  http://www.sqlite.org/src/info/adc9283dd9b

I think it is a bug in the input data causing the problem in (4). The 
values inserted into "test" and "testdata" are just slightly different.


Dan.
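
Editor's note: with content="testdata", the index is built from whatever is inserted into "test", while the text that snippet() displays is read from "testdata", so the two values have to match exactly. A minimal sketch of one way to avoid typing the value twice, assuming Python's sqlite3 module is linked against an SQLite build with FTS4 enabled (an assumption; the thread itself uses the sqlite shell). Table and column names are taken from the thread; the helper name build_index_from_content is made up for illustration.

```python
import sqlite3


def build_index_from_content(conn):
    """Create the thread's two tables and fill the FTS index from testdata."""
    conn.execute("CREATE TABLE testdata (german)")
    conn.execute('CREATE VIRTUAL TABLE test USING fts4(content="testdata", german)')
    conn.executemany(
        "INSERT INTO testdata(german) VALUES (?)",
        [("[1] a b c",), ("1[1] a b c",)],
    )
    # Index every row straight from the content table, using its rowid as
    # docid, so the indexed text cannot differ from the stored text.
    conn.execute(
        "INSERT INTO test(docid, german) SELECT rowid, german FROM testdata"
    )


conn = sqlite3.connect(":memory:")
build_index_from_content(conn)
rows = conn.execute(
    "SELECT docid, german, snippet(test) FROM test "
    "WHERE german MATCH 'a' ORDER BY docid"
).fetchall()
for row in rows:
    print(row)
```

Whether the leading "[" shows up in the snippet column depends on whether the SQLite build in use contains the fix referenced above.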




[sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread boscowitch
Hello,

since this is a bug report (plus a dirty fix), it might be useful for
both users and devs, which is why I am sending a copy to both mailing
lists. I hope this doesn't bother the diligent devs who read all of
both lists; sorry to them, and thanks for sqlite, by the way ;)

Recently I wanted to use the snippet function in sqlite for my small
sqlite dictionary (running on Android, but the bug also occurs on my
Linux desktop).

But it behaved strangely when my entry started with "non-word"
character(s) (anything that is neither alphanumeric nor a Unicode
character (chars>128); in short, the simple tokenizer's delimiters).

The snippet never prints them if they are at the very beginning,
before the first word.
Here are some examples to demonstrate:

EXAMPLE SETUP SQL:
create table testdata (german);
create virtual table test using fts4(content="testdata",german);

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(1,"[1] a b c ");

insert into testdata(german) VALUES ("[{[_.,:;[1] a b c");
insert into test(docid,german) VALUES(2,"[{[_.,:;[1] a b c "); 

insert into testdata(german) VALUES ("1[1] a b c");
insert into test(docid,german) VALUES(3,"1[1] a b c "); 

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(4,"1[1] a b c "); 

insert into testdata(german) 
VALUES(char(8203,91,49,93,32,97,32,98,32,99));
insert into test(docid,german)
VALUES(5,char(8203,91,49,93,32,97,32,98,32,99));


and then in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
I get the following for a select with snippet:

EXAMPLE OUTPUT:
sqlite> select docid,*,snippet(test) from test where german match "a";
1|[1] a b c|1] a b c
2|[{[_.,:;[1] a b c|1] a b c
3|1[1] a b c|1[1] a b c
4|[1] a b c|1] a b c
5|​[1] a b c|​[1] a b c



-As you can see, for IDs 1 and 2 the match is at the right position,
but all leading non-alphanumeric characters ([, {, etc.) are simply
left out of the snippet.

-ID 3 works, but has an additional 1 that should not be there, so it is
no solution...

-ID 4 does not help and breaks the offsets, so it is even worse.

-ID 5 works, BUT it is a dirty fix I found.
It adds a Unicode character ('ZERO WIDTH SPACE' (U+200B)) in front,
which obviously can't be seen and doesn't "break" the offsets (it just
shifts them all by one character).
I haven't tested it on Android yet, but I hope it works there too,
since Android supports Unicode...
Obviously this is not a nice solution, nor one for simpler/embedded
systems.
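
Editor's note: the ID 5 workaround could be wrapped in two small helpers on the application side, roughly as sketched below. This is only an illustration of the idea in the post, not a recommended fix; the names guard and unguard are hypothetical, and the test for a "delimiter" follows the post's own description (ASCII and not alphanumeric).

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character used for ID 5 in the post


def guard(text):
    """Prefix text with ZWSP if it starts with an ASCII non-alphanumeric char."""
    if text and ord(text[0]) < 128 and not text[0].isalnum():
        return ZWSP + text
    return text


def unguard(text):
    """Remove a leading ZWSP that guard() may have added."""
    return text[1:] if text.startswith(ZWSP) else text


stored = guard("[1] a b c")
# Same value as char(8203,91,49,93,32,97,32,98,32,99) in the SQL above.
print(repr(stored))
print(unguard(stored))
```

The value from guard() would be what gets inserted, and unguard() would be applied to column values and snippets before display.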


(By the way, the same bug also occurs with fts3, and without any
special content option.)
Here is a small example for plain fts4 with a more customized snippet
call:

create virtual table test using fts4(german);
insert into test VALUES("[1] a b c");

sqlite> select *,snippet(test,"#","#","...",0,64) from test where german
match 'a';
[1] a b c|1] #a# b c



regards boscowitch

PS: please excuse the "german" ;) and all English spelling errors 



Re: [sqlite] FTS snippet()

2011-04-14 Thread Gert Van Assche
Drake,

if I do this, I get: SQL logic error or missing database.

Thanks

Gert


Re: [sqlite] FTS snippet()

2011-04-13 Thread Drake Wilson
Quoth Gert Van Assche, on 2011-04-13 22:35:49 +0200:
>   SELECT snippet(example, '[', ']') FROM example WHERE CONTEXT MATCH
> (SELECT TOKEN FROM example);

You're asking to match a single independently arbitrarily chosen token
from anywhere in the table (which is not even the same as "matching at
least one token from the table"), not whether it matches the one from
the same row.

Can you do WHERE CONTEXT MATCH TOKEN instead?  I think you still need
a full table scan for that, but it should return the right results
unless FTS4 has some relevant restriction on the RHS of a MATCH.

   ---> Drake Wilson
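
Editor's note: one way to get per-row highlighting without putting a column or subquery on the right-hand side of MATCH is to run one parameterized MATCH per row from application code. A minimal sketch, assuming Python's sqlite3 module with FTS4 available and that every TOKEN value is a plain word (no FTS query syntax); the function name snippets_for_own_token is hypothetical.

```python
import sqlite3

ROWS = [
    ("one", "This is just one sentence."),
    ("two", "This is just one sentence. Sorry, it are two sentences."),
    ("three", "More then three words in one sentence."),
]


def snippets_for_own_token(conn):
    """Return one snippet per row, highlighting that row's own TOKEN."""
    out = []
    tokens = conn.execute(
        "SELECT rowid, token FROM example ORDER BY rowid"
    ).fetchall()
    for rowid, token in tokens:
        # The token is bound as a parameter, so MATCH sees a constant string.
        hits = conn.execute(
            "SELECT rowid, snippet(example, '[', ']') "
            "FROM example WHERE context MATCH ?",
            (token,),
        ).fetchall()
        # Other rows may contain the same word; keep only the row the
        # token came from.
        out.extend(text for hit_rowid, text in hits if hit_rowid == rowid)
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE example USING fts4(TOKEN, CONTEXT)")
conn.executemany("INSERT INTO example(TOKEN, CONTEXT) VALUES (?, ?)", ROWS)
result = snippets_for_own_token(conn)
for line in result:
    print(line)
```

This costs one query per row, so it is only a sketch for small tables like the one in the original post.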


[sqlite] FTS snippet()

2011-04-13 Thread Gert Van Assche
Hi all,

I'm sure I'm doing something stupid here...

  CREATE VIRTUAL TABLE example USING fts4(TOKEN, CONTEXT);

  INSERT INTO example(TOKEN, CONTEXT) VALUES('one', 'This is just one sentence.');
  INSERT INTO example(TOKEN, CONTEXT) VALUES('two', 'This is just one sentence. Sorry, it are two sentences.');
  INSERT INTO example(TOKEN, CONTEXT) VALUES('three', 'More then three words in one sentence.');

  SELECT snippet(example, '[', ']') FROM example WHERE CONTEXT MATCH (SELECT TOKEN FROM example);


this returns

  This is just [one] sentence.
  This is just [one] sentence. Sorry, it are two sentences.
  More then three words in [one] sentence.


while I was hoping for

  This is just [one] sentence.
  This is just one sentence. Sorry, it are [two] sentences.
  More then [three] words in one sentence.


Can anyone tell me what I'm doing wrong?

thanks

gert


Re: [sqlite] FTS, snippet & Unicode?

2008-08-27 Thread Petite Abeille

On Aug 27, 2008, at 4:52 AM, Alexandre Courbot wrote:

> I know there is a patch at
> http://www.sqlite.org/cvstrac/tktview?tn=3140,38 that is supposed to
> improve Unicode support in FTS3. I suspect it to turn any Unicode
> character into a token - however maybe you can use it as a basis to
> implement what you need.

Thanks for the pointer. Will give it a try.

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/



Re: [sqlite] FTS, snippet & Unicode?

2008-08-27 Thread Dennis Cote
Alexey Pechnikov wrote:
> 
> Is it included in version 3.6.1 or 3.6.2?
> 

No, it is not included in either version. The patch was submitted by the
Mozilla group, but it has not been checked in to SQLite.

You can of course apply the patch to your own customized version of SQLite.

HTH
Dennis Cote


Re: [sqlite] FTS, snippet & Unicode?

2008-08-27 Thread Alexey Pechnikov
Hello!

In a message of Wednesday 27 August 2008 06:52:09, Alexandre Courbot wrote:
> I know there is a patch at
> http://www.sqlite.org/cvstrac/tktview?tn=3140,38 that is supposed to
> improve Unicode support in FTS3. I suspect it to turn any Unicode
> character into a token - however maybe you can use it as a basis to
> implement what you need.

Is it included in version 3.6.1 or 3.6.2?

Best regards, Alexey.


Re: [sqlite] FTS, snippet & Unicode?

2008-08-26 Thread Alexandre Courbot
I know there is a patch at
http://www.sqlite.org/cvstrac/tktview?tn=3140,38 that is supposed to
improve Unicode support in FTS3. I suspect it turns every Unicode
character into a token; however, maybe you can use it as a basis to
implement what you need.

Alex.


[sqlite] FTS, snippet & Unicode?

2008-08-26 Thread Petite Abeille
Hello,

% sqlite3 -version
3.5.9

FTS's snippet seems to truncate Unicode sequences at times.

For example, given the following text:

Motto: ძალა ერთობაშია  (Georgian)
"Strength is in Unity"

FTS's snippet would return the extract below for 'Unity, Freedom, Work':

“… ��ია  (Georgian) "Strength is in Unity" Anthem:  
Tavisupleba  ("Freedom") Capital (and largest city) … America.  
Relations with NATO Georgia is working in becoming a full member of  
NATO. In …”

Note how ერთობაშია has been truncated to ��ია.

Thoughts?

Thanks in advance.

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
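
Editor's note: the thread does not establish the cause. The garbled output is at least consistent with a snippet boundary falling on a byte offset in the middle of a multi-byte UTF-8 character; that reading is an assumption. The sketch below only reproduces the symptom in plain Python and shows one way to snap an offset back to a character boundary. The function names and the offset 19 are chosen for illustration and do not come from SQLite.

```python
def cut_bytes(text, start):
    """Decode text from a raw byte offset, as a byte-oriented cut would."""
    return text.encode("utf-8")[start:].decode("utf-8", errors="replace")


def snap_to_char_start(data, start):
    """Move start back until it no longer points at a UTF-8 continuation byte."""
    while 0 < start < len(data) and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start


word = "ერთობაშია"
data = word.encode("utf-8")  # each Georgian letter takes three bytes

# Offset 19 lands one byte into the seventh letter.
print(cut_bytes(word, 19))

# Snapping back to the start of that letter keeps the text intact.
print(data[snap_to_char_start(data, 19):].decode("utf-8"))
```

The first print shows two replacement characters followed by the last two letters, the same shape as the truncated word in the extract above.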


