Re: [sqlite] Full text search FTS3 of files

2010-10-18 Thread Dami Laurent (PJ)
>Is it possible to use FTS3 for search without storing the actual file
>contents/search terms/keywords in a row. In other words, create a FTS3
>tables with rows that only contains an ID and populate the B-Tree with
>keywords for search.
>

Each FTS3 table t is stored internally within three regular tables :
t_content, t_segments and t_segdir. The last two tables contain the
fulltext index. The first table t_content stores the complete documents
being indexed, and is only used when you call the offsets() or
snippets() functions. So if you don't need those functions, you can
cheat : a) call FTS3 to index your document as usual; b) do an update on
the t_content table to remove the document text. 

I did play with that scenario, and gained quite a lot of disk space;
however it's really a hack and maybe wouldn't work in future versions of
SQLite. 
More on
http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite/Cookbook.pod#Spari
ng_database_disk_space 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] using SQLite with mod_perl

2010-10-12 Thread Dami Laurent (PJ)
>-Message d'origine-
>De : sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] De la part de Clark Christensen
>Envoyé : lundi, 11. octobre 2010 17:26
>À : General Discussion of SQLite Database
>Objet : Re: [sqlite] using SQLite with mod_perl
>
>>So, I want all the perl modules to be loaded when Apache2 starts, and
>>then a $dbh created for each user when the user comes to the web site,
>>but not recreated for the same user on every reload. I am assuming
>>that would be the correct way to work speedily.
>
>I had the same problem, and it's a huge pain, even when you have root
>privilege.
>
>I never did completely solve it.  I had some success with explicit
>$dbh->disconnect calls before the script exits, but I could never prove
>why it
>worked. 

Creating a persistent dbh for each user doesn't make sense to me, because
if the first request from user U1 is served by Apache process P4, there is
no garantee that the next request will again go to process P4; if it goes
to process P2, then that process will need its own connection.  So the 
common practice is one persistent database connection for each Apache process,
not for each user; if you need user_specific stuff, you pass that information
through the shared connection.

Furthermore, SQLite is just a file, it has no notion of users, so one 
connection per user makes even less sense.


 Ultimately, I abandoned mod_perl.  Having to restart Apache
>every time I update code in a module was a big obstacle.
>

Updated scripts are reloaded automatically. Updated modules are not reloaded by
default, but if you want that behaviour you can activate 
http://search.cpan.org/~PHRED/Apache-Reload/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXTERNAL:Re: FTS Question (hyphenated words).

2010-10-04 Thread Dami Laurent (PJ)


>-Message d'origine-
>De : sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] De la part de Black, Michael (IS)
>Envoyé : dimanche, 3. octobre 2010 14:02
>À : General Discussion of SQLite Database
>Objet : Re: [sqlite] EXTERNAL:Re: FTS Question
>
>OK...that make sense...so i hook up my own tokenizerthen how do I
>allow the hyphen to really mean hyphen and not "NOT".
>
>I would've expected "play-off" to be ok and "play -off" to be the NOT
>operator.
>
>Is there some reason why whitespace isn't being taken into account to
>determine the meaning of the hypen?
>


Don't know the reason, I didn't write that code, I just had to study it while 
working on FTS3 integration into Perl's DBD::SQLite.

I didn't try nor didn't check in detail, but if your tokenizer treats 
"play-off" as a single word, then probably the query analyzer will receive one 
single token and will not interpret the hyphen as NOT.

Laurent Dami
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Question

2010-10-02 Thread Dami Laurent (PJ)
Hi Michael,

While indexing, the standard tokenizer treats 'play-off' as two different 
words, so the indexing is exactly the same as for 'play off' or 'play, off'.

Now when querying, the query analyzer treats the hyphen as 'AND NOT', so your 
query really becomes 'play AND NOT off', which explains the result.

Clearly this behaviour is quite rudimentary, not what one would expect when 
being accustomed to Google or other powerful search engines ... but that's what 
the current version of FTS3 has to offer.

To obtain more sophisticated behaviour, you would have to implement your own 
tokenizer and hook it to FTS3. The doc explains how to do this.

Best regards, Laurent Dami


>-Message d'origine-
>De : sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] De la part de Black, Michael (IS)
>Envoyé : vendredi, 1. octobre 2010 16:54
>À : General Discussion of SQLite Database
>Objet : [sqlite] FTS Question
>
>I'm experimenting with the FTS capability and have a question
>
>How do I enter a value in the database that is hyphenated?  Seems to be
>backwards form what I would expectif the content contains a hyphen
>then only non-hyphenated query returns the correct value whereas the
>hyphenated one returns the wrong one.
>
>This seems to not work correctly...or am I missing a concept here?
>
>SQLite version 3.7.2
>sqlite> create virtual table dict using fts3(content);
>sqlite> insert into dict values('play-off');
>sqlite> insert into dict values('play-back');
>sqlite> select * from dict where content match 'play-back';
>play-off
>sqlite> select * from dict where content match 'play-off';
>play-back
>sqlite> select * from dict where content match 'play off';
>play-off
>sqlite> select * from dict where content match 'play back';
>play-back
>
>
>Michael D. Black
>Senior Scientist
>Advanced Analytics Directorate
>Northrop Grumman Information Systems
>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Disambiguation of Latin accent characters for FTS3

2010-09-30 Thread Dami Laurent (PJ)
Hi Travis,

You need to define a "tokenizer" to be used by FTS3; something somehow similar 
to user-defined collating sequences.
See  http://www.sqlite.org/fts3.html#section_5_1 

The ICU library has language-specific library functions for  ignoring accents 
while tokenizing.
 
The Perl binding for SQLite has a general-purpose "unaccent" tokenizer, see
http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite.pm#FULLTEXT_SEARCH 
and
http://search.cpan.org/~dami/Search-Tokenizer-1.00/lib/Search/Tokenizer.pm 

Or you can write your own tokenizer in C ...

Best regards,
Laurent Dami


>-Message d'origine-
>De : sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] De la part de Travis Orr
>Envoyé : jeudi, 30. septembre 2010 17:36
>À : sqlite-users@sqlite.org
>Objet : [sqlite] Disambiguation of Latin accent characters for FTS3
>
>I know it is possible but can't figure out what needs to be done to be
>able to make FTS3 see E as being equal to É. And other similar cases.
>
>
>
>I have a custom collation sequence that does this disambiguation for
>sorting query results, but it doesn't appear to be functioning when
>performing FTS3 queries.
>
>
>
>Thanks
>
>
>
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implementing Regular Expression Support...?

2009-12-10 Thread Dami Laurent (PJ)
 

>-Message d'origine-
>De : sqlite-users-boun...@sqlite.org 
>[mailto:sqlite-users-boun...@sqlite.org] De la part de J. King
>Envoyé : mardi, 8. décembre 2009 18:24
>À : General Discussion of SQLite Database
>Objet : Re: [sqlite] Implementing Regular Expression Support...?
>
>On Tue, 08 Dec 2009 12:11:13 -0500, li...@mgreg.com   
>wrote:
>
>> Hi All,
>>
>> I'm currently using SQLITE in a few production apps.  I'm 
>using various  
>> languages such as Ruby, PERL, RB, etc.  I have need to use regular  
>> expressions in some of my queries, but I'm not sure how to 
>implement  
>> "user defined functionality".  Where are the hooks?  Is there a  
>> particular mechanism/language I must use to create them?  Is this  
>> something I'm required to recompile SQLITE for?
>
>You do so by defining a user function called 'regexp'.  The 
>means by which  
>one defines a user function depends on the language.  See, for 
>instance,  
>[1] for Ruby.  For a 'regexp' function you would specify two 
>arguments,  
>pattern and string to match against.
>
>[1]  
>ase.html#M000115>
>
>-- 
>J. King

Hi,

If you use Perl, make sure to get the latest version of DBD::SQLite, where the 
SQLite "regexp" function is automatically hooked to Perl regexes. See 
http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite.pm#REGEXP_function : 

SQLite includes syntactic support for an infix operator 'REGEXP', but without 
any implementation. The DBD::SQLite driver automatically registers an 
implementation that performs standard perl regular expression matching, using 
current locale. So for example you can search for words starting with an 'A' 
with a query like

  SELECT * from table WHERE column REGEXP '\bA\w+'

If you want case-insensitive searching, use perl regex flags, like this :

  SELECT * from table WHERE column REGEXP '(?i:\bA\w+)'
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] experiencing SQLite crash (Apache/mod_perl/Catalyst)

2008-10-15 Thread Dami Laurent (PJ)

>Hi all,
>
>I have a Perl Catalyst app using SQLite as storage. It has been running
>for several months without any problem.
>Now in the last few days we had several crashes of SQLite (file gets
>corrupted). This is just incomprehensible because all development and
>production engineers claim that nothing has changed in 
>application code,
>nor in Catalyst, DBD::Sqlite, Apache, or operating system 
>(Solaris), and
>we can't reproduce the problem in dev. So we don't even know where to
>start our investigations.
>
>Any hints, suggestions, related info, etc. would be welcome.
>
>The production server is running Apache 2.2.8 Solaris / mod_perl 2.04 /
>Perl 5.7.8 / Catalyst 5.7007 / DBD::SQlite 1.13.
>
>Thanks in advance, 
>
>   Laurent Dami
>


For info : we partially understood the problem.

The production server uses a private Apache authentication module
written in C.
That module was accidentally compiled with a debug flag and therefore
was sending
printf(..) statements to STDOUT (probably a very bad idea, instead of
going through Apache logging methods!). 
This went undetected for several months -- the printf were written God
knows where.

What happened in the last few days is that in some occasions the file
descriptor
for accessing SQLite was the same as the stdout used by C printf --
so some logging data was overwriting some blocks inside the SQLite
binary file!
No idea if this sharing of file descriptor is because of Apache or
mod_perl or 
DBD::Sqlite or still something else, nor why this sharing only started a
few
days ago  (maybe a difference in load of the server).

Anyway, the fix is of course to disable debug flag in that C module, so
that
it no longer prints to stdout.

Many thanks to people who responded to my previous call for help.

Best regards,

Laurent Dami





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] experiencing SQLite crash (Apache/mod_perl/Catalyst)

2008-10-15 Thread Dami Laurent (PJ)
Hi all,

I have a Perl Catalyst app using SQLite as storage. It has been running
for several months without any problem.
Now in the last few days we had several crashes of SQLite (file gets
corrupted). This is just incomprehensible because all development and
production engineers claim that nothing has changed in application code,
nor in Catalyst, DBD::Sqlite, Apache, or operating system (Solaris), and
we can't reproduce the problem in dev. So we don't even know where to
start our investigations.

Any hints, suggestions, related info, etc. would be welcome.

The production server is running Apache 2.2.8 Solaris / mod_perl 2.04 /
Perl 5.7.8 / Catalyst 5.7007 / DBD::SQlite 1.13.

Thanks in advance, 

Laurent Dami





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users