Re: [sqlite] Extending Porter Tokenizer

2016-07-10 Thread Abhinav Upadhyay
On Fri, Jul 8, 2016 at 3:01 AM, Matthias-Christian Ott <o...@mirix.org> wrote:
> On 2016-07-05 18:11, Abhinav Upadhyay wrote:
>> I'm wondering if it is possible to extend the functionality of the
>> porter tokenizer. I would like to use the functionality of the Porter
>> tokenizer but before stemming the token, I want to decide whether the
>> token should be stemmed or not.
>>
>> Do I need to copy the Porter tokenizer and modify it to suit my needs
>> or is there a better way to minimize code duplication?
>
> The first argument of the Porter tokenizer is its parent tokenizer. The
> Porter tokenizer calls the parent tokenizer's xTokenize function with an
> xToken function that wraps the xToken function that was passed to the
> xTokenize function of the Porter tokenizer and stems the tokens passed
> to it. So create a custom tokenizer that extracts the original xToken
> function from the xToken member of its pCtx parameter:
>
> typedef struct PorterContext PorterContext;
> struct PorterContext {
>   void *pCtx;
>   int (*xToken)(void *pCtx, int tflags, const char *pToken, int nToken,
>   int iStart, int iEnd);
>   char *aBuf;
> };
>
> typedef struct CustomTokenizer CustomTokenizer;
> struct CustomTokenizer {
>   fts5_tokenizer tokenizer;
>   Fts5Tokenizer *pTokenizer;
> };
>
> typedef struct CustomContext CustomContext;
> struct CustomContext {
>   void *pCtx;
>   int (*xToken)(void *pCtx, int tflags, const char *pToken, int nToken,
>   int iStart, int iEnd);
> };
>
> int customToken(
>   void *pCtx,
>   int tflags,
>   const char *pToken,
>   int nToken,
>   int iStart,
>   int iEnd
> ){
>   CustomContext *c = (CustomContext*)pCtx;
>   PorterContext *p;
>
>   /* "stem" is a placeholder for your own per-token decision. */
>   if( stem ){
>     /* Hand the token to the Porter callback, which stems it. */
>     return c->xToken(c->pCtx, tflags, pToken, nToken, iStart, iEnd);
>   }else{
>     /* Bypass the stemmer: call the xToken that was passed to Porter. */
>     p = (PorterContext*)c->pCtx;
>     return p->xToken(p->pCtx, tflags, pToken, nToken, iStart, iEnd);
>   }
> }
>
> int customTokenize(
>   Fts5Tokenizer *pTokenizer,
>   void *pCtx,
>   int flags,
>   const char *pText,
>   int nText,
>   int (*xToken)(void *, int, const char *, int nToken, int iStart,
>   int iEnd)
> ){
>   CustomTokenizer *t = (CustomTokenizer*)pTokenizer;
>   CustomContext sCtx;
>   sCtx.pCtx = pCtx;
>   sCtx.xToken = xToken;
>   return t->tokenizer.xTokenize(t->pTokenizer, (void*)&sCtx, flags,
>   pText, nText, customToken);
> }
>
> Note that you are accessing an internal struct and relying on
> implementation details and therefore have to check whether the struct or
> any other relevant implementation details changed with every release.
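
The "stem" condition in the quoted customToken() is left open. One way to
fill it in, sketched here under assumptions, is a lookup in a small sorted
exception list. The function name shouldStem() and the keyword list are
hypothetical and not part of SQLite; the sketch also assumes the parent
tokenizer has already lowercased the token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical exception list: tokens that should reach the index
** unstemmed. It must stay sorted in strcmp() order for bsearch(). */
static const char *const azNoStem[] = { "dns", "lfs", "nfs", "tmpfs" };

/* bsearch() comparator: the key is a char*, the element a char**. */
static int cmpToken(const void *pKey, const void *pElem){
  return strcmp((const char*)pKey, *(const char *const*)pElem);
}

/* Return 1 if the token should be stemmed, 0 if it should be passed
** through unchanged. pToken is not nul-terminated, so it is copied
** into a local buffer first. Overlong tokens are simply stemmed. */
int shouldStem(const char *pToken, int nToken){
  char zBuf[64];
  assert( pToken!=NULL );
  if( nToken<=0 || nToken>=(int)sizeof(zBuf) ) return 1;
  memcpy(zBuf, pToken, (size_t)nToken);
  zBuf[nToken] = '\0';
  return bsearch(zBuf, azNoStem,
                 sizeof(azNoStem)/sizeof(azNoStem[0]),
                 sizeof(azNoStem[0]), cmpToken)==NULL;
}
```

With this in place, the condition would read
"if( shouldStem(pToken, nToken) )". A sorted array keeps the lookup cheap
for a short list; a larger list might be better served by a hash table.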

Thanks for the detailed response. I think this would work but we are
currently using FTS4. The ability of calling a parent tokenizer is
really what I needed, but I don't think this is possible with FTS4?

-
Abhinav
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Extending Porter Tokenizer

2016-07-05 Thread Abhinav Upadhyay
Hi,

I'm wondering if it is possible to extend the functionality of the
porter tokenizer. I would like to use the functionality of the Porter
tokenizer but before stemming the token, I want to decide whether the
token should be stemmed or not.

Do I need to copy the Porter tokenizer and modify it to suit my needs
or is there a better way to minimize code duplication?

-
Abhinav


[sqlite] Preventing certain query keywords from getting stemmed

2016-05-29 Thread Abhinav Upadhyay
Hi,

While running queries, sometimes there are technical keywords which
shouldn't be stemmed by the tokenizer. For example, if I query for
"lfs" (which is a file system), the Porter stemmer converts it to
"lf", which matches many other unrelated keywords in the corpus (such
as ASCII LF or some other acronyms).

I'm wondering if there is an option to tell the tokenizer not to stem
certain keywords and to take them as they are?

-
Abhinav


Re: [sqlite] Limit on the Compound Select Statements

2012-02-23 Thread Abhinav Upadhyay
On Thu, Feb 23, 2012 at 6:50 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 23 Feb 2012, at 1:16pm, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> 
> wrote:
>
>> I do not remember the
>> exact error message but it is close to this. As per the documentation on
>> the compound select statements
>> (http://www.sqlite.org/lang_select.html) on the SQLite website, there is
>> no mention of an explicit limit. I would like to know the exact limit
>> on this, so that I can get my code to work within this limit
>
> <http://www.sqlite.org/limits.html>
>
> especially item 3, but also others.
>
> However, I question the advantage of doing one long INSERT rather than doing 
> many inside a transaction.  Are you binding parameters ?
>

It was already inside a bigger transaction, I was trying out something
naive and turns out it is not worth it. Thanks for the pointer  :)
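
For reference, the limit in question is the compile-time setting
SQLITE_MAX_COMPOUND_SELECT, which defaults to 500 terms. If someone still
wants the compound form, the values have to be split into chunks. The
sketch below builds one statement per chunk; buildInsert() and MAX_TERMS
are hypothetical names, the table name is assumed to be trusted, and
UNION ALL is used instead of UNION so that duplicate values are kept.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed default of SQLITE_MAX_COMPOUND_SELECT in a stock build. */
#define MAX_TERMS 500

/* Build "INSERT INTO t SELECT 'v0' UNION ALL SELECT 'v1' ..." for at
** most MAX_TERMS values. Single quotes in values are doubled.
** Returns a malloc()ed string, or NULL on bad input or OOM. */
char *buildInsert(const char *zTable, const char *const *azVal, int nVal){
  size_t nByte;
  char *zSql, *z;
  int i;
  assert( zTable!=NULL && azVal!=NULL );
  if( nVal<1 || nVal>MAX_TERMS ) return NULL;
  nByte = strlen("INSERT INTO ") + strlen(zTable) + 1;
  for(i=0; i<nVal; i++){
    /* Worst case: every byte is a quote and gets doubled. */
    nByte += strlen(" UNION ALL SELECT ''") + 2*strlen(azVal[i]);
  }
  zSql = malloc(nByte);
  if( zSql==NULL ) return NULL;
  z = zSql + sprintf(zSql, "INSERT INTO %s", zTable);
  for(i=0; i<nVal; i++){
    const char *p;
    z += sprintf(z, "%s SELECT '", i==0 ? "" : " UNION ALL");
    for(p=azVal[i]; *p; p++){
      if( *p=='\'' ) *z++ = '\'';   /* escape ' as '' */
      *z++ = *p;
    }
    *z++ = '\'';
  }
  *z = '\0';
  return zSql;
}
```

The caller would loop over the data MAX_TERMS values at a time. As Simon
points out, a prepared single-row INSERT with bound parameters inside a
transaction avoids both the limit and the quoting.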

--
Abhinav
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Limit on the Compound Select Statements

2012-02-23 Thread Abhinav Upadhyay
Hi,

I have a single column table, in which I wish to store several
thousand rows. I was wondering if I could insert them using a
single INSERT query and came across this Stackoverflow answer:
http://stackoverflow.com/a/1734067/348637 . According to that answer
it is possible to insert multiple rows using a single query with an
INSERT statement of the following form:

INSERT INTO table_name
SELECT 'val1' as col_name
UNION SELECT 'val2'
UNION SELECT 'val3'...

This seems to work, but in my case I sometimes get an error saying "Too
many terms in the compound select statement". I do not remember the
exact error message but it is close to this. As per the documentation on
the compound select statements
(http://www.sqlite.org/lang_select.html) on the SQLite website, there is
no mention of an explicit limit. I would like to know the exact limit
on this, so that I can get my code to work within this limit :)

Thanks
Abhinav


Re: [sqlite] File checking mechanism.

2012-02-01 Thread Abhinav Upadhyay
On Wed, Feb 1, 2012 at 1:37 PM, bhaskarReddy  wrote:
>
> Hi Friends,
>
>         Is there any file checking mechanism in sqlite3?
>
>         Suppose I have a file ABCD.db. Before I create the database
> file, I want to check whether a file with the same name already exists.
> If it exists, an error should be returned.
>
>         Is there any sqlite function to do this file check?
>
>
> Regards,
> Bhaskar Reddy

Just do a stat(2) on the file. (see man 2 stat)
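
A minimal sketch of that check, assuming a POSIX system; fileExists() is a
hypothetical helper name, not an SQLite function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return 1 if zPath exists, 0 if it does not, and -1 if stat(2)
** failed for some other reason (for example, permission denied). */
int fileExists(const char *zPath){
  struct stat sb;
  assert( zPath!=NULL );
  if( stat(zPath, &sb)==0 ) return 1;
  return (errno==ENOENT || errno==ENOTDIR) ? 0 : -1;
}
```

Note that checking first and creating afterwards leaves a small window in
which another process can create the file. Where that matters, creating
the file with open(2) and O_CREAT|O_EXCL fails atomically if it already
exists.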

--
Abhinav


Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer

2012-01-03 Thread Abhinav Upadhyay
> Two FTS tables? One with the Porter stemmer, for search, one without, to 
> build the auxiliary tables?

Yeah, that is the last option, if nothing else works. For a small set
of documents the extra processing time might be ok but for a larger
set of documents building the FTS tables twice might be a bit taxing.

I think I will first try a custom tokenizer. So is it ok to execute
SQL statements from within the tokenizer ? It would have been great if
there was some way to determine whether the tokenizer code is being
executed for indexing the documents or for searching the index.

Thanks
Abhinav


[sqlite] [FTS] Executing Sql statements inside a custom tokenizer

2012-01-03 Thread Abhinav Upadhyay
Hi,

I would like to build up a table of all the unique words occurring in
my corpus (for spelling suggestion feature). Presently I am using the
Porter stemming tokenizer and I would not like to stop using the
stemmer at any cost. If I were not using the Porter stemmer, I could
easily obtain the list of unique words in the corpus using the fts4aux
module. But using the stemmer means that all the words are stored in
the index in their stem form, which is not desirable for building a
dictionary of proper English words.

One solution is to use a custom tokenizer. I was thinking of using the
default Porter tokenizer supplied with Sqlite and adding some bits of
code to store the token in a separate table before stemming it down.
But I am not sure if it is ok to access or modify the database using
Sql statements inside a tokenizer. Now that I think of it, the
tokenizer code is also executed when an SQL query is performed against
the FTS table (when performing search), at which time I don't want my
dictionary building code to execute. So perhaps this is not a good
idea.

What other options do I have ?

Thanks
Abhinav


Re: [sqlite] FTS3/FTS4 - Finding the term(s) that completes the input

2011-11-20 Thread Abhinav Upadhyay
On Mon, Nov 21, 2011 at 12:17 AM, Mohit Sindhwani  wrote:
> Hi, I'm finding my way through FTS3/FTS4 to replace some of the old code
> that we have for searching terms within titles.  I now know that FTS3/4
> should be the way to proceed.
>
> So far, I have this:
> - an FTS4 table that has two columns: title (main column), ext (certain
> conditions to match)
> - an FTS4aux table
>
> What I'd like to be able to do is something like this:
> - let's say that the FTS4 table has values such as:
> * mohit sindhwani, onghu
> * john doe, gmail
> * james ling, alibaba
> * john barn, yahoo
> ...and so on
>
> If the user enters "j", I'd like to suggest that this would complete to the
> words:
> john and james in the current set
>
> If the user enters 'ling j', I'd like to restrict it and say:
>> james is the only word that can be matched now
>> james ling, alibaba is the result
>
> Similarly, if the user enters 'yahoo j', I should be able to zoom in to
> 'john barn, yahoo'.
>
> I think this should be within the reach of FTS3/FTS4, but I'm having trouble
> framing the queries, etc.  Are you able to nudge me in the correct
> direction?
>
> Thanks,
> Mohit.

I think you might want to look at Token Prefix queries:
http://sqlite.org/fts3.html#section_3
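
As a rough sketch of how the user's input could be turned into such a
query: in the FTS3/FTS4 query syntax a trailing '*' makes a term a prefix
term, and terms separated by spaces are combined with an implicit AND. The
helper buildPrefixQuery() below is hypothetical; it only appends '*' to the
last term and does not sanitize FTS operators or quotes in the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy zInput into zOut with '*' appended to the last term, so that
** "ling j" becomes "ling j*". Returns 0 on success, -1 if the input
** is blank or the result does not fit into nOut bytes. */
int buildPrefixQuery(const char *zInput, char *zOut, size_t nOut){
  size_t n;
  assert( zInput!=NULL && zOut!=NULL );
  n = strlen(zInput);
  while( n>0 && zInput[n-1]==' ' ) n--;   /* trim trailing blanks */
  if( n==0 || n+2>nOut ) return -1;
  memcpy(zOut, zInput, n);
  zOut[n] = '*';
  zOut[n+1] = '\0';
  return 0;
}
```

The result would be bound as the right-hand side of MATCH, so that
'ling j*' selects rows containing the token "ling" and some token that
starts with "j".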

--
Abhinav


Re: [sqlite] Attaching an in-memory database

2011-08-27 Thread Abhinav Upadhyay
On Sat, Aug 27, 2011 at 9:52 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 27 Aug 2011, at 4:50pm, Abhinav Upadhyay wrote:
>
>> sqlite3_exec(db, "ATTACH DATABASE :memory AS metadb", NULL, NULL, &errmsg);
>
> Need a colon after 'memory':
>
> ATTACH DATABASE ':memory:' AS metadb;
>
> I don't think you need the quotes I put in, I'm not sure.
>
> When in doubt, use the SQLite shell tool to execute the commands you're 
> trying to do in your app.  This will tell you whether the fault is in your 
> SQL coding or your C coding.  It has saved me many hours of looking at the 
> wrong problem.

Ah, that was silly of me :( . I could not see the 2nd colon in the
documentation page until now :P. Looks like it is working.

Thanks

--
Abhinav


[sqlite] Attaching an in-memory database

2011-08-27 Thread Abhinav Upadhyay
Hi,

I am trying to attach an in-memory database and then create a table in
it. Here is the code that I am using:

/* I have already opened a connection to a database before the following */
sqlite3_exec(db, "ATTACH DATABASE :memory AS metadb", NULL, NULL, &errmsg);
if (errmsg != NULL) {
    warnx("%s", errmsg);
    sqlite3_free(errmsg);   /* messages from sqlite3_exec() need sqlite3_free() */
    exit(EXIT_FAILURE);
}

/* I need to keep all the operations in a single transaction for
   better performance */
sqlite3_exec(db, "BEGIN", NULL, NULL, &errmsg);
if (errmsg != NULL) {
    warnx("%s", errmsg);
    sqlite3_free(errmsg);
    exit(EXIT_FAILURE);
}

...
/* somewhere down the line I try to create a table in the in-memory
   attached database */
sqlstr = "CREATE TABLE IF NOT EXISTS metadb.file_cache("
         " md5_hash, file PRIMARY KEY)";

sqlite3_exec(db, sqlstr, NULL, NULL, &errmsg);
if (errmsg != NULL) {
    warnx("%s", errmsg);
    sqlite3_free(errmsg);
    exit(EXIT_FAILURE);
}
...

But on running the program I am getting an error exactly at the point
where I am trying to execute the create table statement above:

# ./makemandb
Building temporary file cache
makemandb: near ".": syntax error

What might be wrong here ? From the documentation it seems that I
should be explicitly specifying the database name if I have attached
one or more databases.

Thanks
Abhinav


Re: [sqlite] Use of VACUUM

2011-08-11 Thread Abhinav Upadhyay
On Fri, Aug 12, 2011 at 12:28 AM, Richard Hipp <d...@sqlite.org> wrote:
> On Thu, Aug 11, 2011 at 2:48 PM, Abhinav Upadhyay <
> er.abhinav.upadh...@gmail.com> wrote:
>
>> On Fri, Aug 12, 2011 at 12:05 AM, Michael Stephenson
>> <domehead...@gmail.com> wrote:
>> > If you use INTEGER PRIMARY KEY, that column becomes your rowids; this
>> does
>> > not create a new, separate column in addition to the rowid column.
>> Indeed, but the INTEGER PRIMARY KEY column would count as a user
>> defined column and thus affect the FTS search :) The FTS table has all
>> text data, so I really do need to create a separate column for the
>> IDs.
>>
>
> Every FTS table has a "docid" column that is not searched, that is a unique
> integer key (like rowid), and which is not modified by VACUUM.
Ah, that saves the day for me :-) . Using VACUUM brings down the size
of my database by 1/3rd, so I really wanted to use it.

Thanks :)
Abhinav


Re: [sqlite] Use of VACUUM

2011-08-11 Thread Abhinav Upadhyay
On Fri, Aug 12, 2011 at 12:05 AM, Michael Stephenson
 wrote:
> If you use INTEGER PRIMARY KEY, that column becomes your rowids; this does
> not create a new, separate column in addition to the rowid column.
Indeed, but the INTEGER PRIMARY KEY column would count as a user
defined column and thus affect the FTS search :) The FTS table has all
text data, so I really do need to create a separate column for the
IDs.


Re: [sqlite] Use of VACUUM

2011-08-11 Thread Abhinav Upadhyay
On Thu, Aug 11, 2011 at 11:14 PM, Igor Tandetnik <itandet...@mvps.org> wrote:
> On 8/11/2011 1:35 PM, Abhinav Upadhyay wrote:
>> The documentation page of the VACUUM command says that "The VACUUM
>> command may change the ROWIDs of entries in any tables that do not
>> have an explicit INTEGER PRIMARY KEY." So what are the possible cases
>> in which the ROWIDs might change ?
>
> ROWIDs might possibly change if the table doesn't have an explicit
> INTEGER PRIMARY KEY column, and you run the VACUUM command on the database
> containing this table. Which part of the statement you quoted do you
> find unclear?
I wanted to know, why would the ROWID change, but Simon's answer makes sense.


Re: [sqlite] Use of VACUUM

2011-08-11 Thread Abhinav Upadhyay
On Thu, Aug 11, 2011 at 11:15 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 11 Aug 2011, at 6:35pm, Abhinav Upadhyay wrote:
>
>> The documentation page of the VACUUM command says that "The VACUUM
>> command may change the ROWIDs of entries in any tables that do not
>> have an explicit INTEGER PRIMARY KEY." So what are the possible cases
>> in which the ROWIDs might change ?
>
> Not documented.  So even if someone told you what they were in this version 
> of SQLite, there might be other reasons in the next version.  Theoretically 
> it might renumber rows to close up gaps in the AUTOINCREMENT.
That makes sense.

> As it says, to stop it all you need to do is declare one of the columns as 
> INTEGER PRIMARY KEY.  Once you do that it assumes that you might be referring 
> to those values elsewhere and won't change them.
Indeed, I am using the ROWID as a reference in another table. Actually
I have an FTS table and I don't really want to create an explicit
column for storing  the IDs, as I am afraid that matches from the ID
column could affect the quality of search results.
But if this is the only option, then I guess I need to give it a try.
I might give this column a weight of 0.0 so that it doesn't create
noise in the search results.

Thanks
Abhinav


[sqlite] Use of VACUUM

2011-08-11 Thread Abhinav Upadhyay
Hi,

The documentation page of the VACUUM command says that "The VACUUM
command may change the ROWIDs of entries in any tables that do not
have an explicit INTEGER PRIMARY KEY." So what are the possible cases
in which the ROWIDs might change ?

Thanks
Abhinav


Re: [sqlite] [FTS3] Understanding the Flow of data through the tokenizer

2011-07-25 Thread Abhinav Upadhyay
On Mon, Jul 25, 2011 at 9:54 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
> On 07/24/2011 08:16 PM, Abhinav Upadhyay wrote:
>> Hi,
>>
>> I am trying to write my own custom tokenizer to filter stopwords apart
>> from doing normalization and stemming. I have gone through the
>> comments in fts3_tokenizer.h and also read the implementation of the
>> simple tokenizer. While overall I am able to understand what I need to
>> do to implement this tokenizer, I still cannot visualize how the
>> FTS engine calls the tokenizer and what data in what form it passes to
>> it.
>>
>> Does the FTS engine pass the complete document data to the tokenizer
>> or does it pass chunks of data, or individual words? I need to
>> understand this part because the next function needs to set the
>> offsets accordingly. By just going through the code of the simple
>> tokenizer I could not completely comprehend it (it would have been
>> better if I could debug it).
>>
>> By the next function I mean this:
>>
>>    int (*xNext)(
>>      sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
>>      const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
>>      int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
>>      int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
>>      int *piPosition      /* OUT: Number of tokens returned before this one */
>>    );
>>
>> It would help if you could explain the role of these
>> parameters: piEndOffset, piStartOffset?
>
> Each time xNext() returns SQLITE_OK to return a new token, xNext()
> should set:
>
>   *piStartOffset to the number of bytes in the input buffer before
>   start of the token being returned,
>
>   *piEndOffset to *piStartOffset plus the number of bytes in the
>   token text, and
>
>   *piPosition to the number of tokens that occur in the input buffer
>   before the token being returned.
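
The three rules above can be illustrated with a small whitespace tokenizer.
This is only a sketch and not the FTS3 interface itself: the Cursor struct
and nextToken() are hypothetical stand-ins for the tokenizer cursor and
xNext(), and the return value is 1/0 rather than SQLITE_OK/SQLITE_DONE so
that the example compiles without sqlite3.h. It assumes the cursor was
opened on the whole input buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct Cursor {
  const char *zInput;  /* the whole input buffer */
  int nInput;          /* number of bytes in zInput */
  int iOffset;         /* current byte offset into zInput */
  int iToken;          /* number of tokens returned so far */
} Cursor;

/* Return 1 and fill the outputs if another token exists, else 0. */
int nextToken(Cursor *c, const char **ppToken, int *pnBytes,
              int *piStart, int *piEnd, int *piPos){
  assert( c!=NULL && c->zInput!=NULL );
  /* Skip separators in front of the token. */
  while( c->iOffset<c->nInput
      && isspace((unsigned char)c->zInput[c->iOffset]) ) c->iOffset++;
  if( c->iOffset>=c->nInput ) return 0;
  *piStart = c->iOffset;            /* bytes before the token */
  while( c->iOffset<c->nInput
      && !isspace((unsigned char)c->zInput[c->iOffset]) ) c->iOffset++;
  *piEnd = c->iOffset;              /* start + bytes of token text */
  *ppToken = &c->zInput[*piStart];
  *pnBytes = *piEnd - *piStart;     /* no normalization in this sketch */
  *piPos = c->iToken++;             /* tokens returned before this one */
  return 1;
}
```

For the input "ab  cde" this yields (start 0, end 2, position 0) and then
(start 4, end 7, position 1). In a tokenizer that normalizes or stems,
*pnBytes describes the normalized token and can differ from the distance
between the two offsets, which always refer to the original input.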

Thanks for the explanation. I was able to correct my implementation :-)


[sqlite] [FTS3] Understanding the Flow of data through the tokenizer

2011-07-24 Thread Abhinav Upadhyay
Hi,

I am trying to write my own custom tokenizer to filter stopwords apart
from doing normalization and stemming. I have gone through the
comments in fts3_tokenizer.h and also read the implementation of the
simple tokenizer. While overall I am able to understand what I need to
do to implement this tokenizer, I still cannot visualize how the
FTS engine calls the tokenizer and what data in what form it passes to
it.

Does the FTS engine pass the complete document data to the tokenizer
or does it pass chunks of data, or individual words? I need to
understand this part because the next function needs to set the
offsets accordingly. By just going through the code of the simple
tokenizer I could not completely comprehend it (it would have been
better if I could debug it).

By the next function I mean this:

int (*xNext)(
    sqlite3_tokenizer_cursor *pCursor,   /* Tokenizer cursor */
    const char **ppToken, int *pnBytes,  /* OUT: Normalized text for token */
    int *piStartOffset,  /* OUT: Byte offset of token in input buffer */
    int *piEndOffset,    /* OUT: Byte offset of end of token in input buffer */
    int *piPosition      /* OUT: Number of tokens returned before this one */
);

It would help if you could explain the role of these
parameters: piEndOffset, piStartOffset?

Thanks
Abhinav


Re: [sqlite] [FTS3] Header to include for a custom tokenizer

2011-07-23 Thread Abhinav Upadhyay
On Sun, Jul 24, 2011 at 1:40 AM, Abhinav Upadhyay
<er.abhinav.upadh...@gmail.com> wrote:
> On Sat, Jul 23, 2011 at 11:00 PM, Richard Hipp <d...@sqlite.org> wrote:
>> On Sat, Jul 23, 2011 at 1:01 PM, Abhinav Upadhyay <
>> er.abhinav.upadh...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>  I am using the Sqlite3 amalgamation. I am trying to register a custom
>>> tokenizer with sqlite for my FTS application. The custom tokenizer is
>>> in its own source file. I have included the sqlite3.h header with
>>> the tokenizer source but sqlite3.h does not contain the declaration of
>>> the various structs like sqlite3_tokenizer_module etc. So what is the
>>> usual way to resolve this ? These declarations are also not there in
>>> sqlite3ext.h . Although I see them there in sqlite3.c, I am not
>>> sure I want to include it.
>>> Maybe include fts3_tokenizer.h from the sqlite3 source ?
>>>
>>
>> Yes.  Use fts3_tokenizer.h.
>
> Thanks, That worked. Now, I am able to compile everything.
>
> Next, I am having a problem with using this tokenizer. I followed the
> example from the FTS3 documentation page on the website and
> registered the tokenizer using code like this:
>
> const sqlite3_tokenizer_module *stopword_tokenizer_module;
>
> sqlstr = "select fts3_tokenizer(:tokenizer_name, :tokenizer_ptr)";
>        rc = sqlite3_prepare_v2(db, sqlstr, -1, &stmt, NULL);
>        if (rc != SQLITE_OK) {
>                sqlite3_close(db);
>                sqlite3_shutdown();
>                return -1;
>        }
>
>        idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_name");
>        rc = sqlite3_bind_text(stmt, idx, "my_tokenizer", -1, NULL);
>        if (rc != SQLITE_OK) {
>                fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>                sqlite3_finalize(stmt);
>                return -1;
>        }
>
>        sqlite3Fts3MyTokenizerModule((const sqlite3_tokenizer_module **)
>            &stopword_tokenizer_module);
>        idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_ptr");
>        rc = sqlite3_bind_blob(stmt, idx, &stopword_tokenizer_module,
>            sizeof(stopword_tokenizer_module), SQLITE_STATIC);
>        if (rc != SQLITE_OK) {
>                fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>                sqlite3_finalize(stmt);
>                return -1;
>        }
>        rc = sqlite3_step(stmt);
>        if (rc != SQLITE_ROW) {
>                fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>                sqlite3_finalize(stmt);
>                return -1;
>        }
>        sqlite3_finalize(stmt);
>
> It works fine up to this point. After executing the above statements, I
> try to create an FTS table using this custom tokenizer, which also
> seems to get created. The problem comes up when I try to
> insert data in the table. A simple statement like "insert into
> my_table values(...)" fails with this error:
>
> unknown tokenizer: my_tokenizer
>
> I am sure I am missing something here, but don't know what ?
>
> Thanks
>

Never mind. It seems I need to register the tokenizer every time I
try to query the database or store values.

Thanks


Re: [sqlite] [FTS3] Header to include for a custom tokenizer

2011-07-23 Thread Abhinav Upadhyay
On Sat, Jul 23, 2011 at 11:00 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Sat, Jul 23, 2011 at 1:01 PM, Abhinav Upadhyay <
> er.abhinav.upadh...@gmail.com> wrote:
>
>> Hi,
>>
>>  I am using the Sqlite3 amalgamation. I am trying to register a custom
>> tokenizer with sqlite for my FTS application. The custom tokenizer is
>> in its own source file. I have included the sqlite3.h header with
>> the tokenizer source but sqlite3.h does not contain the declaration of
>> the various structs like sqlite3_tokenizer_module etc. So what is the
>> usual way to resolve this ? These declarations are also not there in
>> sqlite3ext.h . Although I see them there in sqlite3.c, I am not
>> sure I want to include it.
>> Maybe include fts3_tokenizer.h from the sqlite3 source ?
>>
>
> Yes.  Use fts3_tokenizer.h.

Thanks, That worked. Now, I am able to compile everything.

Next, I am having a problem with using this tokenizer. I followed the
example from the FTS3 documentation page on the website and
registered the tokenizer using code like this:

const sqlite3_tokenizer_module *stopword_tokenizer_module;

sqlstr = "select fts3_tokenizer(:tokenizer_name, :tokenizer_ptr)";
rc = sqlite3_prepare_v2(db, sqlstr, -1, &stmt, NULL);
if (rc != SQLITE_OK) {
    sqlite3_close(db);
    sqlite3_shutdown();
    return -1;
}

idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_name");
rc = sqlite3_bind_text(stmt, idx, "my_tokenizer", -1, NULL);
if (rc != SQLITE_OK) {
    fprintf(stderr, "%s\n", sqlite3_errmsg(db));
    sqlite3_finalize(stmt);
    return -1;
}

sqlite3Fts3MyTokenizerModule((const sqlite3_tokenizer_module **)
    &stopword_tokenizer_module);
idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_ptr");
rc = sqlite3_bind_blob(stmt, idx, &stopword_tokenizer_module,
    sizeof(stopword_tokenizer_module), SQLITE_STATIC);
if (rc != SQLITE_OK) {
    fprintf(stderr, "%s\n", sqlite3_errmsg(db));
    sqlite3_finalize(stmt);
    return -1;
}
rc = sqlite3_step(stmt);
if (rc != SQLITE_ROW) {
    fprintf(stderr, "%s\n", sqlite3_errmsg(db));
    sqlite3_finalize(stmt);
    return -1;
}
sqlite3_finalize(stmt);

It works fine up to this point. After executing the above statements, I
try to create an FTS table using this custom tokenizer, which also
seems to get created. The problem comes up when I try to
insert data in the table. A simple statement like "insert into
my_table values(...)" fails with this error:

unknown tokenizer: my_tokenizer

I am sure I am missing something here, but don't know what ?

Thanks


[sqlite] [FTS3] Header to include for a custom tokenizer

2011-07-23 Thread Abhinav Upadhyay
Hi,

 I am using the Sqlite3 amalgamation. I am trying to register a custom
tokenizer with sqlite for my FTS application. The custom tokenizer is
in its own source file. I have included the sqlite3.h header with
the tokenizer source but sqlite3.h does not contain the declaration of
the various structs like sqlite3_tokenizer_module etc. So what is the
usual way to resolve this ? These declarations are also not there in
sqlite3ext.h . Although I see them there in sqlite3.c, I am not
sure I want to include it.
Maybe include fts3_tokenizer.h from the sqlite3 source ?

Thanks
Abhinav


Re: [sqlite] [FTS3] The Compress and Uncompress functions and extension

2011-07-22 Thread Abhinav Upadhyay
On Fri, Jul 22, 2011 at 1:32 PM, Abhinav Upadhyay
<er.abhinav.upadh...@gmail.com> wrote:
> On Fri, Jul 22, 2011 at 12:38 PM, Alexey Pechnikov
> <pechni...@mobigroup.ru> wrote:
>> But why don't you use the compress/uncompress functions from DRH? See
>> http://www.mail-archive.com/sqlite-users%40sqlite.org/msg17018.html
>>
>> I wrapped these into an extension and added SQLITE_COMPRESS_MIN_LENGTH:
>> http://sqlite.mobigroup.ru/artifact/a5da96353bb851b34114052ba85041fdffb725cd
>> http://sqlite.mobigroup.ru/artifact/56df1be3c402d7d49c3a13be704a2ff22c3003d2
>>
>> http://sqlite.mobigroup.ru/dir?name=ext/compress
>>
>
> Thanks for pointing out that mail archive discussion. I wasn't using
> compress/uncompress because uncompress requires you to store the size
> of the compressed buffer which is returned by the compress function
> while compressing. But that email discussion suggests a nifty trick to
> overcome this.
>
> I implemented the compress/uncompress functions as suggested by
> Richard in the email; however, it is still not working for me. The
> database is getting compressed fine, but there is a problem with
> decompression. I seem to be getting null values for the column values
> in the result set of my query.
>
> For example:
>
> $./my_program "print a document"
>
> (null)((null))
> (null)
>
> The above are the three column values that I am trying to select in my query.
> "print a document" is the query that I tried to execute. Although this
> is definitely an improvement from before :)
>

Sorry, it was my mistake. I had made a small error in calling
uncompress. Now it seems to be working fine.

Thanks a ton for the help. You saved my day :-)

--
Abhinav


Re: [sqlite] [FTS3] The Compress and Uncompress functions and extension

2011-07-22 Thread Abhinav Upadhyay
On Fri, Jul 22, 2011 at 12:38 PM, Alexey Pechnikov
 wrote:
> But why don't you use the compress/uncompress functions from DRH? See
> http://www.mail-archive.com/sqlite-users%40sqlite.org/msg17018.html
>
> I wrapped these into an extension and added SQLITE_COMPRESS_MIN_LENGTH:
> http://sqlite.mobigroup.ru/artifact/a5da96353bb851b34114052ba85041fdffb725cd
> http://sqlite.mobigroup.ru/artifact/56df1be3c402d7d49c3a13be704a2ff22c3003d2
>
> http://sqlite.mobigroup.ru/dir?name=ext/compress
>

Thanks for pointing out that mail archive discussion. I wasn't using
compress/uncompress because uncompress requires you to store the size
of the compressed buffer which is returned by the compress function
while compressing. But that email discussion suggests a nifty trick to
overcome this.
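
The thread does not spell the trick out. One common approach, assumed here
and possibly different from what the linked email does, is to store the
uncompressed length in a small header in front of the compressed bytes, so
that the uncompress side knows how large a buffer to allocate. The sketch
below shows only the framing; packBlob(), unpackBlob() and the 4-byte
big-endian header are hypothetical choices, and the zlib calls are left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_BYTES 4   /* assumed header size: 32-bit big-endian length */

/* Write the original (uncompressed) length followed by the compressed
** payload into aOut, which must hold HDR_BYTES+nComp bytes.
** Returns the total number of bytes written. */
size_t packBlob(unsigned char *aOut, unsigned long nOrig,
                const unsigned char *aComp, size_t nComp){
  assert( aOut!=NULL && aComp!=NULL );
  aOut[0] = (unsigned char)((nOrig>>24) & 0xff);
  aOut[1] = (unsigned char)((nOrig>>16) & 0xff);
  aOut[2] = (unsigned char)((nOrig>>8) & 0xff);
  aOut[3] = (unsigned char)(nOrig & 0xff);
  memcpy(aOut+HDR_BYTES, aComp, nComp);
  return HDR_BYTES + nComp;
}

/* Read the header back. Returns a pointer to the compressed payload
** and sets *pnOrig and *pnComp, or returns NULL if the blob is too
** short to contain a header. */
const unsigned char *unpackBlob(const unsigned char *aBlob, size_t nBlob,
                                unsigned long *pnOrig, size_t *pnComp){
  assert( aBlob!=NULL && pnOrig!=NULL && pnComp!=NULL );
  if( nBlob<HDR_BYTES ) return NULL;
  *pnOrig = ((unsigned long)aBlob[0]<<24) | ((unsigned long)aBlob[1]<<16)
          | ((unsigned long)aBlob[2]<<8)  |  (unsigned long)aBlob[3];
  *pnComp = nBlob - HDR_BYTES;
  return aBlob + HDR_BYTES;
}
```

Because the framed value is binary, it would have to be stored and
returned as a blob with an explicit length; a nul-terminated text result
would be cut off at the first zero byte.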

I implemented the compress/uncompress functions as suggested by
Richard in the email; however, it is still not working for me. The
database is getting compressed fine, but there is a problem with
decompression. I seem to be getting null values for the column values
in the result set of my query.

For example:

$./my_program "print a document"

(null)((null))  
(null)

The above are the three column values that I am trying to select in my query.
"print a document" is the query that I tried to execute. Although this
is definitely an improvement from before :)

Thanks
Abhinav


Re: [sqlite] [FTS3] The Compress and Uncompress functions

2011-07-20 Thread Abhinav Upadhyay
On Wed, Jul 20, 2011 at 7:51 PM, Abhinav Upadhyay
<er.abhinav.upadh...@gmail.com> wrote:
> Hi,
>
> I have an FTS table with compress and uncompress options enabled. I am
> using zlib(3) for doing the compression. The compression function
> seems to be doing ok as I can see the size of the database coming down
> drastically. But the uncompress function is not working properly.
>
> For example if I do a query like: "select snippet(my_fts_table) from
> my_fts_table where my_fts_table match 'my query'" , it should only
> output the snippets of the individual matching documents. But instead
> it is sending the complete document data to the output. I am not sure
> if it is the fault of my uncompress function or a bug in sqlite so
> showing my compress/uncompress code here. Hope to get some insight.
>
> Thanks
>
> Following are the functions that I have written for compress and uncompress:
>
> /* the compress function */
> void
> zip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
> {
>        const Bytef *source = sqlite3_value_text(apval[0]);
>        uLong sourcelen = strlen((const char *)source);
>        uLong destlen = (sourcelen + 12) + (int)(sourcelen + 12) * .01/100;
>        Bytef *dest = (Bytef *) malloc(sizeof(Bytef) * destlen);
>        int ret_val = compress(dest, &destlen, source, sourcelen);
>        if (ret_val != Z_OK) {
>                sqlite3_result_error(pctx, "Error in compression", -1);
>        }
>        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
>        return;
> }
>
> /* the uncompress function */
>
>
> void
> unzip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
> {
>        int ret;
>        unsigned int have;
>        z_stream strm;
>        size_t bufsize;
>        unsigned char *in;
>        unsigned char out[CHUNK];
>        unsigned char *dest = NULL;
>        strm.zalloc = Z_NULL;
>        strm.zfree = Z_NULL;
>        strm.opaque = Z_NULL;
>        strm.next_in = NULL;
>        strm.avail_in = 0;
>
>        ret = inflateInit(&strm);
>        if (ret != Z_OK) {
>                fprintf(stderr, "Could not initiate decompression of database\n");
>                sqlite3_result_error(pctx, "Error in decompression", -1);
>                return;
>        }
>
>        in = (unsigned char *) sqlite3_value_text(apval[0]);
>        bufsize = strlen((const char *) in);
>        strm.next_in = in;
>        strm.avail_in = bufsize;
>
>        /* run inflate() on input until output buffer is not full */
>        do {
>                strm.avail_out = CHUNK;
>                strm.next_out = out;
>                ret = inflate(&strm, Z_NO_FLUSH);
>
>                have = CHUNK - strm.avail_out;
>                if (concat((char **) &dest, (const char *) strm.next_out) < 0) {
>                        fprintf(stderr, "Error in concatenating decompressed stream\n");
>                        sqlite3_result_error(pctx, "Error in decompression", -1);
>                        return;
>                }
>        } while (strm.avail_out == 0);
>
>        inflateEnd(&strm);
>        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
>        return;
> }
>
> // concat is a utility function that concatenates its 2nd parameter to
> // the end of its 1st parameter. It returns a negative value on error.
>

Just to add to what I said before. On executing a query from within my
application it is sending the complete indexed data of the matched
documents.

So actually, it first sends this output to stderr and then sends the
same output to stdout. I noticed this just a while ago. I am pretty
sure I am doing something wrong in the decompression function, but I am
not sure what. :/
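
One thing that may be worth checking in the code quoted above:
compress() produces binary data that can contain zero bytes, while
strlen() and a length of -1 passed to sqlite3_result_text() both stop at
the first zero byte. The sketch below only illustrates that truncation.
visible_to_strlen is a hypothetical helper, and the sample bytes in the
usage note are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report how many bytes a strlen()-style scan
 * would see in a buffer whose real size is nbytes. The scan stops at
 * the first zero byte, or at nbytes if there is none.
 */
size_t visible_to_strlen(const unsigned char *buf, size_t nbytes)
{
        const unsigned char *nul;

        assert(buf != NULL);
        nul = memchr(buf, 0, nbytes);
        return nul != NULL ? (size_t)(nul - buf) : nbytes;
}
```

For a 5-byte buffer { 0x78, 0x9c, 0x00, 0x4b, 0x4c } this reports 2, so
three bytes would be dropped silently. SQLite provides
sqlite3_value_bytes() and sqlite3_result_blob() for passing explicit,
binary-safe lengths instead.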

Thanks
Abhinav


[sqlite] [FTS3] The Compress and Uncompress functions

2011-07-20 Thread Abhinav Upadhyay
Hi,

I have an FTS table with compress and uncompress options enabled. I am
using zlib(3) for doing the compression. The compression function
seems to be doing ok as I can see the size of the database coming down
drastically. But the uncompress function is not working properly.

For example if I do a query like: "select snippet(my_fts_table) from
my_fts_table where my_fts_table match 'my query'" , it should only
output the snippets of the individual matching documents. But instead
it is sending the complete document data to the output. I am not sure
if it is the fault of my uncompress function or a bug in sqlite so
showing my compress/uncompress code here. Hope to get some insight.

Thanks

Following are the functions that I have written for compress and uncompress:

/* the compress function */
void
zip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
{
        const Bytef *source = sqlite3_value_text(apval[0]);
        uLong sourcelen = strlen((const char *)source);
        uLong destlen = (sourcelen + 12) + (int)(sourcelen + 12) * .01/100;
        Bytef *dest = (Bytef *) malloc(sizeof(Bytef) * destlen);
        int ret_val = compress(dest, &destlen, source, sourcelen);
        if (ret_val != Z_OK) {
                sqlite3_result_error(pctx, "Error in compression", -1);
        }
        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
        return;
}

/* the uncompress function */


void
unzip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
{
        int ret;
        unsigned int have;
        z_stream strm;
        size_t bufsize;
        unsigned char *in;
        unsigned char out[CHUNK];
        unsigned char *dest = NULL;
        strm.zalloc = Z_NULL;
        strm.zfree = Z_NULL;
        strm.opaque = Z_NULL;
        strm.next_in = NULL;
        strm.avail_in = 0;

        ret = inflateInit(&strm);
        if (ret != Z_OK) {
                fprintf(stderr, "Could not initiate decompression of database\n");
                sqlite3_result_error(pctx, "Error in decompression", -1);
                return;
        }

        in = (unsigned char *) sqlite3_value_text(apval[0]);
        bufsize = strlen((const char *) in);
        strm.next_in = in;
        strm.avail_in = bufsize;

        /* run inflate() on input until output buffer is not full */
        do {
                strm.avail_out = CHUNK;
                strm.next_out = out;
                ret = inflate(&strm, Z_NO_FLUSH);

                have = CHUNK - strm.avail_out;
                if (concat((char **) &dest, (const char *) strm.next_out) < 0) {
                        fprintf(stderr, "Error in concatenating decompressed stream\n");
                        sqlite3_result_error(pctx, "Error in decompression", -1);
                        return;
                }
        } while (strm.avail_out == 0);

        inflateEnd(&strm);
        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
        return;
}

// concat is a utility function that concatenates its 2nd parameter to
// the end of its 1st parameter. It returns a negative value on error.
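
As described, concat() takes no length argument, so it has to rely on a
terminating zero byte. The bytes that inflate() writes into out are not
zero-terminated, and after the call strm.next_out points just past the
data that was written rather than at its start. A length-aware append
may therefore be a better fit for the loop. A minimal sketch, assuming
the decompressed chunk starts at out and is have bytes long;
append_bytes is a hypothetical name, not an existing zlib or SQLite
function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical length-aware variant of concat(): append n bytes from
 * src to the heap buffer *dest, whose current size is *dest_len.
 * Returns 0 on success and -1 on error; on error *dest and *dest_len
 * are left unchanged.
 */
int append_bytes(unsigned char **dest, size_t *dest_len,
    const unsigned char *src, size_t n)
{
        unsigned char *grown;

        assert(dest != NULL && dest_len != NULL);
        if (n == 0)
                return 0;
        if (src == NULL || n > SIZE_MAX - *dest_len)
                return -1;              /* bad input or size overflow */
        grown = realloc(*dest, *dest_len + n);
        if (grown == NULL)
                return -1;              /* original buffer is still valid */
        memcpy(grown + *dest_len, src, n);
        *dest = grown;
        *dest_len += n;
        return 0;
}
```

Inside the inflate loop the call would look something like
append_bytes(&dest, &dest_len, out, have), with dest_len a size_t that
starts at 0.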

--
Abhinav


[sqlite] Data type of the blob returned by matchinfo()

2011-07-12 Thread Abhinav Upadhyay
Hi,

Quoting the ranking function given in the appendix of the FTS3
documentation page (http://www.sqlite.org/fts3.html#appendix_a)

static void rankfunc(sqlite3_context *pCtx, int nVal, sqlite3_value **apVal){
  int *aMatchinfo;                /* Return value of matchinfo() */
  ...
  ...
  aMatchinfo = (unsigned int *)sqlite3_value_blob(apVal[0]);
  ...
  ...

aMatchinfo is declared as int *, but the value obtained from
sqlite3_value_blob() is being cast to unsigned int *. This causes a
compiler warning, so I am wondering what the data type of the matchinfo
blob is (int * or unsigned int *). Common sense says it should be
unsigned int *, but I just wanted to confirm.
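
For reference, the FTS3 documentation describes the matchinfo() blob as
a sequence of 32-bit unsigned integers in machine byte order. Below is a
minimal sketch of reading one element without depending on the declared
pointer type. matchinfo_u32 is a hypothetical helper, and nbytes is
assumed to come from sqlite3_value_bytes():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the idx-th 32-bit unsigned integer out of a
 * matchinfo-style blob of nbytes bytes. Returns 0 on success and -1 if
 * the blob is missing or idx is out of range. memcpy() avoids any
 * assumption about the alignment of the blob pointer.
 */
int matchinfo_u32(const void *blob, size_t nbytes, size_t idx, uint32_t *out)
{
        assert(out != NULL);
        if (blob == NULL || idx >= nbytes / sizeof(uint32_t))
                return -1;
        memcpy(out, (const unsigned char *)blob + idx * sizeof(uint32_t),
            sizeof(uint32_t));
        return 0;
}
```

Declaring the array as unsigned int * would also silence the warning on
platforms where unsigned int is 32 bits wide.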

Thanks
Abhinav