Re: [sqlite] Extending Porter Tokenizer
On Fri, Jul 8, 2016 at 3:01 AM, Matthias-Christian Ott <o...@mirix.org> wrote:
> On 2016-07-05 18:11, Abhinav Upadhyay wrote:
>> I'm wondering if it is possible to extend the functionality of the
>> porter tokenizer. I would like to use the functionality of the Porter
>> tokenizer but before stemming the token, I want to decide whether the
>> token should be stemmed or not.
>>
>> Do I need to copy the Porter tokenizer and modify it to suit my needs
>> or there is a better way, to minimize code duplication?
>
> The first argument of the Porter tokenizer is its parent tokenizer. The
> Porter tokenizer calls the parent tokenizer's xTokenize function with an
> xToken function that wraps the xToken function that was passed to the
> xTokenize function of the Porter tokenizer and stems the tokens passed
> to it. So create a custom tokenizer that extracts the original xToken
> function from the xToken member of its pCtx parameter:
>
>   typedef struct PorterContext PorterContext;
>   struct PorterContext {
>     void *pCtx;
>     int (*xToken)(void *pCtx, int tflags, const char *pToken, int nToken,
>                   int iStart, int iEnd);
>     char *aBuf;
>   };
>
>   typedef struct CustomTokenizer CustomTokenizer;
>   struct CustomTokenizer {
>     fts5_tokenizer tokenizer;
>     Fts5Tokenizer *pTokenizer;
>   };
>
>   typedef struct CustomContext CustomContext;
>   struct CustomContext {
>     void *pCtx;
>     int (*xToken)(void *pCtx, int tflags, const char *pToken, int nToken,
>                   int iStart, int iEnd);
>   };
>
>   int customToken(
>     void *pCtx,
>     int tflags,
>     const char *pToken,
>     int nToken,
>     int iStart,
>     int iEnd
>   ){
>     CustomContext *c = (CustomContext*)pCtx;
>     PorterContext *p;
>
>     if( stem ){  /* stem: your own per-token decision */
>       /* Hand the token to Porter's wrapper, which stems it. */
>       return c->xToken(c->pCtx, tflags, pToken, nToken, iStart, iEnd);
>     }else{
>       /* Bypass the stemmer and call the original xToken directly. */
>       p = (PorterContext*)c->pCtx;
>       return p->xToken(p->pCtx, tflags, pToken, nToken, iStart, iEnd);
>     }
>   }
>
>   int customTokenize(
>     Fts5Tokenizer *pTokenizer,
>     void *pCtx,
>     int flags,
>     const char *pText,
>     int nText,
>     int (*xToken)(void *, int, const char *, int nToken, int iStart,
>                   int iEnd)
>   ){
>     CustomTokenizer *t = (CustomTokenizer*)pTokenizer;
>     CustomContext sCtx;
>     sCtx.pCtx = pCtx;
>     sCtx.xToken = xToken;
>     return t->tokenizer.xTokenize(t->pTokenizer, (void*)&sCtx, flags,
>                                   pText, nText, customToken);
>   }
>
> Note that you are accessing an internal struct and relying on
> implementation details and therefore have to check whether the struct or
> any other relevant implementation details changed with every release.

Thanks for the detailed response. I think this would work, but we are
currently using FTS4. The ability to call a parent tokenizer is really
what I needed, but I don't think this is possible with FTS4?

- Abhinav
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Extending Porter Tokenizer
Hi,

I'm wondering if it is possible to extend the functionality of the
Porter tokenizer. I would like to use the Porter tokenizer, but before
a token is stemmed I want to decide whether it should be stemmed or not.

Do I need to copy the Porter tokenizer and modify it to suit my needs,
or is there a better way that minimizes code duplication?

- Abhinav
[sqlite] Preventing certain query keywords from getting stemmed
Hi,

While running queries, there are sometimes technical keywords which
shouldn't be stemmed by the tokenizer. For example, if I query for "lfs"
(which is a file system), the Porter stemmer converts it to "lf", which
matches many other unrelated keywords in the corpus (such as ASCII LF or
some other acronyms).

I'm wondering if there is an option to tell the tokenizer not to stem
certain keywords and to take them as they are?

- Abhinav
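A minimal sketch of the kind of per-token check the question above asks for: a small keep-list consulted before a token is handed to the stemmer. The list contents and the name should_stem() are hypothetical and are not part of SQLite; a custom tokenizer would call something like this to make its decision. Because the tokenizer runs both when indexing and when parsing a query, the same list would have to be in effect in both cases for "lfs" to match itself.

```c
#include <string.h>

/* Hypothetical list of technical terms that should be indexed verbatim. */
static const char *const keep_as_is[] = { "lfs", "ffs", "nfs" };

/*
 * Returns 1 if the token (not necessarily NUL-terminated, len bytes long)
 * should be passed to the stemmer, 0 if it should be kept unchanged.
 */
static int
should_stem(const char *token, size_t len)
{
    size_t n = sizeof(keep_as_is) / sizeof(keep_as_is[0]);

    for (size_t i = 0; i < n; i++) {
        if (strlen(keep_as_is[i]) == len &&
            memcmp(keep_as_is[i], token, len) == 0)
            return 0;
    }
    return 1;
}
```

A linear scan is fine for a handful of terms; a sorted array with bsearch(3) or a hash table would suit a longer list.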
Re: [sqlite] Limit on the Compound Select Statements
On Thu, Feb 23, 2012 at 6:50 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 23 Feb 2012, at 1:16pm, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> wrote:
>
>> I do not remember the
>> exact error message but it is close to this. As per the documentation on
>> the compound select statements
>> (http://www.sqlite.org/lang_select.html) on the SQLite website, there is
>> no mention of an explicit limit. I would like to know the exact limit
>> on this, so that I can get my code to work within this limit
>
> <http://www.sqlite.org/limits.html>
>
> especially item 3, but also others.
>
> However, I question the advantage of doing one long INSERT rather than doing
> many inside a transaction. Are you binding parameters ?
>

It was already inside a bigger transaction. I was trying out something
naive, and it turns out it is not worth it. Thanks for the pointer :)

--
Abhinav
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Limit on the Compound Select Statements
Hi,

I have a single-column table in which I wish to store several thousand
rows. I was wondering if I could insert them using a single INSERT query
and came across this Stack Overflow answer:
http://stackoverflow.com/a/1734067/348637 . According to that answer it
is possible to insert multiple rows using a single query with an INSERT
statement of the following form:

    INSERT INTO table_name
    SELECT 'val1' as col_name
    UNION SELECT 'val2'
    UNION SELECT 'val3'
    ...

This seems to work, but in my case I sometimes get an error saying "Too
many terms in the compound select statement". I do not remember the
exact error message but it is close to this. As per the documentation on
the compound select statements (http://www.sqlite.org/lang_select.html)
on the SQLite website, there is no mention of an explicit limit. I would
like to know the exact limit on this, so that I can get my code to work
within this limit :)

Thanks
Abhinav
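If one did want to stay with the compound-SELECT form, the work would have to be split into batches that fit under the limit. The sketch below only does the arithmetic and builds the statement text with "?" placeholders; the helper names are hypothetical, and the default of 500 is the value the limits page gives for SQLITE_MAX_COMPOUND_SELECT, which a particular build may have changed. It uses UNION ALL on the assumption that duplicates should be kept, whereas plain UNION would remove them.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed cap on terms per statement (documented default; may differ). */
#define MAX_TERMS 500

/* Number of INSERT statements needed to cover nvalues rows. */
static size_t
batches_needed(size_t nvalues, size_t max_terms)
{
    return (nvalues + max_terms - 1) / max_terms;
}

/*
 * Write "INSERT INTO <table> SELECT ? UNION ALL SELECT ? ..." with nterms
 * placeholders (nterms >= 1) into buf. Returns the length written, or -1
 * if buf is too small. The table name is trusted, not escaped.
 */
static int
build_batch_sql(char *buf, size_t bufsize, const char *table, size_t nterms)
{
    int len = snprintf(buf, bufsize, "INSERT INTO %s SELECT ?", table);

    if (len < 0 || (size_t)len >= bufsize)
        return -1;
    for (size_t i = 1; i < nterms; i++) {
        size_t room = bufsize - (size_t)len;
        int n = snprintf(buf + len, room, " UNION ALL SELECT ?");

        if (n < 0 || (size_t)n >= room)
            return -1;
        len += n;
    }
    return len;
}
```

Each placeholder would then be bound with sqlite3_bind_text() before stepping the statement. As the reply in this thread suggests, a plain single-row INSERT prepared once and executed repeatedly inside one transaction is likely the simpler route.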
Re: [sqlite] File checking mechanism.
On Wed, Feb 1, 2012 at 1:37 PM, bhaskarReddy wrote:
>
> Hi Friends,
>
> Is there any file checking mechanism in sqlite3?
>
> Suppose I have a file ABCD.db. Before I create the database file, I
> want to check whether a file with the same name already exists or not.
> If it exists, return an error.
>
> Is there any sqlite function that does this kind of file checking?
>
>
> Regards,
> Bhaskar Reddy

Just do a stat(2) on the file. (see man 2 stat)

--
Abhinav
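A minimal sketch of the stat(2) suggestion, assuming a POSIX system. The helper name file_exists() is made up for illustration. Note that stat(2) can also fail for reasons other than a missing file (for example a permission problem on a parent directory), and that the file could be created by another process between the check and the open.

```c
#include <sys/stat.h>

/*
 * Returns 1 if stat(2) succeeds on path, 0 otherwise. A return of 0
 * usually means the file does not exist, but errno should be checked
 * if the distinction from other failures matters.
 */
static int
file_exists(const char *path)
{
    struct stat sb;

    return stat(path, &sb) == 0;
}
```

A caller would test file_exists("ABCD.db") and report an error instead of opening the database when it returns 1.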
Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer
> Two FTS tables? One with the Porter stemmer, for search, one without, to
> build the auxiliary tables?

Yeah, that is the last option, if nothing else works. For a small set of
documents the extra processing time might be OK, but for a larger set of
documents building the FTS tables twice might be a bit taxing. I think I
will first try a custom tokenizer.

So is it OK to execute SQL statements from within the tokenizer? It
would have been great if there was some way to determine whether the
tokenizer code is being executed for indexing the documents or for
searching the index.

Thanks
Abhinav
[sqlite] [FTS] Executing Sql statements inside a custom tokenizer
Hi,

I would like to build up a table of all the unique words occurring in my
corpus (for a spelling suggestion feature). Presently I am using the
Porter stemming tokenizer and I would not like to stop using the stemmer
at any cost. If I were not using the Porter stemmer, I could easily
obtain the list of unique words in the corpus using the fts4aux module.
But using the stemmer means that all the words are stored in the index
in their stem form, which is not desirable for building a dictionary of
proper English words.

One solution is to use a custom tokenizer. I was thinking of using the
default Porter tokenizer supplied with SQLite and adding some bits of
code to store the token in a separate table before stemming it down. But
I am not sure if it is OK to access or modify the database using SQL
statements inside a tokenizer.

Now that I think of it, the tokenizer code is also executed when an SQL
query is performed against the FTS table (when performing a search), at
which time I don't want my dictionary building code to execute. So
perhaps this is not a good idea. What other options do I have?

Thanks
Abhinav
Re: [sqlite] FTS3/FTS4 - Finding the term(s) that completes the input
On Mon, Nov 21, 2011 at 12:17 AM, Mohit Sindhwani wrote:
> Hi, I'm finding my way through FTS3/FTS4 to replace some of the old code
> that we have for searching terms within titles. I now know that FTS3/4
> should be the way to proceed.
>
> So far, I have this:
> - an FTS4 table that has two columns: title (main column), ext (certain
> conditions to match)
> - an FTS4aux table
>
> What I'd like to be able to do is something like this:
> - let's say that the FTS4 table has values such as:
> * mohit sindhwani, onghu
> * john doe, gmail
> * james ling, alibaba
> * john barn, yahoo
> ...and so on
>
> If the user enters "j", I'd like to suggest that this would complete to the
> words:
> john and james in the current set
>
> If the user enters 'ling j', I'd like to restrict it and say:
>> james is the only word that can be matched now
>> james ling, alibaba is the result
>
> Similarly, if the user enters 'yahoo j', I should be able to zoom in to
> 'john barn, yahoo'.
>
> I think this should be within the reach of FTS3/FTS4, but I'm having trouble
> framing the queries, etc. Are you able to nudge me in the correct
> direction?
>
> Thanks,
> Mohit.

I think you might want to look at Token Prefix queries:
http://sqlite.org/fts3.html#section_3

--
Abhinav
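As a rough illustration of the prefix-query suggestion: in FTS3/FTS4 a term followed by "*" matches any token that begins with that term, so the user input 'ling j' can be turned into the MATCH expression 'ling j*'. The helper below is a hypothetical sketch, not an SQLite API. It only appends the asterisk to the last term and does not escape quotes or other characters that have special meaning in an FTS query.

```c
#include <string.h>

/*
 * Copy input into out and append '*' so that the last term becomes a
 * prefix term. Trailing spaces are dropped first so the '*' attaches to
 * the term. Returns 0 on success, -1 if input is blank or out is too
 * small.
 */
static int
make_prefix_query(const char *input, char *out, size_t outsize)
{
    size_t len = strlen(input);

    while (len > 0 && input[len - 1] == ' ')
        len--;
    if (len == 0 || len + 2 > outsize)
        return -1;
    memcpy(out, input, len);
    out[len] = '*';
    out[len + 1] = '\0';
    return 0;
}
```

The resulting string would be bound as the parameter of a query such as SELECT title FROM titles WHERE titles MATCH ?, where "titles" stands in for the FTS4 table described in the question.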
Re: [sqlite] Attaching an in-memory database
On Sat, Aug 27, 2011 at 9:52 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 27 Aug 2011, at 4:50pm, Abhinav Upadhyay wrote:
>
>> sqlite3_exec(db, "ATTACH DATABASE :memory AS metadb", NULL, NULL, &errmsg);
>
> Need a colon after 'memory':
>
> ATTACH DATABASE ':memory:' AS metadb;
>
> I don't think you need the quotes I put in, I'm not sure.
>
> When in doubt, use the SQLite shell tool to execute the commands you're
> trying to do in your app. This will tell you whether the fault is in your
> SQL coding or your C coding. It has saved me many hours of looking at the
> wrong problem.

Ah, that was silly of me :( . I could not see the 2nd colon in the
documentation page until now :P. Looks like it is working.

Thanks

--
Abhinav
[sqlite] Attaching an in-memory database
Hi,

I am trying to attach an in-memory database and then create a table in
it. Here is the code that I am using:

    /* I have already opened a connection to a database before the following */
    sqlite3_exec(db, "ATTACH DATABASE :memory AS metadb", NULL, NULL, &errmsg);
    if (errmsg != NULL) {
            warnx("%s", errmsg);
            free(errmsg);
            exit(EXIT_FAILURE);
    }

    /* I need to keep all the operations in a single transaction for
     * better performance */
    sqlite3_exec(db, "BEGIN", NULL, NULL, &errmsg);
    if (errmsg != NULL) {
            warnx("%s", errmsg);
            free(errmsg);
            exit(EXIT_FAILURE);
    }

    ...

    /* somewhere down the line I try to create a table in the in-memory
     * attached database */
    sqlstr = "CREATE TABLE IF NOT EXISTS metadb.file_cache("
             " md5_hash, file PRIMARY KEY)";
    sqlite3_exec(db, sqlstr, NULL, NULL, &errmsg);
    if (errmsg != NULL) {
            warnx("%s", errmsg);
            free(errmsg);
            exit(EXIT_FAILURE);
    }
    ...

But on running the program I am getting an error exactly at the point
where I am trying to execute the create table statement above:

    # ./makemandb
    Building temporary file cache
    makemandb: near ".": syntax error

What might be wrong here? From the documentation it seems that I should
be explicitly specifying the database name if I have attached one or
more databases.

Thanks
Abhinav
Re: [sqlite] Use of VACUUM
On Fri, Aug 12, 2011 at 12:28 AM, Richard Hipp <d...@sqlite.org> wrote:
> On Thu, Aug 11, 2011 at 2:48 PM, Abhinav Upadhyay <
> er.abhinav.upadh...@gmail.com> wrote:
>
>> On Fri, Aug 12, 2011 at 12:05 AM, Michael Stephenson
>> <domehead...@gmail.com> wrote:
>> > If you use INTEGER PRIMARY KEY, that column becomes your rowids; this
>> > does not create a new, separate column in addition to the rowid column.
>> Indeed, but the INTEGER PRIMARY KEY column would count as a user
>> defined column and thus affect the FTS search :) The FTS table has all
>> text data, so I really do need to create a separate column for the
>> IDs.
>>
>
> Every FTS table has a "docid" column that is not searched, that is a unique
> integer key (like rowid), and which is not modified by VACUUM.

Ah, that saves the day for me :-) . Using VACUUM brings down the size of
my database by 1/3rd, so I really wanted to use it.

Thanks :)
Abhinav
Re: [sqlite] Use of VACUUM
On Fri, Aug 12, 2011 at 12:05 AM, Michael Stephenson wrote:
> If you use INTEGER PRIMARY KEY, that column becomes your rowids; this does
> not create a new, separate column in addition to the rowid column.

Indeed, but the INTEGER PRIMARY KEY column would count as a user defined
column and thus affect the FTS search :) The FTS table has all text
data, so I really do need to create a separate column for the IDs.
Re: [sqlite] Use of VACUUM
On Thu, Aug 11, 2011 at 11:14 PM, Igor Tandetnik <itandet...@mvps.org> wrote:
> On 8/11/2011 1:35 PM, Abhinav Upadhyay wrote:
>> The documentation page of the VACUUM command says that "The VACUUM
>> command may change the ROWIDs of entries in any tables that do not
>> have an explicit INTEGER PRIMARY KEY." So what are the possible cases
>> in which the ROWIDs might change ?
>
> ROWIDs might possibly change if the table doesn't have an explicit
> INTEGER PRIMARY KEY column, and you run the VACUUM command on the database
> containing this table. Which part of the statement you quoted do you
> find unclear?

I wanted to know why the ROWID would change, but Simon's answer makes
sense.
Re: [sqlite] Use of VACUUM
On Thu, Aug 11, 2011 at 11:15 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> On 11 Aug 2011, at 6:35pm, Abhinav Upadhyay wrote:
>
>> The documentation page of the VACUUM command says that "The VACUUM
>> command may change the ROWIDs of entries in any tables that do not
>> have an explicit INTEGER PRIMARY KEY." So what are the possible cases
>> in which the ROWIDs might change ?
>
> Not documented. So even if someone told you what they were in this version
> of SQLite, there might be other reasons in the next version. Theoretically
> it might renumber rows to close up gaps in the AUTOINCREMENT.

That makes sense.

> As it says, to stop it all you need to do is declare one of the columns as
> INTEGER PRIMARY KEY. Once you do that it assumes that you might be referring
> to those values elsewhere and won't change them.

Indeed, I am using the ROWID as a reference in another table. Actually I
have an FTS table and I don't really want to create an explicit column
for storing the IDs, as I am afraid that matches from the ID column
could affect the quality of search results. But if this is the only
option, then I guess I need to give it a try. I might give this column a
weight of 0.0 so that it doesn't create noise in the search results.

Thanks
Abhinav
[sqlite] Use of VACUUM
Hi,

The documentation page of the VACUUM command says that "The VACUUM
command may change the ROWIDs of entries in any tables that do not have
an explicit INTEGER PRIMARY KEY." So what are the possible cases in
which the ROWIDs might change?

Thanks
Abhinav
Re: [sqlite] [FTS3] Understanding the Flow of data through the tokenizer
On Mon, Jul 25, 2011 at 9:54 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
> On 07/24/2011 08:16 PM, Abhinav Upadhyay wrote:
>> Hi,
>>
>> I am trying to write my own custom tokenizer to filter stopwords apart
>> from doing normalization and stemming. I have gone through the
>> comments in fts3_tokenizer.h and also read the implementation of the
>> simple tokenizer. While overall I am able to understand what I need to
>> do to implement this tokenizer, I still cannot visualize how the
>> FTS engine calls the tokenizer and what data in what form it passes to
>> it.
>>
>> Does the FTS engine pass the complete document data to the tokenizer
>> or does it pass some chunks of data, or individual words ? I need to
>> understand this part because the next function needs to set the
>> offsets accordingly. By just going through the code of the simple
>> tokenizer I could not completely comprehend it (it would have been
>> better if I could debug it).
>>
>> By the next function I mean this:
>>
>>   int (*xNext)(
>>     sqlite3_tokenizer_cursor *pCursor,  /* Tokenizer cursor */
>>     const char **ppToken, int *pnBytes, /* OUT: Normalized text for token */
>>     int *piStartOffset, /* OUT: Byte offset of token in input buffer */
>>     int *piEndOffset,   /* OUT: Byte offset of end of token in input buffer */
>>     int *piPosition     /* OUT: Number of tokens returned before this one */
>>   );
>>
>> It would be better if you could explain what is the role of these
>> parameters: piEndOffset , piStartOffset ?
>
> Each time xNext() returns SQLITE_OK to return a new token, xNext()
> should set:
>
> *piStartOffset to the number of bytes in the input buffer before
> start of the token being returned,
>
> *piEndOffset to *piStartOffset plus the number of bytes in the
> token text, and
>
> *piPosition to the number of tokens that occur in the input buffer
> before the token being returned.

Thanks for the explanation. I was able to correct my implementation :-) .
[sqlite] [FTS3] Understanding the Flow of data through the tokenizer
Hi,

I am trying to write my own custom tokenizer to filter stopwords apart
from doing normalization and stemming. I have gone through the comments
in fts3_tokenizer.h and also read the implementation of the simple
tokenizer. While overall I am able to understand what I need to do to
implement this tokenizer, I still cannot visualize how the FTS engine
calls the tokenizer and what data in what form it passes to it.

Does the FTS engine pass the complete document data to the tokenizer, or
does it pass some chunks of data, or individual words? I need to
understand this part because the next function needs to set the offsets
accordingly. By just going through the code of the simple tokenizer I
could not completely comprehend it (it would have been better if I could
debug it).

By the next function I mean this:

    int (*xNext)(
      sqlite3_tokenizer_cursor *pCursor,  /* Tokenizer cursor */
      const char **ppToken, int *pnBytes, /* OUT: Normalized text for token */
      int *piStartOffset, /* OUT: Byte offset of token in input buffer */
      int *piEndOffset,   /* OUT: Byte offset of end of token in input buffer */
      int *piPosition     /* OUT: Number of tokens returned before this one */
    );

It would be better if you could explain what is the role of these
parameters: piEndOffset , piStartOffset ?

Thanks
Abhinav
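Dan Kennedy's reply in this thread describes the three OUT values in words. The sketch below shows the same bookkeeping on a plain C string, with no SQLite types involved. The struct word_cursor and the function next_word() are made up for illustration and split on anything that is not an ASCII letter or digit, which is only an assumption about what counts as a token.

```c
#include <ctype.h>

/* Hypothetical cursor over one input buffer; not an SQLite type. */
typedef struct {
    const char *input;  /* whole text handed to the tokenizer */
    int nbytes;         /* length of input in bytes */
    int offset;         /* current byte offset into input */
    int ntokens;        /* number of tokens returned so far */
} word_cursor;

/*
 * Find the next run of alphanumeric bytes. Returns 1 and fills the OUT
 * parameters if a token was found, 0 at end of input.
 */
static int
next_word(word_cursor *c, const char **token, int *nbytes,
    int *start_offset, int *end_offset, int *position)
{
    int start;

    /* Skip delimiter bytes. */
    while (c->offset < c->nbytes &&
        !isalnum((unsigned char)c->input[c->offset]))
        c->offset++;
    if (c->offset >= c->nbytes)
        return 0;

    start = c->offset;
    while (c->offset < c->nbytes &&
        isalnum((unsigned char)c->input[c->offset]))
        c->offset++;

    *token = c->input + start;
    *nbytes = c->offset - start;
    *start_offset = start;      /* bytes in the buffer before the token */
    *end_offset = c->offset;    /* start offset plus token length */
    *position = c->ntokens++;   /* tokens seen before this one */
    return 1;
}
```

For the input "foo  bar" this yields offsets 0..3 at position 0 for "foo" and 5..8 at position 1 for "bar". A real xNext() would additionally normalize or stem the token text it returns, while the offsets keep referring to the original input buffer.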
Re: [sqlite] [FTS3] Header to include for a custom tokenizer
On Sun, Jul 24, 2011 at 1:40 AM, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> wrote:
> On Sat, Jul 23, 2011 at 11:00 PM, Richard Hipp <d...@sqlite.org> wrote:
>> On Sat, Jul 23, 2011 at 1:01 PM, Abhinav Upadhyay <
>> er.abhinav.upadh...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am using the Sqlite3 amalgamation. I am trying to register a custom
>>> tokenizer with sqlite for my FTS application. The custom tokenizer is
>>> in its own source file. I have included the sqlite3.h header with
>>> the tokenizer source but sqlite3.h does not contain the declaration of
>>> the various structs like sqlite3_tokenizer_module etc. So what is the
>>> usual way to resolve this ? These declarations are also not there in
>>> sqlite3ext.h . Although I see them there in sqlite3.c, I am not
>>> sure I want to include it ? What is the usual way to resolve this ?
>>> Maybe import fts3_tokenizer.h from the sqlite3 source ?
>>>
>>
>> Yes. Use fts3_tokenizer.h.
>
> Thanks, that worked. Now I am able to compile everything.
>
> Next, I am having a problem with using this tokenizer. I followed the
> example from the FTS3 documentation page on the website and
> registered the tokenizer using code like this:
>
>   const sqlite3_tokenizer_module *my_tokenizer_module;
>
>   sqlstr = "select fts3_tokenizer(:tokenizer_name, :tokenizer_ptr)";
>   rc = sqlite3_prepare_v2(db, sqlstr, -1, &stmt, NULL);
>   if (rc != SQLITE_OK) {
>           sqlite3_close(db);
>           sqlite3_shutdown();
>           return -1;
>   }
>
>   idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_name");
>   rc = sqlite3_bind_text(stmt, idx, "my_tokenizer", -1, NULL);
>   if (rc != SQLITE_OK) {
>           fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>           sqlite3_finalize(stmt);
>           return -1;
>   }
>
>   sqlite3Fts3MyTokenizerModule((const sqlite3_tokenizer_module **)
>       &my_tokenizer_module);
>   idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_ptr");
>   rc = sqlite3_bind_blob(stmt, idx, &my_tokenizer_module,
>       sizeof(my_tokenizer_module), SQLITE_STATIC);
>   if (rc != SQLITE_OK) {
>           fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>           sqlite3_finalize(stmt);
>           return -1;
>   }
>   rc = sqlite3_step(stmt);
>   if (rc != SQLITE_ROW) {
>           fprintf(stderr, "%s\n", sqlite3_errmsg(db));
>           sqlite3_finalize(stmt);
>           return -1;
>   }
>   sqlite3_finalize(stmt);
>
> It is working fine up to this point. After executing the above
> statements, I try to create an FTS table using this custom tokenizer,
> which also seems to get created. The problem comes up when I try to
> insert data into the table. A simple statement like "insert into
> my_table values(...)" gives this error:
>
>   unknown tokenizer: my_tokenizer
>
> I am sure I am missing something here, but don't know what ?
>
> Thanks
>

Never mind. It seems I need to register the tokenizer every time I open
a connection to query the database or store values.

Thanks
Re: [sqlite] [FTS3] Header to include for a custom tokenizer
On Sat, Jul 23, 2011 at 11:00 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Sat, Jul 23, 2011 at 1:01 PM, Abhinav Upadhyay <
> er.abhinav.upadh...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the Sqlite3 amalgamation. I am trying to register a custom
>> tokenizer with sqlite for my FTS application. The custom tokenizer is
>> in its own source file. I have included the sqlite3.h header with
>> the tokenizer source but sqlite3.h does not contain the declaration of
>> the various structs like sqlite3_tokenizer_module etc. So what is the
>> usual way to resolve this ? These declarations are also not there in
>> sqlite3ext.h . Although I see them there in sqlite3.c, I am not
>> sure I want to include it ? What is the usual way to resolve this ?
>> Maybe import fts3_tokenizer.h from the sqlite3 source ?
>>
>
> Yes. Use fts3_tokenizer.h.

Thanks, that worked. Now I am able to compile everything.

Next, I am having a problem with using this tokenizer. I followed the
example from the FTS3 documentation page on the website and registered
the tokenizer using code like this:

    const sqlite3_tokenizer_module *my_tokenizer_module;

    sqlstr = "select fts3_tokenizer(:tokenizer_name, :tokenizer_ptr)";
    rc = sqlite3_prepare_v2(db, sqlstr, -1, &stmt, NULL);
    if (rc != SQLITE_OK) {
            sqlite3_close(db);
            sqlite3_shutdown();
            return -1;
    }

    idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_name");
    rc = sqlite3_bind_text(stmt, idx, "my_tokenizer", -1, NULL);
    if (rc != SQLITE_OK) {
            fprintf(stderr, "%s\n", sqlite3_errmsg(db));
            sqlite3_finalize(stmt);
            return -1;
    }

    sqlite3Fts3MyTokenizerModule((const sqlite3_tokenizer_module **)
        &my_tokenizer_module);
    idx = sqlite3_bind_parameter_index(stmt, ":tokenizer_ptr");
    rc = sqlite3_bind_blob(stmt, idx, &my_tokenizer_module,
        sizeof(my_tokenizer_module), SQLITE_STATIC);
    if (rc != SQLITE_OK) {
            fprintf(stderr, "%s\n", sqlite3_errmsg(db));
            sqlite3_finalize(stmt);
            return -1;
    }
    rc = sqlite3_step(stmt);
    if (rc != SQLITE_ROW) {
            fprintf(stderr, "%s\n", sqlite3_errmsg(db));
            sqlite3_finalize(stmt);
            return -1;
    }
    sqlite3_finalize(stmt);

It is working fine up to this point. After executing the above
statements, I try to create an FTS table using this custom tokenizer,
which also seems to get created. The problem comes up when I try to
insert data into the table. A simple statement like "insert into
my_table values(...)" gives this error:

    unknown tokenizer: my_tokenizer

I am sure I am missing something here, but don't know what ?

Thanks
[sqlite] [FTS3] Header to include for a custom tokenizer
Hi,

I am using the Sqlite3 amalgamation. I am trying to register a custom
tokenizer with sqlite for my FTS application. The custom tokenizer is in
its own source file. I have included the sqlite3.h header with the
tokenizer source, but sqlite3.h does not contain the declarations of the
various structs like sqlite3_tokenizer_module etc. These declarations
are also not there in sqlite3ext.h . Although I see them in sqlite3.c, I
am not sure I want to include it. What is the usual way to resolve this?
Maybe import fts3_tokenizer.h from the sqlite3 source?

Thanks
Abhinav
Re: [sqlite] [FTS3] The Compress and Uncompress functions and extension
On Fri, Jul 22, 2011 at 1:32 PM, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> wrote:
> On Fri, Jul 22, 2011 at 12:38 PM, Alexey Pechnikov
> <pechni...@mobigroup.ru> wrote:
>> But why you don't use compress/uncompress functions from DRH? See
>> http://www.mail-archive.com/sqlite-users%40sqlite.org/msg17018.html
>>
>> I did wrap these into extension and add SQLITE_COMPRESS_MIN_LENGTH
>> http://sqlite.mobigroup.ru/artifact/a5da96353bb851b34114052ba85041fdffb725cd
>> http://sqlite.mobigroup.ru/artifact/56df1be3c402d7d49c3a13be704a2ff22c3003d2
>>
>> http://sqlite.mobigroup.ru/dir?name=ext/compress
>>
>
> Thanks for pointing out that mail archive discussion. I wasn't using
> compress/uncompress because uncompress requires you to store the size
> of the compressed buffer which is returned by the compress function
> while compressing. But that email discussion suggests a nifty trick to
> overcome this.
>
> I implemented the compress/uncompress functions as suggested by
> Richard in the email, however, it is still not working for me. The
> database is getting compressed fine, but there is a problem with
> decompression. I seem to be getting null values for the column values
> in the result set of my query.
>
> For example:
>
> $./my_program "print a document"
>
> (null)((null))
> (null)
>
> The above are three column values that I am trying to select in my query.
> "print a document" is a query that I tried to execute. Although this
> is definitely an improvement from before :)
>

Sorry, it was my mistake. I had made a small error in calling
uncompress. Now it seems to be working fine. Thanks a ton for the help.
You saved my day :-)

--
Abhinav
Re: [sqlite] [FTS3] The Compress and Uncompress functions and extension
On Fri, Jul 22, 2011 at 12:38 PM, Alexey Pechnikov wrote:
> But why you don't use compress/uncompress functions from DRH? See
> http://www.mail-archive.com/sqlite-users%40sqlite.org/msg17018.html
>
> I did wrap these into extension and add SQLITE_COMPRESS_MIN_LENGTH
> http://sqlite.mobigroup.ru/artifact/a5da96353bb851b34114052ba85041fdffb725cd
> http://sqlite.mobigroup.ru/artifact/56df1be3c402d7d49c3a13be704a2ff22c3003d2
>
> http://sqlite.mobigroup.ru/dir?name=ext/compress
>

Thanks for pointing out that mail archive discussion. I wasn't using
compress/uncompress because uncompress requires you to store the size of
the compressed buffer which is returned by the compress function while
compressing. But that email discussion suggests a nifty trick to
overcome this.

I implemented the compress/uncompress functions as suggested by Richard
in the email, however, it is still not working for me. The database is
getting compressed fine, but there is a problem with decompression. I
seem to be getting null values for the column values in the result set
of my query.

For example:

    $./my_program "print a document"

    (null)((null))
    (null)

The above are three column values that I am trying to select in my
query. "print a document" is a query that I tried to execute. Although
this is definitely an improvement from before :)

Thanks
Abhinav
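The linked message is not reproduced in this thread, so the following is only an assumption about the kind of trick meant: store the uncompressed length in a small fixed-size header in front of the compressed bytes, so that the uncompress side knows how large a buffer to allocate. The helper names are hypothetical, and a 4-byte big-endian header is just one possible layout.

```c
#include <stdint.h>

/* Write n as 4 big-endian bytes at the start of blob. */
static void
put_size_header(unsigned char *blob, uint32_t n)
{
    blob[0] = (unsigned char)(n >> 24);
    blob[1] = (unsigned char)(n >> 16);
    blob[2] = (unsigned char)(n >> 8);
    blob[3] = (unsigned char)n;
}

/* Read back the 4-byte big-endian length written by put_size_header(). */
static uint32_t
get_size_header(const unsigned char *blob)
{
    return ((uint32_t)blob[0] << 24) | ((uint32_t)blob[1] << 16) |
        ((uint32_t)blob[2] << 8) | (uint32_t)blob[3];
}
```

The compress side would write the header and then place zlib's output at blob + 4; the uncompress side would read the header, allocate that many bytes, and decompress from blob + 4. Since compressed data is binary and may contain zero bytes, the lengths would come from sqlite3_value_bytes() and the result would be returned with sqlite3_result_blob(), rather than relying on strlen() or a text result with length -1.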
Re: [sqlite] [FTS3] The Compress and Uncompress functions
On Wed, Jul 20, 2011 at 7:51 PM, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> wrote:
> Hi,
>
> I have an FTS table with compress and uncompress options enabled. I am
> using zlib(3) for doing the compression. The compression function
> seems to be doing ok as I can see the size of the database coming down
> drastically. But the uncompress function is not working properly.
>
> For example if I do a query like: "select snippet(my_fts_table) from
> my_fts_table where my_fts_table match 'my query'" , it should only
> output the snippets of the individual matching documents. But instead
> it is sending the complete document data to the output. I am not sure
> if it is the fault of my uncompress function or a bug in sqlite so
> showing my compress/uncompress code here. Hope to get some insight.
>
> Thanks
>
> Following are the functions that I have written for compress and uncompress:
>
> /* the compress function */
> void
> zip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
> {
>         const Bytef *source = sqlite3_value_text(apval[0]);
>         uLong sourcelen = strlen((const char *)source);
>         uLong destlen = (sourcelen + 12) + (int)(sourcelen + 12) * .01/100;
>         Bytef *dest = (Bytef *) malloc(sizeof(Bytef) * destlen);
>         int ret_val = compress(dest, &destlen, source, sourcelen);
>         if (ret_val != Z_OK) {
>                 sqlite3_result_error(pctx, "Error in compression", -1);
>         }
>         sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
>         return;
> }
>
> /* the uncompress function */
> void
> unzip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
> {
>         int ret;
>         unsigned int have;
>         z_stream strm;
>         size_t bufsize;
>         unsigned char *in;
>         unsigned char out[CHUNK];
>         unsigned char *dest = NULL;
>         strm.zalloc = Z_NULL;
>         strm.zfree = Z_NULL;
>         strm.opaque = Z_NULL;
>         strm.next_in = NULL;
>         strm.avail_in = 0;
>
>         ret = inflateInit(&strm);
>         if (ret != Z_OK) {
>                 fprintf(stderr, "Could not initiate decompression of database\n");
>                 sqlite3_result_error(pctx, "Error in decompression", -1);
>                 return;
>         }
>
>         in = (unsigned char *) sqlite3_value_text(apval[0]);
>         bufsize = strlen((const char *) in);
>         strm.next_in = in;
>         strm.avail_in = bufsize;
>
>         /* run inflate() on input until output buffer is not full */
>         do {
>                 strm.avail_out = CHUNK;
>                 strm.next_out = out;
>                 ret = inflate(&strm, Z_NO_FLUSH);
>
>                 have = CHUNK - strm.avail_out;
>                 if (concat((char **) &dest, (const char *) strm.next_out) < 0) {
>                         fprintf(stderr, "Error in concatnating decompressed stream\n");
>                         sqlite3_result_error(pctx, "Error in decompression", -1);
>                         return;
>                 }
>         } while (strm.avail_out == 0);
>
>         inflateEnd(&strm);
>         sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
>         return;
> }
>
> // concat is a utility function that concatenates its 2nd parameter at
> // the end of its 1st parameter. It returns a negative value on error.
>

Just to add to what I said before: on executing a query from within my
application, it is sending the complete indexed data of the matched
documents. So actually, first it sends this output to stderr, and then
the same output to stdout. This I observed just a while ago. I am pretty
sure I am doing something wrong in the decompression function but not
sure what. :/

Thanks
Abhinav
[sqlite] [FTS3] The Compress and Uncompress functions
Hi,

I have an FTS table with compress and uncompress options enabled. I am using zlib(3) for doing the compression. The compression function seems to be doing ok as I can see the size of the database coming down drastically. But the uncompress function is not working properly.

For example, if I do a query like: "select snippet(my_fts_table) from my_fts_table where my_fts_table match 'my query'", it should only output the snippets of the individual matching documents. But instead it is sending the complete document data to the output. I am not sure if it is the fault of my uncompress function or a bug in sqlite, so I am showing my compress/uncompress code here. Hope to get some insight.

Thanks

Following are the functions that I have written for compress and uncompress:

    /* the compress function */
    void
    zip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
    {
        const Bytef *source = sqlite3_value_text(apval[0]);
        uLong sourcelen = strlen((const char *)source);
        uLong destlen = (sourcelen + 12) + (int)(sourcelen + 12) * .01/100;
        Bytef *dest = (Bytef *) malloc(sizeof(Bytef) * destlen);
        int ret_val = compress(dest, &destlen, source, sourcelen);
        if (ret_val != Z_OK) {
            sqlite3_result_error(pctx, "Error in compression", -1);
        }
        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
        return;
    }

    /* the uncompress function */
    void
    unzip(sqlite3_context *pctx, int nval, sqlite3_value **apval)
    {
        int ret;
        unsigned int have;
        z_stream strm;
        size_t bufsize;
        unsigned char *in;
        unsigned char out[CHUNK];
        unsigned char *dest = NULL;
        strm.zalloc = Z_NULL;
        strm.zfree = Z_NULL;
        strm.opaque = Z_NULL;
        strm.next_in = NULL;
        strm.avail_in = 0;

        ret = inflateInit(&strm);
        if (ret != Z_OK) {
            fprintf(stderr, "Could not initiate decompression of database\n");
            sqlite3_result_error(pctx, "Error in decompression", -1);
            return;
        }

        in = (unsigned char *) sqlite3_value_text(apval[0]);
        bufsize = strlen((const char *) in);
        strm.next_in = in;
        strm.avail_in = bufsize;

        /* run inflate() on input until output buffer is not full */
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH);

            have = CHUNK - strm.avail_out;
            if (concat((char **) &dest, (const char *) strm.next_out) < 0) {
                fprintf(stderr, "Error in concatnating decompressed stream\n");
                sqlite3_result_error(pctx, "Error in decompression", -1);
                return;
            }
        } while (strm.avail_out == 0);

        inflateEnd(&strm);
        sqlite3_result_text(pctx, (const char *)dest, -1, NULL);
        return;
    }

    // concat is a utility function that concatenates its 2nd parameter at
    // the end of its 1st parameter. It returns a negative value on error.

--
Abhinav
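A side note on the code above, offered as a sketch rather than a diagnosis of the snippet() behaviour: zlib output is binary and can contain zero bytes, so measuring it with strlen(), or handing it to sqlite3_result_text() with a length of -1, stops at the first zero byte. The standalone C fragment below illustrates only that truncation effect. The payload bytes in sample_blob and the helper name length_as_c_string are made up for illustration. In a real compress/uncompress pair one would presumably take the length from sqlite3_value_bytes() and return the data with sqlite3_result_blob() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for compressed data. The first two bytes are the
 * usual zlib header for default compression; the rest are invented, and the
 * fourth byte is deliberately 0x00 to mimic a zero byte in binary output. */
const unsigned char sample_blob[] = { 0x78, 0x9c, 0x4b, 0x00, 0x2e, 0x01 };

/* Measures a buffer the way the posted code does: as a NUL-terminated
 * C string. For binary data this counts only up to the first zero byte. */
size_t length_as_c_string(const unsigned char *buf)
{
    assert(buf != NULL);
    return strlen((const char *)buf);
}
```

Here length_as_c_string(sample_blob) reports 3, while sizeof sample_blob is 6. A length lost this way would hand inflate() a truncated input, which is why carrying an explicit byte count alongside the blob is the safer design.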
[sqlite] Data type of the blob returned by matchinfo()
Hi,

Quoting the ranking function given in the appendix of the FTS3 documentation page (http://www.sqlite.org/fts3.html#appendix_a):

    static void rankfunc(sqlite3_context *pCtx, int nVal, sqlite3_value **apVal){
      int *aMatchinfo;                /* Return value of matchinfo() */
      ...
      ...
      aMatchinfo = (unsigned int *)sqlite3_value_blob(apVal[0]);
      ...
      ...

aMatchinfo is declared as int *, but the value obtained from sqlite3_value_blob() is being cast to unsigned int *. This causes a compiler warning, so I am wondering what the datatype of the matchinfo blob is (int * or unsigned int *)? Common sense says it should be unsigned int *, but I just wanted to confirm.

Thanks
Abhinav
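For reference, the FTS3 documentation describes the matchinfo() blob as a sequence of 32-bit unsigned integers in machine byte order, so declaring aMatchinfo as unsigned int * to match the cast would be the consistent choice on platforms where unsigned int is 32 bits. A minimal standalone sketch of reading such a blob follows. read_u32_blob is a hypothetical helper, not part of SQLite. In a real rank function the pointer and byte count would come from sqlite3_value_blob() and sqlite3_value_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies up to max_out 32-bit unsigned values from a
 * raw blob into out and returns how many values were copied. Trailing bytes
 * that do not make up a whole 32-bit value are ignored. */
size_t read_u32_blob(const void *blob, size_t n_bytes,
                     uint32_t *out, size_t max_out)
{
    size_t n = n_bytes / sizeof(uint32_t);

    assert(out != NULL || max_out == 0);
    if (n > max_out) {
        n = max_out;
    }
    if (n > 0) {
        /* memcpy avoids any assumption about the blob's alignment. */
        memcpy(out, blob, n * sizeof(uint32_t));
    }
    return n;
}
```

Copying into a uint32_t array keeps values above INT_MAX intact, which a signed int view of the same bytes would report as negative numbers.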