On 2016-07-05 18:11, Abhinav Upadhyay wrote:
> I'm wondering if it is possible to extend the functionality of the
> porter tokenizer. I would like to use the functionality of the Porter
> tokenizer but before stemming the token, I want to decide whether the
> token should be stemmed or not.
>
> Do I need to copy the Porter tokenizer and modify it to suit my needs
> or there is a better way, to minimize code duplication?

The first argument of the Porter tokenizer is its parent tokenizer. The
Porter tokenizer calls the parent tokenizer's xTokenize function with its
own xToken callback. That callback stems each token and then forwards it
to the xToken function that was originally passed to the Porter
tokenizer's xTokenize function.

So create a custom tokenizer and use it as the parent of the Porter
tokenizer. The pCtx parameter it receives is then the Porter tokenizer's
context, and the original xToken function can be extracted from the
xToken member of that context:

    /* Mirrors the struct that is internal to the FTS5 Porter tokenizer. */
    typedef struct PorterContext PorterContext;
    struct PorterContext {
      void *pCtx;
      int (*xToken)(void *pCtx, int tflags, const char *pToken,
                    int nToken, int iStart, int iEnd);
      char *aBuf;
    };

    /* The custom tokenizer wraps the tokenizer that does the real work. */
    typedef struct CustomTokenizer CustomTokenizer;
    struct CustomTokenizer {
      fts5_tokenizer tokenizer;
      Fts5Tokenizer *pTokenizer;
    };

    typedef struct CustomContext CustomContext;
    struct CustomContext {
      void *pCtx;
      int (*xToken)(void *pCtx, int tflags, const char *pToken,
                    int nToken, int iStart, int iEnd);
    };

    /* Placeholder for your own decision; not part of FTS5. */
    int shouldStem(const char *pToken, int nToken);

    int customToken(
      void *pCtx,
      int tflags,
      const char *pToken,
      int nToken,
      int iStart,
      int iEnd
    ){
      CustomContext *c = (CustomContext*)pCtx;
      PorterContext *p;

      if( shouldStem(pToken, nToken) ){
        /* Hand the token to the Porter callback, which stems it. */
        return c->xToken(c->pCtx, tflags, pToken, nToken, iStart, iEnd);
      }else{
        /* Skip the Porter callback and call the original xToken. */
        p = (PorterContext*)c->pCtx;
        return p->xToken(p->pCtx, tflags, pToken, nToken, iStart, iEnd);
      }
    }

    int customTokenize(
      Fts5Tokenizer *pTokenizer,
      void *pCtx,
      int flags,
      const char *pText,
      int nText,
      int (*xToken)(void *pCtx, int tflags, const char *pToken,
                    int nToken, int iStart, int iEnd)
    ){
      CustomTokenizer *t = (CustomTokenizer*)pTokenizer;
      CustomContext sCtx;

      sCtx.pCtx = pCtx;      /* the Porter tokenizer's PorterContext */
      sCtx.xToken = xToken;  /* the Porter tokenizer's stemming callback */
      return t->tokenizer.xTokenize(t->pTokenizer, (void*)&sCtx, flags,
                                    pText, nText, customToken);
    }

Note that you are accessing an internal struct and relying on
implementation details, and therefore have to check with every release
whether the struct or any other relevant implementation detail has
changed.

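For what it's worth, the chain of contexts can be tried out without
SQLite. The following is only a toy model of the idea and not FTS5 code:
Link, Sink, stemToken, shouldStem, splitTokenize and runDemo are names I
made up, the xToken signature is shortened, and the "stemmer" merely
drops a trailing 's'. It shows the callback below the stemmer either
calling the stemming callback or reaching through its context to the
original one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shortened stand-in for the FTS5 xToken callback type (assumption). */
typedef int (*TokenFn)(void *pCtx, const char *pToken, int nToken);

/* Callback plus context; plays the role of PorterContext/CustomContext. */
typedef struct Link Link;
struct Link { void *pCtx; TokenFn xToken; };

/* Final consumer: collects the emitted tokens, separated by spaces. */
typedef struct Sink Sink;
struct Sink { char out[128]; };

static int sinkToken(void *pCtx, const char *pToken, int nToken){
  Sink *s = (Sink*)pCtx;
  size_t n = strlen(s->out);
  snprintf(s->out + n, sizeof(s->out) - n, "%s%.*s",
           n ? " " : "", nToken, pToken);
  return 0;
}

/* Toy stemmer callback: drops one trailing 's', then forwards the token. */
static int stemToken(void *pCtx, const char *pToken, int nToken){
  Link *p = (Link*)pCtx;
  if( nToken>1 && pToken[nToken-1]=='s' ) nToken--;
  return p->xToken(p->pCtx, pToken, nToken);
}

/* Hypothetical policy: do not stem tokens that start with A-Z. */
static int shouldStem(const char *pToken, int nToken){
  return nToken>0 && !(pToken[0]>='A' && pToken[0]<='Z');
}

/* Custom callback: c->pCtx is the stemmer's Link. */
static int customToken(void *pCtx, const char *pToken, int nToken){
  Link *c = (Link*)pCtx;
  if( shouldStem(pToken, nToken) ){
    return c->xToken(c->pCtx, pToken, nToken);  /* goes through stemToken */
  }else{
    Link *p = (Link*)c->pCtx;
    return p->xToken(p->pCtx, pToken, nToken);  /* bypasses stemToken */
  }
}

/* Innermost tokenizer: splits on spaces and emits each token. */
static int splitTokenize(void *pCtx, const char *pText, TokenFn xToken){
  const char *z = pText;
  while( *z ){
    const char *zStart;
    while( *z==' ' ) z++;
    zStart = z;
    while( *z && *z!=' ' ) z++;
    if( z>zStart ){
      int rc = xToken(pCtx, zStart, (int)(z - zStart));
      if( rc ) return rc;
    }
  }
  return 0;
}

/* Custom tokenizer: wraps the callback it was given, then delegates. */
static int customTokenize(void *pCtx, const char *pText, TokenFn xToken){
  Link sCtx;
  sCtx.pCtx = pCtx;
  sCtx.xToken = xToken;
  return splitTokenize(&sCtx, pText, customToken);
}

/* Stemming tokenizer: its parent is the custom tokenizer. */
static int stemTokenize(void *pCtx, const char *pText, TokenFn xToken){
  Link sCtx;
  sCtx.pCtx = pCtx;
  sCtx.xToken = xToken;
  return customTokenize(&sCtx, pText, stemToken);
}

/* Runs the whole chain over pText and leaves the result in pSink->out. */
void runDemo(const char *pText, Sink *pSink){
  pSink->out[0] = '\0';
  stemTokenize(pSink, pText, sinkToken);
}
```

Calling runDemo("cats Paris dogs", &sink) leaves "cat Paris dog" in
sink.out: the capitalised token took the bypass, the other two went
through the stemmer.
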
- Matthias-Christian