> Should we check for stop words before stemming or after ?
The current implementation supports both variants. Look at the dictionary
interface definition in morph.c:
typedef struct
{
    char        localename[NAMEDATALEN];
    /* init dictionary */
    void       *(*init) (void);
    /* close dictionary */
    void        (*close) (void *);
    /* find in dictionary */
    char       *(*lemmatize) (void *, char *, int *);
    /* stop-word check, called before lemmatize */
    int         (*is_stoplemm) (void *, char *, int);
    /* stop-word check, called after lemmatize */
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;
The 'is_stoplemm' method is called before 'lemmatize', and 'is_stemstoplemm' is called after it.
At the end of dict/porter_english.dct:
TABLE_DICT_START
    "C",
    setup_english_stemmer,
    closedown_english_stemmer,
    engstemming,
    NULL,
    is_stopengword
TABLE_DICT_END
dict/russian_stemming.dct:
TABLE_DICT_START
    "ru_RU.KOI8-R",
    NULL,
    NULL,
    ru_RUKOI8R_stem,
    ru_RUKOI8R_is_stopword,
    NULL
TABLE_DICT_END
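So the English stemmer decides whether a lexeme is a stop word after stemming
(its is_stoplemm slot is NULL and is_stopengword fills the is_stemstoplemm slot),
while the Russian one decides before stemming.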
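
To make the call order concrete, here is a minimal sketch of how a caller
might drive one DICT entry. It is not the code from morph.c. The function
normalize_word and the toy_* dictionaries are hypothetical, the NAMEDATALEN
value is assumed, and I am assuming that the int * argument of lemmatize
carries the word length in and the lemma length out.

```c
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64  /* assumed value; normally taken from the PostgreSQL headers */

typedef struct
{
    char        localename[NAMEDATALEN];
    void       *(*init) (void);
    void        (*close) (void *);
    char       *(*lemmatize) (void *, char *, int *);
    int         (*is_stoplemm) (void *, char *, int);
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;

/*
 * Hypothetical driver: returns NULL if the word is a stop word,
 * otherwise the lemma. NULL slots in the dictionary are skipped.
 */
static char *
normalize_word(const DICT *dict, void *handle, char *word, int *len)
{
    char       *lemma = word;

    /* stop-word check before stemming */
    if (dict->is_stoplemm && dict->is_stoplemm(handle, word, *len))
        return NULL;

    if (dict->lemmatize)
        lemma = dict->lemmatize(handle, word, len);

    /* stop-word check after stemming */
    if (lemma && dict->is_stemstoplemm &&
        dict->is_stemstoplemm(handle, lemma, *len))
        return NULL;

    return lemma;
}

/* Toy stemmer (assumption): strips one trailing 's' in place. */
static char *
toy_stem(void *handle, char *word, int *len)
{
    (void) handle;
    if (*len > 1 && word[*len - 1] == 's')
        word[--(*len)] = '\0';
    return word;
}

/* Toy stop list (assumption): only "the" is a stop word. */
static int
toy_is_stop(void *handle, char *word, int len)
{
    (void) handle;
    return len == 3 && strncmp(word, "the", 3) == 0;
}

/* Same layout as the English entry: stop check after stemming. */
static const DICT toy_after = {"C", NULL, NULL, toy_stem, NULL, toy_is_stop};

/* Same layout as the Russian entry: stop check before stemming. */
static const DICT toy_before = {"C", NULL, NULL, toy_stem, toy_is_stop, NULL};
```

With these toy dictionaries the input "thes" is dropped by toy_after (it
stems to "the", which is then recognized as a stop word) but passes through
toy_before as "the", because the stop check saw the unstemmed form.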
--
Teodor Sigaev
[EMAIL PROTECTED]