Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-02-18 Thread Teodor Sigaev
Hmm, I have found a small bug: when there is a compound affix with a zero-length search pattern (which should not happen!), the ispell dictionary ignores all other compound affixes. The original affix file contains the flag ~\`: E > -E,NINGS#~ avskrive > avskrivnings- Z Y Z Y

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-02-17 Thread Oleg Bartunov
Norwegian (Nynorsk and Bokmaal) ispell dictionaries are available from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ I didn't test them. Oleg On Fri, 17 Feb 2006, Teodor Sigaev wrote: Very strange... ~% file tsearch/dict/ispell_no/norwegian.dict tsearch/dict/ispell_no/n

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-02-17 Thread Teodor Sigaev
BTW, if you took the Norwegian dictionary from http://folk.uio.no/runekl/dictionary.html, then try building it from the OpenOffice sources instead (http://lingucomponent.openoffice.org/spell_dic.html, tsearch2/my2ispell). I found mails in my archive which say that Norwegians prefer the OpenOffice one.

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-02-17 Thread Teodor Sigaev
Very strange... ~% file tsearch/dict/ispell_no/norwegian.dict tsearch/dict/ispell_no/norwegian.dict: ISO-8859 C program text ~% file tsearch/dict/ispell_no/norwegian.aff tsearch/dict/ispell_no/norwegian.aff: ISO-8859 English text Can you place those files somewhere where I can download them
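The `file` check above only tells whether the dictionary files are 8-bit text. A rough equivalent can be sketched in Python. This is a minimal sketch, not what `file` actually does: the function name `classify_bytes` and the sample word are hypothetical, and the check can only say that a byte string is pure ASCII, valid UTF-8, or some other 8-bit encoding (it cannot tell ISO-8859-1 from ISO-8859-9).

```python
def classify_bytes(data: bytes) -> str:
    """Roughly classify raw bytes as ASCII, UTF-8, or another 8-bit encoding."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes above 0x7F that do not form valid UTF-8 sequences,
        # e.g. text in one of the ISO-8859 encodings.
        return "8-bit, not UTF-8"


# Hypothetical sample: a Norwegian word in two encodings.
word = "blåbær"
print(classify_bytes(word.encode("iso-8859-1")))
print(classify_bytes(word.encode("utf-8")))
```

To check a real dictionary file, one would read it in binary mode and pass the bytes to the function; the encoding found there has to match the database encoding for the dictionary to be usable.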

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-02-17 Thread Alexander Presber
Hello, Thanks for your efforts, but I still can't get it to work. I have now tried the Norwegian example. My encoding is ISO-8859 (I never used UTF-8, because I thought it would be slower, so the thread name is a bit misleading). So I am using an ISO-8859-9 database: ~/cvs/ssd% psql -l Name

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-30 Thread Mike Rylander
On 1/30/06, Oleg Bartunov wrote: On Fri, 27 Jan 2006, Harald Armin Massa wrote: Teodor, To all: Maybe we should put all of Snowball's stemmers (for all available languages and encodings) into the tsearch2 directory? Yes, that would be VERY helpful. Up to now I do n

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-30 Thread Oleg Bartunov
On Fri, 27 Jan 2006, Harald Armin Massa wrote: Teodor, To all: Maybe we should put all of Snowball's stemmers (for all available languages and encodings) into the tsearch2 directory? Yes, that would be VERY helpful. Up to now I do not dare to use tsearch2 because "get stemmer here, get dictionar

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-27 Thread Teodor Sigaev
contrib_regression=# insert into pg_ts_dict values ( 'norwegian_ispell', (select dict_init from pg_ts_dict where dict_name='ispell_template'), 'DictFile="/usr/local/share/ispell/norsk.dict" ,' 'AffFile ="/usr/local/share/ispell/norsk.aff"', (select d

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-27 Thread Harald Armin Massa
Teodor, To all: Maybe we should put all of Snowball's stemmers (for all available languages and encodings) into the tsearch2 directory? Yes, that would be VERY helpful. Up to now I do not dare to use tsearch2 because "get stemmer here, get dictionary there..." Harald -- GHUM Harald Massa persuadere et progra

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-27 Thread Oleg Bartunov
Alexander, could you try tsearch2 from CVS HEAD? tsearch2 in 8.1.X doesn't support UTF-8 and works for some people only by accident :) Oleg On Fri, 27 Jan 2006, Alexander Presber wrote: Tsearch/ispell is not able to break this word into parts, because of the "s" in "Produktion/s/interva

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-27 Thread Alexander Presber
I should add that, with the minimal dictionary and .aff file, "vertrags" gets reduced alright, dropping the trailing 's': tstest=# SELECT tsearch2.ts_debug('vertrags'); ts_debug - (german,lword,"La

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2006-01-27 Thread Alexander Presber
Tsearch/ispell is not able to break this word into parts, because of the "s" in "Produktion/s/intervall". Misspelling the word as "Produktionintervall" fixes it: There should be affixes marked as 'affix in middle of compound word'. The flag is '~'; for an example, look in the norsk dictionary: flag ~\\: [^S

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2005-11-23 Thread Oleg Bartunov
On Wed, 23 Nov 2005, Hannes Dorbath wrote: Hi, I'm on PG 8.0.4, initDB and locale set to de_DE.UTF-8, FreeBSD. My TSearch config is based on "Tsearch2 and Unicode/UTF-8" by Markus Wollny (http://tinyurl.com/a6po4). The following files are used: http://hannes.imos.net/german.med [U

Re: [GENERAL] TSearch2 / German compound words / UTF-8

2005-11-23 Thread Teodor Sigaev
Tsearch/ispell is not able to break this word into parts, because of the "s" in "Produktion/s/intervall". Misspelling the word as "Produktionintervall" fixes it: There should be affixes marked as 'affix in middle of compound word'. The flag is '~'; for an example, look in the norsk dictionary: flag ~\\: [^S]
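The effect of the linking "s" can be illustrated with a toy splitter. This is only a sketch of the idea, not how the ispell dictionary in tsearch2 is implemented: the function `split_compound`, the stem set, and the `linkers` parameter are all hypothetical, with `linkers` standing in for the affixes flagged '~' as allowed in the middle of a compound word.

```python
def split_compound(word, stems, linkers=("",)):
    """Return one split of `word` into known stems, or None.

    `linkers` lists the strings allowed between two stems; "" means the
    stems must be directly adjacent.
    """
    word = word.lower()
    if word in stems:
        return [word]
    for i in range(1, len(word)):
        head = word[:i]
        if head not in stems:
            continue
        rest = word[i:]
        for link in linkers:
            if link and not rest.startswith(link):
                continue
            # Skip the linking element, then split the remainder recursively.
            tail = split_compound(rest[len(link):], stems, linkers)
            if tail:
                return [head] + tail
    return None


# Hypothetical two-word stem list.
STEMS = {"produktion", "intervall"}

# Without a linking element the real word cannot be split ...
print(split_compound("Produktionsintervall", STEMS))
# ... while the misspelling without the "s" can.
print(split_compound("Produktionintervall", STEMS))
# Allowing "s" between stems makes the real word splittable.
print(split_compound("Produktionsintervall", STEMS, linkers=("", "s")))
```

In the real dictionary the same role is played by entries in the .aff file rather than by a parameter, so the fix belongs in the affix file, not in the word list.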

[GENERAL] TSearch2 / German compound words / UTF-8

2005-11-23 Thread Hannes Dorbath
Hi, I'm on PG 8.0.4, initDB and locale set to de_DE.UTF-8, FreeBSD. My TSearch config is based on "Tsearch2 and Unicode/UTF-8" by Markus Wollny (http://tinyurl.com/a6po4). The following files are used: http://hannes.imos.net/german.med [UTF-8] http://hannes.imos.net/german.aff