Dominique Pelle wrote:

> Vim-7.3.107 crashes when I try to create the Esperanto
> dictionary from OpenOffice-3.
>
> Steps to reproduce:
>
> $ wget http://extensions.services.openoffice.org/e-files/3377/1/1.0-dev.oxt
> $ unzip 1.0-dev.oxt
> $ vim -u NONE --noplugin -c 'set nomore' -c 'mkspell! /tmp/eo literumilo'
> Vim: Caught deadly signal SEGV
>
> Vim: Finished.
> Segmentation fault (core dumped)
>
>
> Valgrind memory checker gives errors:
>
> ==3877== Invalid write of size 1
> ==3877==    at 0x4026087: strcat (mc_replace_strmem.c:176)
> ==3877==    by 0x818AF60: spell_read_aff (spell.c:5493)
> ==3877==    by 0x819359D: mkspell (spell.c:9234)
> ==3877==    by 0x8191F53: ex_mkspell (spell.c:8557)
> ==3877==    by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
> ==3877==    by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
> ==3877==    by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
> ==3877==    by 0x80EAEE5: exe_commands (main.c:2803)
> ==3877==    by 0x80E85CA: main (main.c:881)
> ==3877==  Address 0x7e3b434 is 0 bytes after a block of size 16,012 alloc'd
> ==3877==    at 0x4025230: malloc (vg_replace_malloc.c:236)
> ==3877==    by 0x8117B93: lalloc (misc2.c:918)
> ==3877==    by 0x8117ACB: alloc_clear (misc2.c:829)
> ==3877==    by 0x818F951: getroom (spell.c:7368)
> ==3877==    by 0x818AEFA: spell_read_aff (spell.c:5485)
> ==3877==    by 0x819359D: mkspell (spell.c:9234)
> ==3877==    by 0x8191F53: ex_mkspell (spell.c:8557)
> ==3877==    by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
> ==3877==    by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
> ==3877==    by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
> ==3877==    by 0x80EAEE5: exe_commands (main.c:2803)
> ==3877==    by 0x80E85CA: main (main.c:881)
> (several other errors after that)
>
> spell.c:
>
> 5478      else if (is_aff_rule(items, itemcnt, "COMPOUNDRULE", 2))
> 5479      {
> 5480          /* Concatenate this string to previously defined ones, using a
> 5481           * slash to separate them. */
> 5482          l = (int)STRLEN(items[1]) + 1;
> 5483          if (compflags != NULL)
> 5484              l += (int)STRLEN(compflags) + 1;
> 5485          p = getroom(spin, l, FALSE);
> 5486          if (p != NULL)
> 5487          {
> 5488              if (compflags != NULL)
> 5489              {
> 5490                  STRCPY(p, compflags);
> 5491                  STRCAT(p, "/");
> 5492              }
> !!5493            STRCAT(p, items[1]);
> 5494              compflags = p;
> 5495          }
> 5496      }
>
> When it crashes, I notice that variable l reaches 16005, which is just
> slightly bigger than SBLOCKSIZE (#define SBLOCKSIZE 16000 at
> spell.c:4885).
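The overrun pattern described above can be illustrated with a minimal sketch of a block allocator that carves requests off fixed-size blocks, with the kind of length check that guards against it. This is a simplified model, not Vim's getroom(): the names block_T, arena_T, arena_getroom(), arena_free() and BLOCKSIZE are hypothetical. When no block has room, a fresh block is allocated, but a fresh block still holds only BLOCKSIZE bytes, so a request for 16005 bytes would overrun it by 5 bytes unless oversized requests are refused up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKSIZE 16000  /* hypothetical stand-in for SBLOCKSIZE */

/* Hypothetical block type: a header followed by BLOCKSIZE data bytes. */
typedef struct block_S {
    struct block_S *next;  /* previously allocated block */
    size_t used;           /* bytes already handed out from data[] */
    char data[];           /* C99 flexible array member */
} block_T;

typedef struct {
    block_T *blocks;       /* most recently allocated block first */
} arena_T;

/* Return "len" zeroed bytes, or NULL when out of memory or when "len"
 * can never fit in a single block. */
void *arena_getroom(arena_T *arena, size_t len)
{
    block_T *bl;
    char *p;

    assert(arena != NULL);
    if (len > BLOCKSIZE)
        return NULL;  /* refuse instead of writing past the block's end */

    bl = arena->blocks;
    if (bl == NULL || bl->used + len > BLOCKSIZE) {
        /* calloc() zeroes the whole block, so every piece is cleared. */
        bl = calloc(1, sizeof(block_T) + BLOCKSIZE);
        if (bl == NULL)
            return NULL;
        bl->next = arena->blocks;
        arena->blocks = bl;
    }
    p = bl->data + bl->used;
    bl->used += len;
    return p;
}

/* Free all blocks at once. */
void arena_free(arena_T *arena)
{
    while (arena->blocks != NULL) {
        block_T *next = arena->blocks->next;
        free(arena->blocks);
        arena->blocks = next;
    }
}
```

With the check, the caller's existing `if (p != NULL)` test would simply skip the rule, so a real fix would also need to report an error, or avoid the fixed-size block entirely.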
First thing to fix would be to add a check in getroom() for a length
larger than it can handle.

> I changed SBLOCKSIZE from 16000 to 1024000 at spell.c:4885
> and it no longer crashes, but that's quite a dramatic increase so I
> doubt whether that's right. While creating the dictionary,
> the variable l at spell.c:5485 reached l=564,458.

Phew, that's a lot of compound rules. Did you check in the affix file
that this is actually there? I suppose this must be generated; no human
would type this.

A solution would be to make compflags a growarray. As it's done now,
it is re-allocated for each COMPOUNDRULE, which is very inefficient.
This was written with the assumption that there would be only a few
COMPOUNDRULEs.

> Vim temporarily used 12.5 GB of memory while creating the
> dictionary, which is quite a lot.

Compound rules are indeed a problem, since Vim expands them into all
possible words to be able to build the trie. That compresses really
well, but lots of memory is needed before the compression happens.

> Size of the created dictionary file 'eo.utf-8.spl' is 587,214 bytes
> and it does not work. Trying to use it with...
>
> $ cp /tmp/eo.utf-8.spl ~/.vim/spell/.
>
> ... then in Vim...
> :setlocal spell spelllang=eo
>
> ... gives errors after waiting for ~10 sec or so:
>
> Error detected while processing /home/pel/.vim/spell/eo.utf-8.spl:
> E339: Pattern too long
> E759: Format error in spell file
>
> ":help E339" says: "This only happens on systems with 16 bit ints".
>
> Well, that's not true since I get E339 on Linux x86_64 where
> sizeof(int) is 4 (32 bits).
>
> E339 at regexp.c:1059 is in between SMALL_MALLOC
> but E339 at regexp.c:1077 is not in between SMALL_MALLOC.
>
> Either the help file needs to be updated for E339 or regexp.c
> needs to handle longer regexes.

The help is wrong: E339 can also happen when an offset gets too big.

> The problem is triggered by the fact that file 'literumilo.aff' contains
> many COMPOUNDRULE lines (Esperanto being an agglutinative
> language).
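The growarray idea can be sketched as a growable string buffer that appends each COMPOUNDRULE value with a "/" separator, the same format the quoted spell.c code builds. This is an assumption-laden illustration, not Vim's garray_T API (whose functions are not reproduced here); strbuf_T, strbuf_reserve() and add_compound_rule() are hypothetical names. Because capacity doubles when it runs out, appending n rules copies amortized O(total length) bytes, whereas re-allocating and copying the whole accumulated string for every rule costs roughly quadratic time. The buffer is also not limited by a fixed block size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. */
typedef struct {
    char  *data;  /* NUL-terminated contents, NULL before the first rule */
    size_t len;   /* string length, excluding the NUL */
    size_t cap;   /* allocated bytes */
} strbuf_T;

/* Ensure room for "extra" more bytes plus the NUL, doubling capacity
 * as needed.  Returns 0 on success, -1 when out of memory (the old
 * buffer then stays valid). */
static int strbuf_reserve(strbuf_T *sb, size_t extra)
{
    size_t need = sb->len + extra + 1;
    size_t cap = sb->cap ? sb->cap : 64;
    char *p;

    if (need <= sb->cap)
        return 0;
    while (cap < need)
        cap *= 2;
    p = realloc(sb->data, cap);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    sb->data = p;
    sb->cap = cap;
    return 0;
}

/* Append one COMPOUNDRULE value, separated from earlier ones by '/'. */
int add_compound_rule(strbuf_T *sb, const char *rule)
{
    size_t rlen;
    size_t sep;

    assert(sb != NULL && rule != NULL);
    rlen = strlen(rule);
    sep = sb->data != NULL ? 1 : 0;  /* '/' only between rules */
    if (strbuf_reserve(sb, sep + rlen) != 0)
        return -1;
    if (sep)
        sb->data[sb->len++] = '/';
    memcpy(sb->data + sb->len, rule, rlen + 1);  /* copies the NUL too */
    sb->len += rlen;
    return 0;
}
```

How the finished string would then be handed on to the rest of the spell file code is not modeled here.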
Did you check which regcomp() call results in E339?

> The Esperanto dictionary from OpenOffice-2.x, as currently used by
> Vim, works fine.

Bram Moolenaar -- http://www.Moolenaar.net
