Bram Moolenaar wrote:
> Dominique Pelle wrote:
>
>> Vim-7.3.107 crashes when I try to create the Esperanto
>> dictionary from OpenOffice-3.
>>
>> Steps to reproduce:
>>
>> $ wget http://extensions.services.openoffice.org/e-files/3377/1/1.0-dev.oxt
>> $ unzip 1.0-dev.oxt
>> $ vim -u NONE --noplugin -c 'set nomore' -c 'mkspell! /tmp/eo literumilo'
>> Vim: Caught deadly signal SEGV
>>
>> Vim: Finished.
>> Segmentation fault (core dumped)
>>
>>
>> Valgrind memory checker gives errors:
>>
>> ==3877== Invalid write of size 1
>> ==3877== at 0x4026087: strcat (mc_replace_strmem.c:176)
>> ==3877== by 0x818AF60: spell_read_aff (spell.c:5493)
>> ==3877== by 0x819359D: mkspell (spell.c:9234)
>> ==3877== by 0x8191F53: ex_mkspell (spell.c:8557)
>> ==3877== by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
>> ==3877== by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
>> ==3877== by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
>> ==3877== by 0x80EAEE5: exe_commands (main.c:2803)
>> ==3877== by 0x80E85CA: main (main.c:881)
>> ==3877== Address 0x7e3b434 is 0 bytes after a block of size 16,012 alloc'd
>> ==3877== at 0x4025230: malloc (vg_replace_malloc.c:236)
>> ==3877== by 0x8117B93: lalloc (misc2.c:918)
>> ==3877== by 0x8117ACB: alloc_clear (misc2.c:829)
>> ==3877== by 0x818F951: getroom (spell.c:7368)
>> ==3877== by 0x818AEFA: spell_read_aff (spell.c:5485)
>> ==3877== by 0x819359D: mkspell (spell.c:9234)
>> ==3877== by 0x8191F53: ex_mkspell (spell.c:8557)
>> ==3877== by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
>> ==3877== by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
>> ==3877== by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
>> ==3877== by 0x80EAEE5: exe_commands (main.c:2803)
>> ==3877== by 0x80E85CA: main (main.c:881)
>> (several other errors after that)
>>
>> spell.c:
>>
>> 5478 else if (is_aff_rule(items, itemcnt, "COMPOUNDRULE", 2))
>> 5479 {
>> 5480 /* Concatenate this string to previously defined ones, using a
>> 5481 * slash to separate them. */
>> 5482 l = (int)STRLEN(items[1]) + 1;
>> 5483 if (compflags != NULL)
>> 5484 l += (int)STRLEN(compflags) + 1;
>> 5485 p = getroom(spin, l, FALSE);
>> 5486 if (p != NULL)
>> 5487 {
>> 5488 if (compflags != NULL)
>> 5489 {
>> 5490 STRCPY(p, compflags);
>> 5491 STRCAT(p, "/");
>> 5492 }
>> !!5493 STRCAT(p, items[1]);
>> 5494 compflags = p;
>> 5495 }
>> 5496 }
>>
>> When it crashes, I notice that variable l reaches 16005 which is just
>> slightly bigger than SBLOCKSIZE (#define SBLOCKSIZE 16000 at
>> spell.c:4885).
>
> First thing to fix would be to add a check in getroom() for a length
> more than what can be handled.
I tried adding a check to avoid allocating more than SBLOCKSIZE
in getroom().
$ hg diff spell.c
diff -r 26fb122355d4 src/spell.c
--- a/src/spell.c Sat Jan 22 21:25:11 2011 +0100
+++ b/src/spell.c Mon Jan 31 21:12:13 2011 +0100
@@ -7364,6 +7364,9 @@
if (bl == NULL || bl->sb_used + len > SBLOCKSIZE)
{
+ if (len > SBLOCKSIZE)
+ return NULL;
+
/* Allocate a block of memory. This is not freed until much later. */
bl = (sblock_T *)alloc_clear((unsigned)(sizeof(sblock_T) + SBLOCKSIZE));
if (bl == NULL)
A dictionary is then created without crash but it's crude:
many (and most) COMPOUNDRULE are skipped without
warning when creating the dictionary. I did not try to improve
to give a warning since dictionary is bad anyway (:spelldump
shows very few words). It's probably best to not create a
dictionary at all when getroom() returns NULL.
>> I changed SBLOCKSIZE from 16000 to 1024000 at spell.c:4885
>> and it longer crashes but that's quite a dramatic increase so I
>> doubt whether that's right. While creating the dictionary,
>> l variable at spell.c:5485 reached l=564,458.
>
> Pfew, that's a lot of compound rules. Did you check in the affix file
> that this is actually there? I suppose this must be generated, no human
> would type this.
The file "literumilo.aff" contains 37,300 lines of COMPOUNDRULE .
(previous email contains a link to the OpenOffice file which contains
the *.aff & *.dic files).
COMPOUNDRULE lines are most certainly created automatically.
It seems that most words are built using COMPOUNDRULE in this
dictionary. I need to read the doc of Myspell or Hunspell
carefully to understand this better.
> A solution would be to make compflags a growarray. The way it's done
> now it is re-allocated for each COMPOUNDRULE, very inefficient. This
> was made with the assumption there would be only a few COMPOUNDRULEs.
Even if we did that, a regex is created with all COMPOUNDRULE
with many alternatives \|. The regex would be slow I think.
See below...
>> Vim temporary used 12.5 Gb of memory while creating the
>> dictionary which is quite a lot.
>
> Compound rules are indeed a problem, since Vim expands them into all
> possible words to be able to build the trie. It compresses really well,
> but lots of memory is needed before the compression happens.
>
>> Size of the created dictionary file 'eo.utf-8.spl' is 587,214 bytes
>> and it does not work. Trying to use it with...
>>
>> $ cp /tmp/eo.utf-8.spl ~/.vim/spell/.
>>
>> ... then in Vim...
>> :setlocal spell spelllang=eo
>>
>> ... gives errors after waiting for ~10 sec or so:
>>
>> Error detected while processing /home/pel/.vim/spell/eo.utf-8.spl:
>> E339: Pattern too long
>> E759: Format error in spell file
>>
>> ":help E339" says: "This only happens on systems with 16 bit ints".
>>
>> Well, that's not true since I get E339 on Linux x86_64 where
>> sizeof(int) is 4 (32 bits).
>>
>> E339 at regexp.c:1059 is in between SMALL_MALLOC
>> but E339 at regexp.c:1077 is not in between SMALL_MALLOC.
>>
>> Either help file needs to be updated for E339 or regexp.c
>> needs to handle longer regexes.
>
> The help is wrong, it can also happen when an offset gets too big.
>
>> Problem is triggered by the fact that file 'literumilo.aff' contains
>> many COMPOUNDRULE lines (Esperanto being an agglutinative
>> language).
>
> Did you check which regcomp() call results in E339?
I've added a printf I see this:
[vim_regcomp:1061] len(expr)=491048
expr=[^\(<7f>\~}||\|{z\|{yz\|xwz\|vutz\|vstz\|rz\|wz\|qpz\|oz\|xn\|xmn\|xlkn\|xljin\|xhgn\|xhtn\|xhfn\|xhin\|xhen\|xhdn\|xghen\|
.... etc, etc... (huge regexp of ~491 Kb)
The stack trace when E339 happens is:
#17 0x000000000052c0c3 in vim_regcomp (
expr=0x7fc9c6a21010
"^\\(\177\\~}||\\|{z\\|{yz\\|xwz\\|vutz\\|vstz\\|rz\\|wz\\|qpz\\|oz\\|xn\\|xmn\\|xlkn\\|xljin\\|xhgn\\|xhtn\\|xhfn\\|xhin\\|xhen\\|xhdn\\|xghen\\|xgcn\\|xbln\\|xbhn\\|xthn\\|xbtn\\|xbwn\\|xban\\|xt`_Zn\\|xbYn\\|xtYn\\|xwaiZn\\|xXtn\\|xXYn\\"...,
re_flags=7) at regexp.c:1077
#18 0x000000000055bc33 in read_compound (fd=0x92d1b0, slang=0x92d3f0,
len=453893) at spell.c:3704
#19 0x000000000055a7a9 in spell_load_file (fname=0xa0d5f0
"/home/pel/.vim/spell/eo.utf-8.spl", lang=0x7fff44b8a430 "eo",
old_lp=0x0, silent=0)
at spell.c:2920
#20 0x000000000055a352 in spell_load_cb (fname=0xa0d5f0
"/home/pel/.vim/spell/eo.utf-8.spl", cookie=0x7fff44b8a430) at
spell.c:2732
#21 0x0000000000464602 in do_in_runtimepath (name=0x7fff44b8a540
"spell/eo.utf-8.spl", all=0, callback=0x55a323 <spell_load_cb>,
cookie=0x7fff44b8a430) at ex_cmds2.c:2715
#22 0x0000000000559c09 in spell_load_lang (lang=0x7fff44b8b5f0 "eo")
at spell.c:2486
#23 0x000000000055cd74 in did_set_spelllang (wp=0x858e30) at spell.c:4304
#24 0x0000000000512263 in did_set_string_option (opt_idx=284,
varp=0x85c298, new_value_alloced=1, oldval=0x85d430 "en",
errbuf=0x7fff44b8b9b0 "",
opt_flags=4) at option.c:6551
#25 0x000000000050f0fd in do_set (arg=0xa0d63c "", opt_flags=4) at option.c:4792
#26 0x00000000004773bd in ex_set (eap=0x7fff44b8bb80) at ex_docmd.c:11121
#27 0x00000000004697b2 in do_one_cmd (cmdlinep=0x7fff44b8c228,
sourcing=1, cstack=0x7fff44b8bd80, fgetline=0x45253b <get_func_line>,
cookie=0x92b860)
at ex_docmd.c:2657
#28 0x0000000000466f15 in do_cmdline (cmdline=0x0, fgetline=0x45253b
<get_func_line>, cookie=0x92b860, flags=7) at ex_docmd.c:1123
#29 0x0000000000451b79 in call_user_func (fp=0x884890, argcount=0,
argvars=0x7fff44b8c750, rettv=0x7fff44b8c910, firstline=1, lastline=1,
selfdict=0x0) at eval.c:22018
#30 0x000000000043c9f9 in call_func (funcname=0x8611e0 "MySpellLang",
len=11, rettv=0x7fff44b8c910, argcount=0, argvars=0x7fff44b8c750,
firstline=1,
lastline=1, doesrange=0x7fff44b8c90c, evaluate=1, selfdict=0x0) at
eval.c:8316
#31 0x000000000043c62c in get_func_tv (name=0x8611e0 "MySpellLang",
len=11, rettv=0x7fff44b8c910, arg=0x7fff44b8c930, firstline=1,
lastline=1,
doesrange=0x7fff44b8c90c, evaluate=1, selfdict=0x0) at eval.c:8158
#32 0x00000000004358b4 in ex_call (eap=0x7fff44b8ca10) at eval.c:3398
#33 0x00000000004697b2 in do_one_cmd (cmdlinep=0x7fff44b8d0b8,
sourcing=0, cstack=0x7fff44b8cc10, fgetline=0x47e54f <getexline>,
cookie=0x0)
at ex_docmd.c:2657
#34 0x0000000000466f15 in do_cmdline (cmdline=0x0, fgetline=0x47e54f
<getexline>, cookie=0x0, flags=0) at ex_docmd.c:1123
#35 0x00000000004f96dc in nv_colon (cap=0x7fff44b8d210) at normal.c:5342
#36 0x00000000004f239c in normal_cmd (oap=0x7fff44b8d2d0, toplevel=1)
at normal.c:1193
#37 0x00000000004ae938 in main_loop (cmdwin=0, noexmode=0) at main.c:1256
#38 0x00000000004ae349 in main (argc=1, argv=0x7fff44b8d5f8) at main.c:961
read_compound() turns the COMPOUNDRULE items into a
regex so a huge number of COMPOUNDRULE won't work.
No good solution so far.
-- Dominique
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php