Dominique Pelle wrote:

> Vim-7.3.107 crashes when I try to create the Esperanto
> dictionary from OpenOffice-3.
> 
> Steps to reproduce:
> 
> $ wget http://extensions.services.openoffice.org/e-files/3377/1/1.0-dev.oxt
> $ unzip 1.0-dev.oxt
> $ vim -u NONE --noplugin -c 'set nomore' -c 'mkspell! /tmp/eo literumilo'
> Vim: Caught deadly signal SEGV
> 
> Vim: Finished.
> Segmentation fault (core dumped)
> 
> 
> Valgrind memory checker gives errors:
> 
> ==3877== Invalid write of size 1
> ==3877==    at 0x4026087: strcat (mc_replace_strmem.c:176)
> ==3877==    by 0x818AF60: spell_read_aff (spell.c:5493)
> ==3877==    by 0x819359D: mkspell (spell.c:9234)
> ==3877==    by 0x8191F53: ex_mkspell (spell.c:8557)
> ==3877==    by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
> ==3877==    by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
> ==3877==    by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
> ==3877==    by 0x80EAEE5: exe_commands (main.c:2803)
> ==3877==    by 0x80E85CA: main (main.c:881)
> ==3877==  Address 0x7e3b434 is 0 bytes after a block of size 16,012 alloc'd
> ==3877==    at 0x4025230: malloc (vg_replace_malloc.c:236)
> ==3877==    by 0x8117B93: lalloc (misc2.c:918)
> ==3877==    by 0x8117ACB: alloc_clear (misc2.c:829)
> ==3877==    by 0x818F951: getroom (spell.c:7368)
> ==3877==    by 0x818AEFA: spell_read_aff (spell.c:5485)
> ==3877==    by 0x819359D: mkspell (spell.c:9234)
> ==3877==    by 0x8191F53: ex_mkspell (spell.c:8557)
> ==3877==    by 0x80A81AC: do_one_cmd (ex_docmd.c:2657)
> ==3877==    by 0x80A5A85: do_cmdline (ex_docmd.c:1123)
> ==3877==    by 0x80A513F: do_cmdline_cmd (ex_docmd.c:728)
> ==3877==    by 0x80EAEE5: exe_commands (main.c:2803)
> ==3877==    by 0x80E85CA: main (main.c:881)
> (several other errors after that)
> 
> spell.c:
> 
>   5478     else if (is_aff_rule(items, itemcnt, "COMPOUNDRULE", 2))
>   5479     {
>   5480         /* Concatenate this string to previously defined ones, using a
>   5481          * slash to separate them. */
>   5482         l = (int)STRLEN(items[1]) + 1;
>   5483         if (compflags != NULL)
>   5484             l += (int)STRLEN(compflags) + 1;
>   5485         p = getroom(spin, l, FALSE);
>   5486         if (p != NULL)
>   5487         {
>   5488             if (compflags != NULL)
>   5489             {
>   5490                 STRCPY(p, compflags);
>   5491                 STRCAT(p, "/");
>   5492             }
> !!5493             STRCAT(p, items[1]);
>   5494             compflags = p;
>   5495         }
>   5496     }
> 
> When it crashes, I notice that variable l reaches 16005 which is just
> slightly bigger than SBLOCKSIZE (#define  SBLOCKSIZE 16000 at
> spell.c:4885).

The first thing to fix is to add a check in getroom() for a length
larger than what it can handle.
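
A minimal sketch of such a check, assuming a simplified block allocator.
The names sketch_getroom, sketch_block_T and SKETCH_BLOCKSIZE are
hypothetical and this is not the code in spell.c; alignment handling is
left out.  The point is only that a request larger than one block is
refused instead of being written past the end of the block:

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BLOCKSIZE 16000  /* same value as SBLOCKSIZE in the report */

/* Hypothetical simplified block: a fixed buffer plus a fill counter. */
typedef struct sketch_block {
    struct sketch_block *next;
    size_t               used;
    char                 data[SKETCH_BLOCKSIZE];
} sketch_block_T;

/*
 * Return "len" zeroed bytes, starting a new block when the current one is
 * too full.  Returns NULL when "len" can never fit in a block or when
 * allocation fails.
 */
void *sketch_getroom(sketch_block_T **head, size_t len)
{
    sketch_block_T *bl = *head;
    void           *p;

    if (len > SKETCH_BLOCKSIZE)
        return NULL;            /* the check that is missing today */

    if (bl == NULL || bl->used + len > SKETCH_BLOCKSIZE) {
        bl = calloc(1, sizeof(sketch_block_T));
        if (bl == NULL)
            return NULL;
        bl->next = *head;       /* new block becomes the current one */
        *head = bl;
    }

    p = bl->data + bl->used;
    bl->used += len;
    return p;
}
```

With this, the caller in the COMPOUNDRULE branch would get NULL for
l = 16005 and skip the STRCAT() instead of crashing.  It does not make
the dictionary work, it only turns the crash into a detectable failure.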

> I changed SBLOCKSIZE from 16000 to 1024000 at spell.c:4885
> and it no longer crashes but that's quite a dramatic increase so I
> doubt whether that's right.  While creating the dictionary,
> l variable at spell.c:5485 reached l=564,458.

Phew, that's a lot of compound rules.  Did you check that they are
actually there in the affix file?  I suppose it must be generated, no
human would type this.

A solution would be to make compflags a growarray.  The way it's done
now, it is re-allocated for each COMPOUNDRULE, which is very
inefficient.  This was written with the assumption that there would be
only a few COMPOUNDRULEs.
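
Roughly what I mean, as a sketch only.  It uses plain realloc() with a
doubling capacity rather than Vim's own growarray functions, and
sketch_buf_T and sketch_append_rule are made-up names.  Each rule is
appended in place, with a slash between rules, so the existing text is
not copied again for every COMPOUNDRULE line:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; "data" is NUL-terminated once non-NULL. */
typedef struct {
    char   *data;
    size_t  len;    /* bytes used, not counting the NUL */
    size_t  cap;    /* bytes allocated */
} sketch_buf_T;

/*
 * Append "rule" to "buf", preceded by a slash when "buf" already holds
 * text.  Returns 0 on success, -1 when out of memory ("buf" is then
 * left unchanged).
 */
int sketch_append_rule(sketch_buf_T *buf, const char *rule)
{
    size_t rlen = strlen(rule);
    size_t sep = buf->len > 0 ? 1 : 0;
    size_t need = buf->len + sep + rlen + 1;    /* + 1 for the NUL */

    if (need > buf->cap) {
        size_t  newcap = buf->cap ? buf->cap : 64;
        char   *p;

        while (newcap < need)
            newcap *= 2;        /* grow geometrically, not per rule */
        p = realloc(buf->data, newcap);
        if (p == NULL)
            return -1;
        buf->data = p;
        buf->cap = newcap;
    }

    if (sep)
        buf->data[buf->len++] = '/';
    memcpy(buf->data + buf->len, rule, rlen + 1);   /* copies the NUL too */
    buf->len += rlen;
    return 0;
}
```

Starting from a zeroed sketch_buf_T, appending "ab*" and then "cd?"
leaves "ab*/cd?" in buf.data.  The number of reallocations grows with
the logarithm of the total length instead of with the number of rules.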

> Vim temporarily used 12.5 GB of memory while creating the
> dictionary which is quite a lot.

Compound rules are indeed a problem, since Vim expands them into all
possible words to be able to build the trie.  It compresses really well,
but lots of memory is needed before the compression happens.

> Size of the created dictionary file 'eo.utf-8.spl' is 587,214 bytes
> and it does not work.  Trying to use it with...
> 
>   $ cp /tmp/eo.utf-8.spl ~/.vim/spell/.
> 
> ... then in Vim...
>   :setlocal spell spelllang=eo
> 
>   ... gives errors after waiting for ~10 sec or so:
> 
>   Error detected while processing /home/pel/.vim/spell/eo.utf-8.spl:
>   E339: Pattern too long
>   E759: Format error in spell file
> 
> ":help E339" says: "This only happens on systems with 16 bit ints".
> 
> Well, that's not true since I get E339 on Linux x86_64 where
> sizeof(int) is 4 (32 bits).
> 
> E339 at regexp.c:1059 is guarded by SMALL_MALLOC
> but E339 at regexp.c:1077 is not.
> 
> Either help file needs to be updated for E339 or regexp.c
> needs to handle longer regexes.

The help is wrong: E339 can also happen when an offset gets too big.
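
To illustrate how an offset can get too big independent of sizeof(int),
here is a sketch.  It assumes the compiled pattern stores each link to
another node in a fixed two-byte field; that layout is an assumption
for the example, it is not shown anywhere in this thread, and
sketch_store_offset is a made-up name:

```c
/*
 * Store "offset" big-endian in the two bytes at "dst".
 * Returns 0 on success, -1 when the offset does not fit in 16 bits
 * ("dst" is then left untouched).
 */
int sketch_store_offset(unsigned char *dst, long offset)
{
    if (offset < 0 || offset > 0xffffL)
        return -1;              /* would be reported as "too long" */
    dst[0] = (unsigned char)((offset >> 8) & 0xff);
    dst[1] = (unsigned char)(offset & 0xff);
    return 0;
}
```

Under that assumption, anything compiled from 564,458 bytes of compound
flags would overflow such a field no matter how wide an int is.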

> Problem is triggered by the fact that file 'literumilo.aff' contains
> many COMPOUNDRULE lines (Esperanto being an agglutinative
> language).

Did you check which regcomp() call results in E339?

> Esperanto dictionary from OpenOffice-2.x as currently used by
> Vim works fine.

-- 
hundred-and-one symptoms of being an internet addict:
168. You have your own domain name.

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
