Roi Dayan via dev <[email protected]> writes:

> On 02/11/2023 16:11, Eelco Chaudron wrote:
>> 
>> On 2 Nov 2023, at 14:20, Roi Dayan via dev wrote:
>> 
>>> Add personal words list as spellcheck.txt and load it
>>> into enchant spell checker. This file is generated from
>>> codespell dictionary.txt and contains words users use
>>> but enchant spell checker failed on like
>>> refcount, pthread, enqueuing, etc.
>>>
>>> Signed-off-by: Roi Dayan <[email protected]>
>> Thanks for the patch, but it doesn’t look right to add the full list
>> of words to the OVS repository.
>> 
>> Maybe we can update the extra_keywords list with the most common
>> missing ones, and add a command line option to include a
>> user-defined file for people who want this?
>> 
>> What do you think?
>> 
>
> I think it is needed. It's a dictionary of most commonly used words
> and the enchant spell check does not seem to be enough.
> Some examples that enchant fails on, as I remember:
> lacp, dereferenced, valgrind, priv, syscall, ...

Well, we do add some of those - BUT I see that codespell probably has a
more complete dictionary.

The original implementation used enchant because there was no clear
winner at the time.  Enchant is a spell checking frontend intended
for lots of tools, so it seemed like a good idea (for example, used by
LibreOffice, AbiWord, and others).

It may be that codespell is more appropriate since it is targeted at
development spell checking.  OTOH, codespell will miss lots of
misspellings that enchant, with its larger lexicon, will catch.  I have
no real opinion on which framework makes sense - but we shouldn't include
a dictionary.  After all, even linux checkpatch.pl will find the codespell
dictionary and just use that as it exists.

However, I will point out that there *is* a difference between the two.
Here's a simple example:

  02:01:19 aconole@RHTPC1VM0NT {(594d145410...)} ~/git/ovs$ echo vailgrind | codespell -
  02:01:24 aconole@RHTPC1VM0NT {(594d145410...)} ~/git/ovs$ 

vs.

  02:00:18 aconole@RHTPC1VM0NT {(594d145410...)} ~/git/ovs$ ./utilities/checkpatch.py -S test.patch
  == Checking "test.patch" ==
  WARNING: Possible misspelled word: "vaigrind"
  Did you mean:  ['grinder', 'valgrind']
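
To make the difference concrete, here is a toy sketch of the two
models (not the real code of either tool; the word lists and function
names are made up for illustration).  A codespell-style checker only
flags words that appear in a table of known misspellings, while an
enchant-style checker flags anything missing from its lexicon and
suggests near matches:

```python
import difflib

# Toy data, invented for this example; not taken from either tool.
KNOWN_TYPOS = {"teh": "the", "recieve": "receive"}    # codespell-style table
LEXICON = {"the", "receive", "valgrind", "grinder"}   # enchant-style lexicon


def blocklist_check(word):
    """Return a correction only if the word is a known misspelling."""
    return KNOWN_TYPOS.get(word)


def lexicon_check(word):
    """Return None for known words, else a list of close lexicon matches."""
    if word in LEXICON:
        return None
    return difflib.get_close_matches(word, sorted(LEXICON), n=2, cutoff=0.6)


if __name__ == "__main__":
    for w in ("teh", "vaigrind", "valgrind"):
        print(w, blocklist_check(w), lexicon_check(w))
```

A typo that nobody has recorded yet passes the first check silently,
which matches the empty codespell output above.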

So I guess we can probably do something better - and maybe that
something is to find the codespell dictionary, cut it up, and merge the
values into the current session (instead of using add(), i.e. 2/2 in
this series).

But I don't think we need to copy the entire codespell dict.
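
Something along these lines is what I have in mind - a rough sketch
only, not tested against checkpatch.py.  It assumes the codespell
dictionary uses 'typo->fix1, fix2,' lines (optionally followed by a
free-text reason), that it ships as data/dictionary.txt inside the
codespell_lib package, and that the checker is a pyenchant Dict, whose
add_to_session() keeps the words for the current run only.  The helper
names are invented here:

```python
import importlib.util
import os


def parse_codespell_line(line):
    """Return the corrections from one 'typo->fix1, fix2,' line."""
    line = line.strip()
    if "->" not in line:
        return []
    _, fixes = line.split("->", 1)
    words = [w.strip() for w in fixes.split(",")]
    # Keep single alphabetic tokens only.  This drops empty fields and
    # multi-word reason text, but also multi-word corrections.
    return [w for w in words if w.isalpha()]


def find_codespell_dictionary():
    """Best-effort lookup; the path layout is an assumption."""
    spec = importlib.util.find_spec("codespell_lib")
    if spec is None or not spec.submodule_search_locations:
        return None
    for base in spec.submodule_search_locations:
        path = os.path.join(base, "data", "dictionary.txt")
        if os.path.exists(path):
            return path
    return None


def merge_into_session(spell_dict, path):
    """Add every correction in `path` to the checker's session."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            for word in parse_codespell_line(line):
                spell_dict.add_to_session(word)
                count += 1
    return count
```

The caller would create enchant.Dict("en_US"), call
find_codespell_dictionary(), and only merge when a path comes back, so
a system without codespell behaves exactly as it does today.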

> Also, with the entire dictionary added, the script is even faster than
> adding word by word as done now.
>
> I think we could maybe remove the word-by-word add part altogether, but
> check and do it in steps.
>
> Just adding the dictionary, with the words added through Python still in
> place, already seems to be faster.
>
> Checking a small commit before loading dictionary.txt:
>
> $ time ./utilities/checkpatch.py  -S -1
>
> real    0m28.379s
> user    0m0.272s
> sys     0m0.223s
>
>
> and after:
>
> $ time ./utilities/checkpatch.py  -S -1
>
> real    0m0.238s
> user    0m0.138s
> sys     0m0.038s
>
>> Cheers,
>> 
>> Eelco
>> 
>>> ---
>>>  utilities/automake.mk    |     1 +
>>>  utilities/checkpatch.py  |     4 +-
>>>  utilities/dictionary.txt | 16161 +++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 16165 insertions(+), 1 deletion(-)
>>>  create mode 100644 utilities/dictionary.txt
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
