On 18/02/2025 15:30, Aaron Conole wrote:
> Roi Dayan <[email protected]> writes:
>
>> On 12/02/2025 19:18, Aaron Conole wrote:
>>> Hi Roi,
>>>
>>> Roi Dayan via dev <[email protected]> writes:
>>>
>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>
>>> The code dictionary isn't loaded by default with codespell
>>> (codespell_lib/_codespell.py)::
>>>
>>> _builtin_default = "clear,rare"
>>>
>>> And there are some questionable conversions in that dictionary (like
>>> uint to unit and stdio to studio). I think adding the _rare dictionary
>>> could make sense, but perhaps we should be more careful when adding the
>>> others.
>>>
>>> Can you add the rationale for turning these on? I think it's okay to
>>> turn on more than one codespell dict, but we should consider the
>>> individual dictionaries, too.
>>
>> I don't think it matters what is loaded by default or not as the script
>> uses enchant and not codespell.
>
> Yes, but the point is the codespell authors don't think that this
> dictionary is a good default.
>
>> Also, don't look at the conversions, as they are not used since we don't
>> use codespell. In the code below each line is stripped to keep only the
>> final wording, which is added to enchant as allowed words.
>>
>> I looked again at the others as well, and I think most of the words are
>> already in the enchant dictionary, but loading them won't harm.
>> I do think we can skip the main dictionary_en-GB_to_en-US.txt, for example,
>> as we use the enchant en_US dictionary, which should be more or less equal.
>> The other has more unique words, which I think is what we can say in the
>> commit message.
>>
>> What do you think?
>
> Yes, as you noted, most of the words are already there. I ran through
> many of the RHS spellings, and they already appear. The only ones we
> are not already getting are:
>
> * copiable
> * clonable
> * subpatches
> * traceback
> * tracebacks
>
> Just 5 words and they are not actually universally agreed upon
> spellings. For example, if I use something like wiktionary (not the
> most authoritative source, I agree):
>
> https://en.wiktionary.org/wiki/clonable#English
>
> It says that 'cloneable' is an alternative form used in computing
> context. Enchant suggests 'clone able' or 'clone-able'
>
> Likewise, there isn't an accepted form of copiable (and enchant does
> similar, including with subpatches).
>
> So I guess 'traceback' and 'tracebacks' are the only ones for which
> there isn't any ambiguity.
>
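The check described above (which RHS spellings the base dictionary does
not already accept) can be sketched roughly as follows. This is only an
illustration: words_not_in_base() and the stand-in word set are
hypothetical names, not part of checkpatch.py. With pyenchant, the
predicate would be enchant.Dict("en_US").check.

```python
def words_not_in_base(candidates, is_known):
    """Return the sorted, de-duplicated candidates that is_known() rejects.

    is_known is any callable taking a word and returning True/False,
    for example enchant.Dict("en_US").check.
    """
    return sorted({w for w in candidates if w and not is_known(w)})


# Stand-in for a real dictionary so the sketch runs without pyenchant.
known = {'clone', 'copy', 'patch'}

missing = words_not_in_base(
    ['clone', 'copiable', 'traceback', 'copiable'],
    known.__contains__)
print(missing)  # ['copiable', 'traceback']
```

Running this against the RHS words of each codespell dictionary would
show how much a given dictionary really adds on top of en_US.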
> Anyway, I guess it's okay to add, but we should probably consider
> looking at all the dictionaries and seeing which ones make sense to add
> as well. Otherwise, it's quite a bit of change here for something that
> could be done by just adding the words above directly (ie: you make 7
> lines of change here, vs adding words to extra_keywords).
>
Yes, but this change lets newer versions of codespell, with potential
updates to the dictionaries, be picked up automatically.
I looked a bit at the other dictionaries.
We probably don't want the main one, dictionary_en-GB_to_en-US.txt, as we
use enchant for core words.
We also probably don't need dictionary_usage.txt, dictionary_rare.txt, or
dictionary_names.txt, as they seem to be aimed more at spelling mistakes
than at introducing words.
So the only ones worth loading are dictionary.txt, which is already
loaded, and dictionary_code.txt, which seems to add those more accepted
words you noted.
So I don't think we need to add the others. From here we can keep
updating the internal list.
What do you think?
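
To make the idea concrete, here is a minimal standalone sketch of
collecting the allowed words from the selected dictionaries. It assumes
each line has the form 'wrong->right' or 'wrong->right1, right2,'; the
helper names rhs_words() and collect_allowed_words() are made up for
this example and are not in the patch.

```python
import os


def rhs_words(line):
    """Return the suggested spellings from one dictionary line.

    Assumes lines look like 'wrong->right' or 'wrong->right1, right2,'.
    Lines without '->' yield an empty list.
    """
    if '->' not in line:
        return []
    rhs = line.strip().split('->', 1)[1]
    return [w.strip() for w in rhs.split(',') if w.strip()]


def collect_allowed_words(data_dir,
                          names=('dictionary.txt', 'dictionary_code.txt')):
    """Gather the right-hand-side words of every dictionary that exists."""
    words = set()
    for name in names:
        path = os.path.join(data_dir, name)
        if not os.path.exists(path):
            continue  # Missing dictionaries are skipped silently.
        with open(path, encoding='utf-8') as f:
            for line in f:
                words.update(rhs_words(line))
    return words
```

The resulting set could then be passed word by word to
spell_check_dict.add_to_session(), the same way extra_keywords is.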
>> Files I see in codespell path:
>>
>> dictionary_code.txt
>> dictionary_en-GB_to_en-US.txt
>> dictionary_informal.txt
>> dictionary_names.txt
>> dictionary_rare.txt
>> dictionary.txt
>> dictionary_usage.txt
>>
>>
>>>
>>>> Signed-off-by: Roi Dayan <[email protected]>
>>>> Acked-by: Salem Sol <[email protected]>
>>>> ---
>>>> utilities/checkpatch.py | 14 ++++++++------
>>>> 1 file changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>> index f8caeb811604..9571380c291f 100755
>>>> --- a/utilities/checkpatch.py
>>>> +++ b/utilities/checkpatch.py
>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>  def open_spell_check_dict():
>>>>      import enchant
>>>>
>>>> +    codespell_files = []
>>>>      try:
>>>>          import codespell_lib
>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>> -        if not os.path.exists(codespell_file):
>>>> -            codespell_file = ''
>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>> +            if os.path.exists(fn):
>>>> +                codespell_files.append(fn)
>>>>      except:
>>>> -        codespell_file = ''
>>>> +        pass
>>>>
>>>>      try:
>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>
>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>
>>>> -        if codespell_file:
>>>> -            with open(codespell_file) as f:
>>>> +        for fn in codespell_files:
>>>> +            with open(fn) as f:
>>>>                  for line in f.readlines():
>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>                      for word in words:
>>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev