On 18/02/2025 15:30, Aaron Conole wrote:
> Roi Dayan <[email protected]> writes:
> 
>> On 12/02/2025 19:18, Aaron Conole wrote:
>>> Hi Roi,
>>>
>>> Roi Dayan via dev <[email protected]> writes:
>>>
>>>> Load dictionary_code.txt in addition to the default dictionary.
>>>
>>> The code dictionary isn't loaded by default with codespell 
>>> (codespell_lib/_codespell.py)::
>>>
>>>   _builtin_default = "clear,rare"
>>>
>>> And there are some questionable conversions in that dictionary (like
>>> uint to unit and stdio to studio).  I think adding the _rare dictionary
>>> could make sense, but perhaps we should be more careful when adding the
>>> others.
>>>
>>> Can you add the rationale for turning these on?  I think it's okay to
>>> turn on more than one codespell dict, but we should consider the
>>> individual dictionaries, too.
>>
>> I don't think it matters what is loaded by default or not as the script
>> uses enchant and not codespell.
> 
> Yes, but the point is the codespell authors don't think that this
> dictionary is a good default.
> 
>> Also, don't look at the conversions, as they are not used since we don't
>> use codespell. In the code below, each line is stripped down to only the
>> final wording, which is added to enchant as allowed words.
>>
>> I looked at the others again, and I think most of the words are already in
>> the enchant dictionary, but loading them won't do any harm.
>> I do think we can skip the main dictionary_en-GB_to_en-US.txt, for example,
>> as we use the enchant en_US dictionary, which should be more or less equal.
>> The other has more unique words, which I think is what we can say in the
>> commit message.
>>
>> What do you think?
> 
> Yes, as you noted, most of the words are already there.  I actually ran
> through many of the RHS spellings, and they already appear.  In fact, the
> only ones we are not already getting are:
> 
>   * copiable
>   * clonable
>   * subpatches
>   * traceback
>   * tracebacks
> 
> Just 5 words and they are not actually universally agreed upon
> spellings.  For example, if I use something like wiktionary (not the
> most authoritative source, I agree):
> 
>   https://en.wiktionary.org/wiki/clonable#English
> 
> It says that 'cloneable' is an alternative form used in computing
> context.  Enchant suggests 'clone able' or 'clone-able'
> 
> Likewise, there isn't an accepted form of copiable (and enchant does
> similar, including with subpatches).
> 
> So I guess 'traceback' and 'tracebacks' are the only ones for which there
> isn't any ambiguity.
> 
> Anyway, I guess it's okay to add, but we should probably consider
> looking at all the dictionaries and seeing which ones make sense to add
> as well.  Otherwise, it's quite a bit of change here for something that
> could be done by just adding the words above directly (ie: you make 7
> lines of change here, vs adding words to extra_keywords).
> 

Yes, but with this change, dictionary updates in newer versions of
codespell are picked up automatically.

I looked a bit at the other dictionaries.
We probably don't want the main one, dictionary_en-GB_to_en-US.txt, as we
use enchant for core words.
We also probably don't need dictionary_usage.txt, dictionary_rare.txt, or
dictionary_names.txt, as they seem to be aimed at catching spelling mistakes
rather than introducing new words.

So the only exceptions are dictionary.txt, which is already loaded, and
dictionary_code.txt, which seems to add those more accepted words,
as you noted.
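
For reference, a rough sketch (not the checkpatch.py code itself) of how one
could list the words a codespell-style dictionary would add on top of what
the spell checker already knows. The helper names rhs_words and missing_words
are hypothetical, and is_known is assumed to be any callable such as
enchant.Dict("en_US").check. It splits on '->' rather than '>' as the patch
does, which is just a choice made for this sketch.

```python
import os


def rhs_words(path):
    """Return the right-hand-side spellings from a codespell-style file.

    Each useful line is assumed to look like 'wrong->right' or
    'wrong->right1, right2,'. Lines without '->' are skipped.
    """
    words = []
    with open(path) as f:
        for line in f:
            if '->' not in line:
                continue
            rhs = line.strip().split('->', 1)[1]
            for word in rhs.strip(', ').split(','):
                word = word.strip()
                if word:
                    words.append(word)
    return words


def missing_words(paths, is_known):
    """Return RHS words from the existing files that is_known() rejects.

    is_known is an assumption of this sketch: any callable that takes a
    word and returns True when the spell checker already accepts it.
    """
    missing = []
    for path in paths:
        if not os.path.exists(path):
            continue  # Tolerate dictionaries absent from this install.
        for word in rhs_words(path):
            if word not in missing and not is_known(word):
                missing.append(word)
    return missing
```

With pyenchant installed, something like
missing_words(files, enchant.Dict("en_US").check) would give the list of
words that loading those files actually contributes.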

So I don't think we need to add the others. From here we can keep
updating the internal list.

What do you think?

>> Files I see in codespell path:
>>
>> dictionary_code.txt
>> dictionary_en-GB_to_en-US.txt
>> dictionary_informal.txt
>> dictionary_names.txt
>> dictionary_rare.txt
>> dictionary.txt
>> dictionary_usage.txt
>>
>>
>>>
>>>> Signed-off-by: Roi Dayan <[email protected]>
>>>> Acked-by: Salem Sol <[email protected]>
>>>> ---
>>>>  utilities/checkpatch.py | 14 ++++++++------
>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>> index f8caeb811604..9571380c291f 100755
>>>> --- a/utilities/checkpatch.py
>>>> +++ b/utilities/checkpatch.py
>>>> @@ -42,14 +42,16 @@ missing_authors = []
>>>>  def open_spell_check_dict():
>>>>      import enchant
>>>>  
>>>> +    codespell_files = []
>>>>      try:
>>>>          import codespell_lib
>>>>          codespell_dir = os.path.dirname(codespell_lib.__file__)
>>>> -        codespell_file = os.path.join(codespell_dir, 'data', 'dictionary.txt')
>>>> -        if not os.path.exists(codespell_file):
>>>> -            codespell_file = ''
>>>> +        for fn in ['dictionary.txt', 'dictionary_code.txt']:
>>>> +            fn = os.path.join(codespell_dir, 'data', fn)
>>>> +            if os.path.exists(fn):
>>>> +                codespell_files.append(fn)
>>>>      except:
>>>> -        codespell_file = ''
>>>> +        pass
>>>>  
>>>>      try:
>>>>          extra_keywords = ['ovs', 'vswitch', 'vswitchd', 'ovs-vswitchd',
>>>> @@ -121,8 +123,8 @@ def open_spell_check_dict():
>>>>  
>>>>          spell_check_dict = enchant.Dict("en_US")
>>>>  
>>>> -        if codespell_file:
>>>> -            with open(codespell_file) as f:
>>>> +        for fn in codespell_files:
>>>> +            with open(fn) as f:
>>>>                  for line in f.readlines():
>>>>                      words = line.strip().split('>')[1].strip(', ').split(',')
>>>>                      for word in words:
>>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
