Bug#1028473: dictionaries-common: problem in russian dict. Word '��� ��' contains illegal characters.

2023-01-15 Thread Agustin Martin
Control: reassign -1 irussian
Control: tags -1 + patch pending

On Sat, 14 Jan 2023 at 15:39, Jason Lee Quinn wrote:
>
> Package: dictionaries-common
> Version: 1.29.3
> Followup-For: Bug #1028473
> X-Debbugs-Cc: jason.lee.quinn+deb...@gmail.com
>
> Thank you for the reply.
>
> If these dictionaries are installed,
> where are they located? I've searched
> /usr/lib/ispell and many other places but can only find
> american and british dictionaries on my machine.

Hi,

You should find the relevant files under several directories. On one of my boxes:

$ dir /usr/lib/ispell/ /usr/share/ispell/ /var/lib/ispell/ /var/lib/dictionaries-common/ispell/

/usr/lib/ispell/:
american.aff    castellano.hash  english.aff    espanol.hash          spanish.hash
american.hash   default.aff      espa~nol.aff   README.select-ispell
castellano.aff  default.hash     espa~nol.hash  spanish.aff

/usr/share/ispell/:
american.med+.mwl.gz  american.mwl.gz  english.aff  espa~nol.mwl.gz

/var/lib/dictionaries-common/ispell/:
iamerican  ispanish

/var/lib/ispell/:
american.compat  american.hash  american.remove  espa~nol.compat
espa~nol.hash  espa~nol.remove  README

If there is no trace of those dicts in your dirs, then synaptic must be
upgrading some other system (a virtual machine?).
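
A quick way to check all of these locations at once is a small loop like
the one below. It is only a sketch: the directory list is copied from the
listing above, and the function name list_dict_dirs is made up for
illustration.

```shell
#!/bin/sh
# Candidate directories, taken from the listing above.
ISPELL_DIRS="/usr/lib/ispell /usr/share/ispell /var/lib/ispell /var/lib/dictionaries-common/ispell"

# list_dict_dirs DIR...: print every directory with its contents,
# or note that it does not exist. (Hypothetical helper name.)
list_dict_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            printf '%s:\n' "$d"
            ls -1 "$d"
        else
            printf '%s: (not present)\n' "$d"
        fi
    done
}

# Intentionally unquoted so the list splits into separate arguments.
list_dict_dirs $ISPELL_DIRS
```

If every directory is reported as "(not present)" or contains only
american and british files, the other dicts are not on that system.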

> Also where does the "contains illegal characters" message
> actually come from? Whatever the source of that message,
> I'm having trouble tracking it down. A technical explanation
> about why the message is harmless would also be interesting
> for me. Perhaps the message itself and the logic that produces
> it could be improved to provide a nicer user experience.

The munched word on line 39 of the original Russian ispell dictionary
contains whitespace, which is allowed only as a word separator, not in
the middle of a word. ispell (and aspell) complains about it and skips
that word; that is the message you see. It is harmless because the only
consequence is that this one word is skipped.
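
To see such an entry for yourself, something along these lines should
work. It is a minimal sketch, not what ispell does internally:
find_spaced_words is a hypothetical helper, and the sample file uses
ASCII stand-ins instead of the real KOI8-R entries.

```shell
#!/bin/sh
# find_spaced_words FILE: print "line:entry" for every entry that has an
# embedded space. LC_ALL=C makes grep match byte-wise, which is presumably
# what you want for a KOI8-R word list under a UTF-8 locale.
find_spaced_words() {
    LC_ALL=C grep -n ' ' "$1"
}

# Hypothetical sample word list; the second entry has a stray space.
sample=$(mktemp)
printf '%s\n' 'alpha/A' 'be ta/B' 'gamma' > "$sample"

find_spaced_words "$sample"    # prints: 2:be ta/B
rm -f "$sample"
```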

The attached patch should strip that word before the package is built,
making the message go away. I am reassigning this bug report to
irussian (I am also an uploader for it) and will upload a package with
this fix unless the maintainer disagrees.
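
The effect of the added filter can be tried in isolation. The sketch
below mirrors the new "uniq | LC_ALL=C grep -v ' '" stage from the patch.
Plain sort stands in for ./sortkoi8 (an assumption, since that helper
lives in the source package), and strip_spaced_words and the sample
entries are made up for illustration.

```shell
#!/bin/sh
# strip_spaced_words: read entries on stdin, sort and deduplicate them,
# then drop every entry that contains a space.
# "sort" is a stand-in for ./sortkoi8 from the source package.
strip_spaced_words() {
    LC_ALL=C sort | uniq | LC_ALL=C grep -v ' '
}

# Hypothetical input: one duplicate and one entry with a stray space.
printf '%s\n' 'gamma' 'be ta/B' 'alpha/A' 'gamma' | strip_spaced_words
# prints:
#   alpha/A
#   gamma
```

Entries without a space pass through unchanged, so only the malformed
one disappears from the generated word list.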

Thanks again for your feedback.

-- 
Agustin
diff --git a/debian/rules b/debian/rules
index 565a92f..2ce6cfc 100755
--- a/debian/rules
+++ b/debian/rules
@@ -28,7 +28,7 @@ export LANG LC_ALL
 override_dh_auto_build:
 	# Generate ispell dictionary.
 	grep -h '[3]' $(DICTIONARIES) | tr '\243\263' '\305\345' > yo_subst.koi
-	cat $(DICTIONARIES) yo_subst.koi |./sortkoi8 | uniq > $(ILANGUAGE).dict
+	cat $(DICTIONARIES) yo_subst.koi |./sortkoi8 | uniq | LC_ALL=C grep -v ' ' > $(ILANGUAGE).dict
 	sed -e "s/^\#[ye]//;s/^\#koi/wordchars/" $(ILANGUAGE).aff.koi > $(ILANGUAGE).aff
 	# Generate traditional ispell hash needed by i2myspell.
 	buildhash $(ILANGUAGE).dict $(ILANGUAGE).aff $(ILANGUAGE).hash



2023-01-14 Thread Jason Lee Quinn
Package: dictionaries-common
Version: 1.29.3
Followup-For: Bug #1028473
X-Debbugs-Cc: jason.lee.quinn+deb...@gmail.com

Thank you for the reply.

If these dictionaries are installed,
where are they located? I've searched
/usr/lib/ispell and many other places but can only find
american and british dictionaries on my machine.

Also where does the "contains illegal characters" message
actually come from? Whatever the source of that message,
I'm having trouble tracking it down. A technical explanation
about why the message is harmless would also be interesting
for me. Perhaps the message itself and the logic that produces
it could be improved to provide a nicer user experience.




2023-01-11 Thread Agustin Martin
On Wed, 11 Jan 2023 at 17:15, Jason Lee Quinn wrote:
>
> Package: dictionaries-common
> Version: 1.29.3
> Severity: minor
> X-Debbugs-Cc: jason.lee.quinn+deb...@gmail.com
>
> Dear Maintainer,
>
> About two weeks ago on a fresh install of Debian Bookworm
> from a daily installer build I
> came across a dictionary error related to the
> installation of synaptic. The relevant output is
>
> 
>
> Setting up synaptic (0.91.2) ...
> Processing triggers for dictionaries-common (1.29.3) ...
> ispell-autobuildhash: Processing 'american' dict.
> ispell-autobuildhash: Processing 'brasilero' dict.
> ispell-autobuildhash: Processing 'british' dict.
> ispell-autobuildhash: Processing 'catala' dict.
> ispell-autobuildhash: Processing 'danish' dict.
> ispell-autobuildhash: Processing 'espa-nol' dict.
> ispell-autobuildhash: Processing 'lietuviu' dict.
> ispell-autobuildhash: Processing 'ngerman' dict.
> ispell-autobuildhash: Processing 'polish' dict.
> ispell-autobuildhash: Processing 'portugues' dict.
> ispell-autobuildhash: Processing 'russian' dict.
>
> Word '��� ��' contains illegal characters.
> ispell-autobuildhash: Processing 'swiss' dict.

Hi,

This is a harmless message emitted during hash creation for the Russian
ispell dict. Nothing to worry about.

> It looks to be an error in a dictionary file
> but I never selected any language except English so
> this is default behavior and as far as I can
> tell I do not even have the russian dictionary
> installed at all.

By the way, you do have all those ispell dicts installed, although you
may not have installed them explicitly.

Thanks for your contribution to Debian

-- 
Agustin




2023-01-11 Thread Jason Lee Quinn
Package: dictionaries-common
Version: 1.29.3
Severity: minor
X-Debbugs-Cc: jason.lee.quinn+deb...@gmail.com

Dear Maintainer,

About two weeks ago on a fresh install of Debian Bookworm
from a daily installer build I
came across a dictionary error related to the
installation of synaptic. The relevant output is



Setting up synaptic (0.91.2) ...
Processing triggers for dictionaries-common (1.29.3) ...
ispell-autobuildhash: Processing 'american' dict.
ispell-autobuildhash: Processing 'brasilero' dict.
ispell-autobuildhash: Processing 'british' dict.
ispell-autobuildhash: Processing 'catala' dict.
ispell-autobuildhash: Processing 'danish' dict.
ispell-autobuildhash: Processing 'espa-nol' dict.
ispell-autobuildhash: Processing 'lietuviu' dict.
ispell-autobuildhash: Processing 'ngerman' dict.
ispell-autobuildhash: Processing 'polish' dict.
ispell-autobuildhash: Processing 'portugues' dict.
ispell-autobuildhash: Processing 'russian' dict.

Word '��� ��' contains illegal characters.
ispell-autobuildhash: Processing 'swiss' dict.



It looks to be an error in a dictionary file
but I never selected any language except English so
this is default behavior and as far as I can
tell I do not even have the russian dictionary
installed at all.

My best guess is that this is an issue in
dictionaries-common/dc-deconf-select.pl and/or
related files related to dpkg triggers.

If you'd like more details about this just suggest
what extra info you'd like and I can try to 
supply it.

Cheers,
Jason



-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/24 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dictionaries-common depends on:
ii  debconf [debconf-2.0]  1.5.81
ii  emacsen-common         3.0.5
ii  libtext-iconv-perl     1.7-8

dictionaries-common recommends no packages.

Versions of packages dictionaries-common suggests:
ii  aspell                0.60.8-4+b1
ii  ispell                3.4.05-1
ii  wamerican [wordlist]  2020.12.07-2

-- debconf information:
  dictionaries-common/ispell-autobuildhash-message:
* dictionaries-common/default-ispell: american (American English)
* dictionaries-common/default-wordlist: american (American English)
  dictionaries-common/selecting_ispell_wordlist_default:
  dictionaries-common/invalid_debconf_value:
  dictionaries-common/debconf_database_corruption:
  dictionaries-common/old_wordlist_link: true