[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #20 from Henrik Krohns ---
Now UTF-8 rules might actually work:

Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending spamassassin-3.4/sa-compile.raw
Sending trunk/sa-compile.raw
Sending trunk/t/sa_compile.t
Transmitting file data done
Committing transaction...
Committed revision 1898791.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #19 from Henrik Krohns ---
The warning seemed to originate from fixup_re, which created utf8 encoded strings; it should be silenced now. Judging from the sa-compile temp files, nothing changed, so nothing should break (assuming the utf-8 stuff works properly in the first place; there aren't any unit tests for it).

Sending trunk/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .done
Committing transaction...
Committed revision 1898776.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #18 from Henrik Krohns ---
There is no issue if one doesn't put raw UTF-8 in cf files; some guidelines about that have been added to the documentation. And as said, sa-compile will probably be gone in 4.0 (per Bug 7962).
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #17 from e...@gmx.net ---
Thanks Henrik. Just to confirm, you are saying the issue no longer exists in sa-compile v4+, so we can stop tracking at this point? If it still exists we may want to open a bug on v4 for tracking, until deprecation of sa-compile has been confirmed, or simply define/document the re2c input requirements more strictly.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns changed:

What|Removed|Added
Resolution|---|FIXED
Status|NEW|RESOLVED

--- Comment #16 from Henrik Krohns ---
Closing this as 3.4 will not receive any more fixes, and I'm considering sa-compile deprecated for 4.0.0 (at least the project should vote on it officially).
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Bug 7645 depends on bug 7656, which changed state.

Bug 7656 Summary: UTF8 rules, normalize_charset etc overhaul
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

What|Removed|Added
Status|NEW|RESOLVED
Resolution|---|FIXED
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Daniel Migowski changed:

What|Removed|Added
CC||dmigow...@ikoffice.de

--- Comment #15 from Daniel Migowski ---
I would wish for a better error message, one which says WHICH channel was being parsed. I have heinlein, but also schaal-it.net, and cannot say for sure without more testing which of them delivers wrong characters now.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns changed:

What|Removed|Added
CC||jida...@jidanni.org

--- Comment #14 from Henrik Krohns ---
*** Bug 5607 has been marked as a duplicate of this bug. ***
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #13 from Bill Cole ---
(In reply to Robert R. Richter from comment #11)
> I am no expert, so is it safe to just ignore these "Wide character in print
> at..." warnings/errors? Or are there any other sideeffects so that I should
> remove this ruleset?

"Safe" is an imprecise concept, but I think ignoring those messages is safe for my understanding of safety. My understanding is that all of the rules are still being converted into compilable C and that only the specific rules that contain utf8 characters are being mangled in the process, making them generally non-matchable. See Henrik's comments above (comment #6 and comment #12).

> FYI: I still have one 3.4.1 installation left and there are no such warnings
> using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.

That's probably because 3.4.1 was liberally sprinkled with "use bytes;" pragmas, which effectively removed handling of "wide" characters as characters rather than as a sequence of unrelated bytes. That wasn't a maintainable strategy given the modern reality of how Perl handles Unicode. If you want to understand the details, "perldoc bytes" is a place to start and it references additional documentation that may be helpful.

Because this could be seen as a problem with a 3rd-party rule distribution that is distributing rules in a bad format, I am tempted to just close this as "INVALID" (i.e. not OUR problem), but I do think we need to nail down the code truth in documentation and probably rework sa-compile for 4.0 to create re2c input files in a more tightly specified way.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #12 from Henrik Krohns ---
Well if Heinlein is reading this, do not use UTF8 in rule files. That's the simplest fix.

Write rules in pure latin1:

  /füübar/

Or better yet, with UTF8 byte alternatives:

  $ perl -MEncode -e 'print unpack("H*", encode("UTF-8", "ü"))'
  c3bc

  /f(?:ü|\xc3\xbc)(?:ü|\xc3\xbc)bar/

Most portable:

  $ perl -e 'print unpack("H*", "ü")'
  fc

  /f(?:\xfc|\xc3\xbc)(?:\xfc|\xc3\xbc)bar/

Some related thread:
http://spamassassin.1065346.n5.nabble.com/UTF8-character-in-doesn-t-match-td154199.html
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #11 from Robert R. Richter ---
I am no expert, so is it safe to just ignore these "Wide character in print at..." warnings/errors? Or are there any other side effects so that I should remove this ruleset?

FYI: I still have one 3.4.1 installation left and there are no such warnings using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Bill Cole changed:

What|Removed|Added
Target Milestone|Undefined|4.0.0

--- Comment #10 from Bill Cole ---
(In reply to Robert R. Richter from comment #9)
> same problem here under Gentoo:
>
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.
>
> spamassassin 3.4.2 and perl 5.26.2
>
> I am also using spamassassin.heinlein-support.de
>
> Any news on this topic?

Not really. It's a low priority because it seems to be purely cosmetic and only occur with a third-party ruleset.

1. A simple direct POSSIBLE fix with UNKNOWN side-effects may be to add this UNTESTED line after line 22:

   use open OUT => ':utf8';

2. A better fix will be to not use STDOUT for building the .re files.

Either change is unfit for the 3.4.3 release, which will be the terminal release for the 3.4 branch. The untested one-line possible fix may not work and may not quiet the warning while possibly breaking the rules involved. The refactoring of .re generation is simply too big to put in the final cleanup of the 3.4 branch.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Robert R. Richter changed:

What|Removed|Added
CC||duncan@tucan-entertainment.com

--- Comment #9 from Robert R. Richter ---
same problem here under Gentoo:

Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.

spamassassin 3.4.2 and perl 5.26.2

I am also using spamassassin.heinlein-support.de

Any news on this topic?
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns changed:

What|Removed|Added
Depends on||7656

Referenced Bugs:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
[Bug 7656] UTF8 rules, normalize_charset etc overhaul
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #8 from e...@gmx.net ---
Thanks Henrik. I notified the maintainers of spamassassin.heinlein-support.de and pointed them here.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #7 from Henrik Krohns ---
Noticed something that made me think of this bug..

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5691#c16

"SA rules files are encoded in ISO-8859-1, not UTF-8. You have to either encode Japanese characters in pattern tests using \x sequences or develop a new feature adding support for UTF-8 config files to SA."

I don't know if this is (still) true or false, but perhaps we should clarify this somewhere and optionally reject any non-ascii configuration lines. No time to investigate right now.
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Todd Rinaldo changed:

What|Removed|Added
CC||to...@cpanel.net
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

apache-bugzi...@andre.geddert.net changed:

What|Removed|Added
CC||apache-bugzilla@andre.geddert.net
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns changed:

What|Removed|Added
CC||h...@hege.li

--- Comment #6 from Henrik Krohns ---
There are some utf8 rules, for example (I've used "cat -v" to print them..)

body HS_BODY_899 /The seller hasnM-CM-"M-bM-^BM-,M-bM-^DM-"t provided any postage details yet/
body HS_BODY_1575 /diesem Grund folgende Zahlung zu stornieren. Um den dafM-CM-
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #4 from Jan Brodda ---
(In reply to Bill Cole from comment #3)
> What version of Perl are you using?

Perl 5.22.1 on Ubuntu 16.04.5
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #5 from Fabian Dellwing ---
> [14:50 root@mail ~]
> perl -V
> Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
>
> Platform:
> osname=linux, osvers=4.4.0-127-generic,
> archname=i686-linux-gnu-thread-multi-64int
> uname='linux lgw01-amd64-009 4.4.0-127-generic #153-ubuntu smp sat may 19
> 10:58:46 utc 2018 i686 i686 i686 gnulinux '
> config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
> -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4
> -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions
> -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro
> -Dcccdlflags=-fPIC -Darchname=i686-linux-gnu -Dprefix=/usr
> -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18
> -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5
> -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2
> -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1
> -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
> -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl
> -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm
> -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib
> -Dlibperl=libperl.so.5.18.2 -des'
> hint=recommended, useposix=true, d_sigaction=define
> useithreads=define, usemultiplicity=define
> useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
> use64bitint=define, use64bitall=undef, uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64',
> optimize='-O2 -g',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector
> -fno-strict-aliasing -pipe -I/usr/local/include'
> ccversion='', gccversion='4.8.4', gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
> libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib
> /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
> libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
> perllibs=-ldl -lm -lpthread -lc -lcrypt
> libc=, so=so, useshrplib=true,
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Bill Cole changed:

What|Removed|Added
CC||billc...@apache.org

--- Comment #3 from Bill Cole ---
What version of Perl are you using?
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #2 from Jan Brodda ---
(In reply to eqx from comment #1)
> I get the same after adding the heinlein channel.

I am using the same SA rules from channel "spamassassin.heinlein-support.de", so this might be a similarity here..
[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Jan Brodda changed:

What|Removed|Added
CC||sa-bugzi...@janbrodda.de