[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #20 from Henrik Krohns  ---
Now UTF-8 rules might actually work:

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending        spamassassin-3.4/sa-compile.raw
Sending        trunk/sa-compile.raw
Sending        trunk/t/sa_compile.t
Transmitting file data done
Committing transaction...
Committed revision 1898791.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #19 from Henrik Krohns  ---
The warning seems to originate from fixup_re, which created utf8-encoded
strings; it should be silenced now. Judging from the sa-compile temp files,
nothing changed, so nothing should break (assuming the UTF-8 handling works
properly in the first place; there aren't any unit tests for it).

Sending        trunk/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .done
Committing transaction...
Committed revision 1898776.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #18 from Henrik Krohns  ---
There is no issue if one doesn't put raw UTF-8 in cf files; some guidelines
about that have been added to the documentation. And as said, sa-compile will
probably be gone in 4.0 (per Bug 7962).


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #17 from e...@gmx.net ---
Thanks Henrik. Just to confirm, are you saying the issue no longer exists
in sa-compile v4+, so we can stop tracking it at this point?

If it still exists, we may want to open a bug on v4 for tracking until the
deprecation of sa-compile has been confirmed, or simply define/document the
re2c input requirements more strictly.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #16 from Henrik Krohns  ---
Closing this as 3.4 will not receive any more fixes, and I'm considering
sa-compile deprecated for 4.0.0 (at least the project should vote on it
officially).


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2022-03-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645
Bug 7645 depends on bug 7656, which changed state.

Bug 7656 Summary: UTF8 rules, normalize_charset etc overhaul
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2020-02-10 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Daniel Migowski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmigow...@ikoffice.de

--- Comment #15 from Daniel Migowski  ---
I would wish for a better error message, one which says WHICH channel it was
parsing. I also have heinlein, but also schaal-it.net, and cannot say for sure
without more testing which of them delivers wrong characters now.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-06-24 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jida...@jidanni.org

--- Comment #14 from Henrik Krohns  ---
*** Bug 5607 has been marked as a duplicate of this bug. ***


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-02-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #13 from Bill Cole  ---
(In reply to Robert R. Richter from comment #11)
> I am no expert, so is it safe to just ignore these "Wide character in print
> at..." warnings/errors? Or are there any other side effects so that I should
> remove this ruleset?

"Safe" is an imprecise concept, but I think ignoring those messages is safe for
my understanding of safety. My understanding is that all of the rules are still
being converted into compilable C and that only the specific rules that contain
utf8 characters are being mangled in the process, making them generally
non-matchable. See Henrik's comments above (comment #6 and comment #12) 

> FYI: I still have one 3.4.1 installation left and there are no such warnings
> using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.

That's probably because 3.4.1 was liberally sprinkled with "use bytes;"
pragmas, which effectively removed handling of "wide" characters as characters
rather than as a sequence of unrelated bytes. That wasn't a maintainable
strategy given the modern reality of how Perl handles Unicode. If you want to
understand the details, "perldoc bytes" is a place to start and it references
additional documentation that may be helpful. 
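
To make the character-versus-byte distinction concrete, here is a minimal
sketch in Python rather than Perl. It is only an analogy and does not reproduce
the behaviour of Perl's "use bytes" pragma or of any SpamAssassin code; the
sample string is the "füübar" example from comment #12.

```python
import re

text = "f\u00fc\u00fcbar"            # "füübar" as a string of characters
as_utf8 = text.encode("utf-8")       # each "ü" becomes two bytes (c3 bc)
as_latin1 = text.encode("latin-1")   # each "ü" becomes one byte (fc)

print(len(text), len(as_utf8), len(as_latin1))

# "." matches one character in a text pattern, but one byte in a bytes pattern.
char_match = re.fullmatch(r"f..bar", text)
utf8_match = re.fullmatch(rb"f..bar", as_utf8)
latin1_match = re.fullmatch(rb"f..bar", as_latin1)

print(bool(char_match), bool(utf8_match), bool(latin1_match))
```

The same pattern therefore matches or fails depending on whether the data is
treated as characters, as UTF-8 bytes, or as Latin-1 bytes.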

Because this could be seen as a problem with a 3rd-party rule distribution that
is distributing rules in a bad format, I am tempted to just close this as
"INVALID" (i.e. not OUR problem,) but I do think we need to nail down the code
truth in documentation and probably rework sa-compile for 4.0 to create re2c
input files in a more tightly specified way.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-02-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #12 from Henrik Krohns  ---
Well, if Heinlein is reading this: do not use UTF8 in rule files. That's the
simplest fix.

Write rules in pure latin1:

/füübar/

Or better yet, with UTF8 byte alternatives:

$ perl -MEncode -e 'print unpack("H*", encode("UTF-8", "ü"))'
c3bc

/f(?:ü|\xc3\xbc)(?:ü|\xc3\xbc)bar/

Most portable:

$ perl -e 'print unpack("H*", "ü")'
fc

/f(?:\xfc|\xc3\xbc)(?:\xfc|\xc3\xbc)bar/

Some related thread:
http://spamassassin.1065346.n5.nabble.com/UTF8-character-in-doesn-t-match-td154199.html
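
The "most portable" form above can be generated mechanically. The following is
a minimal sketch in Python, not part of SpamAssassin; the function name
byte_alternatives is hypothetical. It assumes that each non-ASCII character
should match either its Latin-1 byte or its UTF-8 byte sequence, and it falls
back to the UTF-8 bytes alone for characters that have no Latin-1 encoding.

```python
import re


def byte_alternatives(text: str) -> str:
    """Build a regex source using only ASCII and \\xNN escapes.

    Hypothetical helper: every non-ASCII character becomes
    (?:<latin-1 byte>|<utf-8 bytes>), or just the UTF-8 bytes when the
    character cannot be encoded as Latin-1.
    """
    parts = []
    for ch in text:
        if ord(ch) < 0x80:
            parts.append(re.escape(ch))
            continue
        utf8 = "".join(f"\\x{b:02x}" for b in ch.encode("utf-8"))
        try:
            latin1 = "".join(f"\\x{b:02x}" for b in ch.encode("latin-1"))
        except UnicodeEncodeError:
            parts.append(utf8)
        else:
            parts.append(f"(?:{latin1}|{utf8})")
    return "".join(parts)


pattern = byte_alternatives("f\u00fc\u00fcbar")  # "füübar"
print(pattern)

# Check the pattern against both encodings of the same text.
compiled = re.compile(pattern.encode("ascii"))
matches_utf8 = compiled.search("f\u00fc\u00fcbar".encode("utf-8")) is not None
matches_latin1 = compiled.search("f\u00fc\u00fcbar".encode("latin-1")) is not None
```

For "füübar" this yields the same pattern as the hand-written one above.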


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-02-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #11 from Robert R. Richter  ---
I am no expert, so is it safe to just ignore these "Wide character in print
at..." warnings/errors? Or are there any other side effects so that I should
remove this ruleset?

FYI: I still have one 3.4.1 installation left and there are no such warnings
using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-02-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Bill Cole  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0

--- Comment #10 from Bill Cole  ---
(In reply to Robert R. Richter from comment #9)
> same problem here under Gentoo:
> 
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.
> 
> spamassassin 3.4.2 and perl 5.26.2
> 
> I am also using spamassassin.heinlein-support.de
> 
> Any news on this topic?

Not really. It's a low priority because it seems to be purely cosmetic and only
occur with a third-party ruleset. 

1. A simple direct POSSIBLE fix with UNKNOWN side-effects may be to add this
UNTESTED line after line 22:

  use open OUT => ':utf8';

2. A better fix will be to not use STDOUT for building the .re files.

Either change is unfit for the 3.4.3 release, which will be the terminal
release for the 3.4 branch. The untested one-line possible fix may not work and
may not quiet the warning while possibly breaking the rules involved. The
refactoring of .re generation is simply too big to put in the final cleanup of
the 3.4 branch.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2019-02-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Robert R. Richter  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |duncan@tucan-entertainment.com

--- Comment #9 from Robert R. Richter  ---
same problem here under Gentoo:

Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.

spamassassin 3.4.2 and perl 5.26.2

I am also using spamassassin.heinlein-support.de

Any news on this topic?


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-17 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |7656


Referenced Bugs:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
[Bug 7656] UTF8 rules, normalize_charset etc overhaul

[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-14 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #8 from e...@gmx.net ---
Thanks Henrik. I notified the maintainers of spamassassin.heinlein-support.de
and pointed them here.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-14 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #7 from Henrik Krohns  ---
Noticed something that made me think of this bug..
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5691#c16

"SA rules files are encoded in ISO-8859-1, not UTF-8.  You have to either
encode Japanese characters in pattern tests using \x sequences or develop a
new feature adding support for UTF-8 config files to SA."

I don't know if this is (still) true or false, but perhaps we should clarify
this somewhere and optionally reject any non-ascii configuration lines. No time
to investigate right now.
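
A check for non-ascii configuration lines could look roughly like the
following. This is a minimal sketch in Python, not SpamAssassin code; the
function name non_ascii_lines and the two sample rules are hypothetical. It
works on raw bytes so that it makes no assumption about the file's encoding.

```python
def non_ascii_lines(lines):
    """Yield (line_number, raw_line) for each line holding a byte >= 0x80.

    `lines` is any iterable of bytes objects, e.g. a file opened in
    binary mode.
    """
    for number, raw in enumerate(lines, start=1):
        if any(byte >= 0x80 for byte in raw):
            yield number, raw


# Hypothetical rule lines, for illustration only.
sample = [
    b"body EXAMPLE_ASCII /plain ascii rule/\n",
    "body EXAMPLE_UTF8 /f\u00fcbar/\n".encode("utf-8"),
]

flagged = list(non_ascii_lines(sample))
for number, raw in flagged:
    print(f"line {number}: {raw!r}")
```

Run over each file with open(path, "rb"), the same generator would also show
which file, and therefore which channel, contains the offending lines.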


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-13 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Todd Rinaldo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |to...@cpanel.net


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-11 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

apache-bugzi...@andre.geddert.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apache-bugzilla@andre.geddert.net


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Henrik Krohns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |h...@hege.li

--- Comment #6 from Henrik Krohns  ---
There's some utf8 rules, for example

(I've used "cat -v" to print them..)
body HS_BODY_899 /The seller hasnM-CM-"M-bM-^BM-,M-bM-^DM-"t provided any
postage details yet/
body HS_BODY_1575 /diesem Grund folgende Zahlung zu stornieren. Um den
dafM-CM-
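
The "cat -v" notation in the HS_BODY_899 rule above can be turned back into
bytes to see what the rule file actually contains. The following is a minimal
sketch in Python; the function name decode_cat_v is hypothetical. It assumes
ASCII input, and it cannot tell a literal "M-" or "^" in the original text
apart from cat's own escapes, so it is only a rough aid.

```python
def decode_cat_v(s: str) -> bytes:
    """Reverse the usual `cat -v` escapes (M-^X, M-X, ^X) into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("M-^", i) and i + 3 < len(s):
            out.append(0x80 | (ord(s[i + 3]) ^ 0x40))  # high-bit control byte
            i += 4
        elif s.startswith("M-", i) and i + 2 < len(s):
            out.append(0x80 | ord(s[i + 2]))           # high-bit printable byte
            i += 3
        elif s[i] == "^" and i + 1 < len(s):
            out.append(ord(s[i + 1]) ^ 0x40)           # plain control byte
            i += 2
        else:
            out.append(ord(s[i]))                      # ordinary ASCII
            i += 1
    return bytes(out)


raw = decode_cat_v('hasnM-CM-"M-bM-^BM-,M-bM-^DM-"t')
as_text = raw.decode("utf-8")
print(raw.hex(" "))
print(as_text)

# Undo one layer of Windows-1252 misinterpretation, if that is what happened.
repaired = as_text.encode("cp1252").decode("utf-8")
print(repaired)
```

Decoded as UTF-8, the fragment contains the euro sign (U+20AC), a code point
above 255, which is the kind of character Perl reports as "wide". The fact
that one more decoding step gives a plain apostrophe suggests, but does not
prove, that the rule text was double-encoded before it was published.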

[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #4 from Jan Brodda  ---
(In reply to Bill Cole from comment #3)
> What version of Perl are you using?

Perl 5.22.1 on Ubuntu 16.04.5


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #5 from Fabian Dellwing  ---
> [14:50 root@mail ~] > perl -V
> Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
>
>   Platform:
> osname=linux, osvers=4.4.0-127-generic, 
> archname=i686-linux-gnu-thread-multi-64int
> uname='linux lgw01-amd64-009 4.4.0-127-generic #153-ubuntu smp sat may 19 
> 10:58:46 utc 2018 i686 i686 i686 gnulinux '
> config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN 
> -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 
> -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions 
> -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro 
> -Dcccdlflags=-fPIC -Darchname=i686-linux-gnu -Dprefix=/usr 
> -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 
> -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 
> -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 
> -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 
> -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 
> -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl 
> -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm 
> -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib 
> -Dlibperl=libperl.so.5.18.2 -des'
> hint=recommended, useposix=true, d_sigaction=define
> useithreads=define, usemultiplicity=define
> useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
> use64bitint=define, use64bitall=undef, uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
>   Compiler:
> cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector 
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE 
> -D_FILE_OFFSET_BITS=64',
> optimize='-O2 -g',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector 
> -fno-strict-aliasing -pipe -I/usr/local/include'
> ccversion='', gccversion='4.8.4', gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', 
> lseeksize=8
> alignbytes=4, prototype=define
>   Linker and Libraries:
> ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
> libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib 
> /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
> libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
> perllibs=-ldl -lm -lpthread -lc -lcrypt
> libc=, so=so, useshrplib=true, 

[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Bill Cole  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billc...@apache.org

--- Comment #3 from Bill Cole  ---
What version of Perl are you using?


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-09 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

--- Comment #2 from Jan Brodda  ---
(In reply to eqx from comment #1)
> I get the same after adding the heinlein channel.

I am using the same SA rules from channel "spamassassin.heinlein-support.de",
so this might be the common factor here.


[Bug 7645] Wide character in print at /usr/bin/sa-compile line 433

2018-11-08 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645

Jan Brodda  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sa-bugzi...@janbrodda.de
