On Wed, 18 Sep 2013, Axel Rau wrote:

Ack .. but to do it right, I think we need to know which encoding was used
in the first place. Well, ok, to get started something that simply
"fixes"/discards invalid UTF-8 may work decently enough and in any case
better than what we currently have ;)
Ack to both.



Axel
PS: I have some experimental code handy, which should do the job.


Any place to look at it? ;)
https://www.chaos1.de/downloads/utf-fix-2.c
I think a script function (like utf8fix())
Indeed. (-;
would probably be a good solution that is quick enough to implement. A very
basic fix function would probably simply remove those invalid sequences,
but if more elaborate fixing is possible, I am very interested in it
(again, I think the ultimate solution must be a conversion based on a known
charset).
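
To make the "very basic fix function" concrete, here is a minimal sketch in C of what it could look like. The names utf8_seq_len() and utf8_strip_invalid() are made up for illustration; this is not the code from utf-fix-2.c and not an existing rsyslog function. It assumes the caller provides an output buffer at least as large as the input, and it drops every byte that is not part of a well-formed UTF-8 sequence (overlong forms, surrogates and values above U+10FFFF count as invalid).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s,
 * or 0 if the bytes at s do not form one. n = bytes remaining. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0) return 0;
    unsigned char b0 = s[0];
    if (b0 < 0x80) return 1;                       /* plain ASCII */
    if (b0 >= 0xC2 && b0 <= 0xDF)                  /* 2-byte sequence */
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {                /* 3-byte sequence */
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        if (b0 == 0xE0 && s[1] < 0xA0) return 0;   /* overlong form */
        if (b0 == 0xED && s[1] > 0x9F) return 0;   /* UTF-16 surrogate */
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {                /* 4-byte sequence */
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        if (b0 == 0xF0 && s[1] < 0x90) return 0;   /* overlong form */
        if (b0 == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;            /* stray continuation byte, 0xC0, 0xC1, 0xF5..0xFF */
}

/* Hypothetical helper: copy in[0..len) to out, silently dropping invalid
 * bytes. out needs room for len bytes and may be the same buffer as in.
 * Returns the number of bytes written. */
size_t utf8_strip_invalid(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, o = 0;
    while (i < len) {
        size_t k = utf8_seq_len(in + i, len - i);
        if (k == 0) {            /* invalid byte: skip it and resynchronise */
            i++;
            continue;
        }
        memmove(out + o, in + i, k);   /* memmove so in-place use is safe */
        o += k;
        i += k;
    }
    return o;
}
```

With this sketch a Latin-1 "M\xE4rz" comes out as "Mrz", which shows the downside of plain removal: the message becomes valid, but information is lost.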
Yes, but there are cases where it is not known.
Example: mail clients that produce headers in a local encoding without using
the correct escapes.
A correct example would be:
        Subject: =?iso-8859-1?Q?Oh_Du_wundersch=F6ner_M=E4rz?=
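
When the charset is declared like this, conversion is straightforward. As a rough sketch only: the hypothetical C helper below (q_latin1_to_utf8() is an invented name, not part of rsyslog or of utf-fix-2.c) takes the payload between "?Q?" and "?=", assumes the charset has already been parsed and is iso-8859-1, undoes the Q encoding ('_' is a space, =XX is a hex byte) and writes UTF-8.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (unsigned char)toupper(c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode a Q-encoded iso-8859-1 payload into
 * NUL-terminated UTF-8. Returns the number of bytes written (without
 * the NUL), or (size_t)-1 on a malformed escape or a too-small buffer. */
size_t q_latin1_to_utf8(const char *q, char *out, size_t outsz)
{
    assert(q != NULL && out != NULL);
    size_t o = 0;
    while (*q != '\0') {
        unsigned char b;
        if (*q == '=') {                     /* =XX hex escape */
            int hi = hexval((unsigned char)q[1]);
            int lo = (hi < 0) ? -1 : hexval((unsigned char)q[2]);
            if (lo < 0) return (size_t)-1;
            b = (unsigned char)(hi * 16 + lo);
            q += 3;
        } else {                             /* '_' stands for a space */
            b = (*q == '_') ? ' ' : (unsigned char)*q;
            q++;
        }
        if (outsz - o < 3) return (size_t)-1;   /* up to 2 bytes plus NUL */
        if (b < 0x80) {
            out[o++] = (char)b;              /* ASCII maps to itself */
        } else {                             /* U+0080..U+00FF: two bytes */
            out[o++] = (char)(0xC0 | (b >> 6));
            out[o++] = (char)(0x80 | (b & 0x3F));
        }
    }
    if (o >= outsz) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

Called with "Oh_Du_wundersch=F6ner_M=E4rz", this yields the UTF-8 text "Oh Du wunderschöner März". The hard case discussed in this thread is the other one, where the bytes arrive raw and no charset is declared.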


and there are always going to be cases where the submitter just gets it wrong.

Since the world seems to be going UTF8 and UTF8 is a strict superset of the ASCII that rsyslog has traditionally supported, I think there is always going to be a need to sanitize this.

I would suggest enhancing the control character escape handling to have a new option

EscapeInvalidUTF8

so that any byte sequences that are not valid UTF8 get changed to #nnn, just like control characters.
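
A rough sketch of what such an option might do internally, in C. Everything here is hypothetical: utf8_escape_invalid() is an invented name, and I am assuming nnn means the three-digit octal value of the offending byte, the same way control characters are escaped. Valid UTF-8 (including plain ASCII) passes through untouched; control characters would still be handled by the existing escaping and are not touched here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at s, or 0 if invalid. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0) return 0;
    unsigned char b0 = s[0];
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        if (b0 == 0xE0 && s[1] < 0xA0) return 0;   /* overlong form */
        if (b0 == 0xED && s[1] > 0x9F) return 0;   /* UTF-16 surrogate */
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        if (b0 == 0xF0 && s[1] < 0x90) return 0;   /* overlong form */
        if (b0 == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;
}

/* Hypothetical helper: copy in[0..len) to out as a NUL-terminated string,
 * replacing each byte that is not part of valid UTF-8 with "#nnn" (octal,
 * an assumption). Returns bytes written without the NUL, or (size_t)-1
 * if out is too small. */
size_t utf8_escape_invalid(const unsigned char *in, size_t len,
                           char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    size_t i = 0, o = 0;
    while (i < len) {
        size_t k = utf8_seq_len(in + i, len - i);
        if (k == 0) {
            if (outsz - o < 5) return (size_t)-1;    /* "#nnn" plus NUL */
            snprintf(out + o, 5, "#%03o", (unsigned)in[i]);
            o += 4;
            i++;                                     /* resync on next byte */
        } else {
            if (outsz - o < k + 1) return (size_t)-1;
            memcpy(out + o, in + i, k);              /* valid: copy as-is */
            o += k;
            i += k;
        }
    }
    if (o >= outsz) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

A Latin-1 "M\xE4rz" would then be logged as "M#344rz": still not pretty, but valid UTF-8, and the original byte can be recovered from the log if someone later works out the sender's charset.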

This will allow people to paper over many of the problems and deal with senders that submit invalid UTF8 in the future.

After this we can talk about conversion routines.

David Lang

P.S. If you do muck with the control character handling, please add another config option to not escape tabs.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
