-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 6 Jan 2010 16:14:59 +0100
Marc Schiffbauer <[email protected]> wrote:

> which encoding should be chosen for the database when using postgres?

As far as I understand the syslog protocol (at least the legacy one), it
has no concept of character encodings at all. So if you simply want to
make sure that everything ends up in the database "as is", then choose
SQL_ASCII.

> My rsyslog version is 4.4.3.
>
> Which client_encoding does rsyslog use in ompgsql?

Right now, it does not set an encoding by itself, so the database
default applies. If I'm not mistaken, you can even set that per user
from inside of postgres. So I would rather vote against another
configuration parameter here.

> I currently have set UTF-8 on the database. It worked for a while until
> some special message arrived at the server where postgres denies the INSERT:
>
> 2010-01-06 16:13:11 CET syslog syslog ERROR: invalid byte sequence for
> encoding "UTF8": 0xd220
> 2010-01-06 16:13:11 CET syslog syslog HINT: This error can also happen if
> the byte sequence does not match the encoding expected by the server, which
> is controlled by "client_encoding".

Were you able to isolate the message? Or find out which program was
sending it?

> Now rsyslog is not able to log anything... it is currently spooling to disk
> because it "hangs" at this message not being accepted by postgres.

This is bad, because if the machine is an open syslog server that simply
collects everything it gets, we have a potential DoS vector here.

I can think of three options:

* Drop the message and report that we did so. That would be rather easy,
  but might not be what people want.
* Re-insert the message after converting it from ASCII to UTF-8 or
  whatever the DB encoding is. But this might/will produce garbage if
  the input is not ASCII. It also creates more load on the system if
  these messages are frequent. Guessing the input encoding is hard or
  even impossible, depending on the set you guess from.
* Make the database SQL_ASCII.
  This will silently accept anything but will create nonsense from
  UTF/UCS encoded messages. It might also cause trouble for programs
  like phplogcon that analyze the logs.

For me, this boils down to one question: Can we make ompgsql
UTF/UCS-clean and at the same time not choke on non-UTF8 strings?
Everyone is trying to be UTF-8 clean these days, so it would be bad if
ompgsql could not keep up.

Comments please.

Regards,
Jakab Haufe (sur5r)

- -- 
ceterum censeo microsoftem esse delendam.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAktXSW8ACgkQ1YAhDic+adbqXACeIJcx6GW6PhSXFO1YF72PafJG
7t8AoLNwnJYMZ4bssqMZt/nkTIPWs0LI
=vuWN
-----END PGP SIGNATURE-----
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
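
To make the second option above a bit more concrete, here is a minimal
sketch in Python. It is not what ompgsql does (ompgsql is written in C),
and it is a variant of that option: it does not guess the input encoding
at all. Valid UTF-8 passes through untouched, and any byte that is not
valid UTF-8 is escaped as a literal \xNN so that a UTF-8 database can
accept the row. The function name to_utf8_safe is made up for this
illustration.

```python
from typing import Tuple


def to_utf8_safe(raw: bytes) -> Tuple[str, bool]:
    """Return (text, was_clean) for a raw syslog message.

    Hypothetical helper, not part of rsyslog. `was_clean` is False when
    the input contained bytes that are not valid UTF-8, so the caller
    can report that the message was altered.
    """
    try:
        # Fast path: the message is already valid UTF-8.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Escape each offending byte as \xNN instead of guessing an
        # input encoding; the result is always encodable as UTF-8.
        return raw.decode("utf-8", errors="backslashreplace"), False


if __name__ == "__main__":
    # 0xd2 followed by 0x20 is the byte sequence from the error above.
    text, clean = to_utf8_safe(b"some message \xd2 more text")
    print(clean, text)
```

The trade-off is that the stored text is no longer byte-identical to what
was received, but the queue keeps moving and the original bytes can still
be read off the escapes.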

