Bug#401259: logcheck: logcheck needs to override locale for grep

2024-05-28 Thread Richard Lewis
On Sat, 3 Apr 2010 02:07:20 +0400 Dmitry Semyonov  wrote:
> Another reason to set LC_ALL=C is  grep slowness in UTF-8 locale.
> Depending on used patterns, I saw "grep -E" to be 6 times slower
> compared to C locale, and I guess this was not the worst case. The
> performance problem seems to be fixed in grep-2.6.* but it is not
> available in Debian yet.
>
> --

You can (and have always been able to) add

LC_ALL=C

or LANG=C
or any other locale setting


into logcheck.conf and it will be honoured.
I don't think Debian should do this by default - we should assume the locale
is set up correctly - which it usually is.


So I think we can close this bug.


Bug#401259: logcheck: logcheck needs to override locale for grep

2010-04-02 Thread Dmitry Semyonov
Another reason to set LC_ALL=C is  grep slowness in UTF-8 locale.
Depending on used patterns, I saw grep -E to be 6 times slower
compared to C locale, and I guess this was not the worst case. The
performance problem seems to be fixed in grep-2.6.* but it is not
available in Debian yet.

-- 
...Bye..Dmitry.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#401259: logcheck: logcheck needs to override locale for grep

2009-08-24 Thread Frédéric Brière
Note to whoever will fix this: take the occasion to revert 30221d3 while
you're at it.


-- 
A Linux machine!  Because a 486 is a terrible thing to waste!
-- Joe Sloan, j...@wintermute.ucr.edu






Bug#401259: logcheck: logcheck needs to override locale for grep

2009-08-23 Thread Frédéric Brière
On Sat, Dec 02, 2006 at 01:17:28AM -0500, Chris Hanson wrote:
> The reason it doesn't match is that the R in a circle character is
> encoded in the log file as using the ISO 8859-1 code 0xae, but this
> isn't a valid first byte of a UTF-8 code.  Consequently, the .
> pattern doesn't match it.  In fact, I don't think there's _any_ way to
> match this byte sequence in a UTF-8 locale.

I guess [eg]libc's regex functions are a bit strict about their input.
However, grep also comes with its own DFA-based functions, which are
more lax about encoding errors; they are normally skipped for multibyte
encodings, but can be forced with GREP_USE_DFA=1.

> Unfortunately I'm not sure what to do about this, because it's not
> obvious how the log-file messages relate to the locale.  This message

They don't, at least not reliably.  There's stuff in there, like ssh
usernames, that comes directly from nefarious people who don't give a
rat's ass about your particular selection of encoding.

> One thing that works in this case is to set LC_ALL=C prior to
> calling grep.  But if the log files sometimes contain UTF-8 coding,
> this will mess that up

I doubt this would be a problem.  Pretty much everything that is matched
explicitly in any rule (hostname, IP address, process ID) is in ASCII.
Any chunk of arbitrary data should be matched with something like .* or
[^[:space:]]+, which will work whether it was decoded or not.

Now, it's true that POSIX restricts the C locale to 7-bit characters,
but both grep and eglibc appear to deal with binary characters just fine.


One unfortunate side-effect is that any error messages from grep will
therefore be in English, but that's probably a lesser evil.
(LC_MESSAGES cannot be left as is, since mixing different encodings is
not supported.)
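
The point above about generic patterns can be checked with a small sketch.
It assumes GNU grep (for \w); the log line, host name and user name are
made up for illustration and do not come from this bug report.  The same
rule is run in the C locale against a user name encoded as UTF-8 and as
ISO 8859-1:

```shell
# Two variants of a hypothetical auth-log line: the user name "rené"
# encoded as UTF-8 (bytes 0xC3 0xA9) and as ISO 8859-1 (byte 0xE9).
utf8_line=$(printf 'Aug 23 12:00:00 host sshd[4242]: Invalid user ren\303\251 from 192.0.2.7')
latin1_line=$(printf 'Aug 23 12:00:00 host sshd[4242]: Invalid user ren\351 from 192.0.2.7')

# A rule in the usual logcheck style: the explicit parts are ASCII and
# the arbitrary user name is matched with [^[:space:]]+.
rule='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ sshd\[[0-9]+\]: Invalid user [^[:space:]]+ from [.0-9]+$'

# In the C locale every byte is one character, so either encoding is
# just a run of non-space bytes.  grep -c prints the number of matches.
utf8_count=$(printf '%s\n' "$utf8_line" | LC_ALL=C grep -E -c -- "$rule" || true)
latin1_count=$(printf '%s\n' "$latin1_line" | LC_ALL=C grep -E -c -- "$rule" || true)

echo "UTF-8 name matched in C locale:      $utf8_count"
echo "ISO 8859-1 name matched in C locale: $latin1_count"
```

Both counts should come out as 1: the rule never has to know which
encoding the untrusted part of the line was written in.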


-- 
Never trust an operating system you don't have sources for. ;-)
-- Unknown source






Bug#401259: logcheck: logcheck needs to override locale for grep

2006-12-01 Thread Chris Hanson
Package: logcheck
Version: 1.2.51
Severity: normal

Logcheck has an implicit assumption that the default locale should be
used by grep when processing log files.  However, that's not always
the case.  For example, I use the locale en_US.UTF-8, and
consequently grep assumes that its inputs are encoded as UTF-8.  But
the log files appear to be encoded as ISO 8859-1, which means that
sometimes my rules don't match.

Specifically, I have a rule that reads 

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel: input: .* as /class/input/input[0-9]+$

which is supposed to ignore messages from the kernel announcing input
devices.  But the following log line isn't handled right:

Dec  2 00:05:01 ravna kernel: input: Microsoft Microsoft Wireless Optical Mouse® 1.0A as /class/input/input3

The reason it doesn't match is that the R in a circle character is
encoded in the log file as using the ISO 8859-1 code 0xae, but this
isn't a valid first byte of a UTF-8 code.  Consequently, the .
pattern doesn't match it.  In fact, I don't think there's _any_ way to
match this byte sequence in a UTF-8 locale.

Unfortunately I'm not sure what to do about this, because it's not
obvious how the log-file messages relate to the locale.  This message
comes from the kernel, which presumably doesn't know what the locale
is.  Furthermore, this particular text is coming directly from the
device, and just being passed along by the kernel -- I have no idea if
USB specifies the character coding that is used in these strings, or
if it's just an uninterpreted sequence of bytes that are encoded any
way the manufacturer pleases.

One thing that works in this case is to set LC_ALL=C prior to
calling grep.  But if the log files sometimes contain UTF-8 coding,
this will mess that up.  Perhaps the kernel log lines need to be
handled differently?

I hope you have a better idea about how to handle this.
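
For anyone wanting to reproduce the mismatch, here is a minimal sketch.
It assumes GNU grep and that a locale named C.UTF-8 is installed (the
report used en_US.UTF-8; substitute whichever UTF-8 locale is available,
since an unknown locale name silently falls back to C).  The variable
names are only for illustration.

```shell
# Rebuild the log line from the report, with the (R) sign written as the
# single ISO 8859-1 byte 0xAE (octal 256) rather than as UTF-8.
line=$(printf 'Dec  2 00:05:01 ravna kernel: input: Microsoft Microsoft Wireless Optical Mouse\256 1.0A as /class/input/input3')

# The rule from the report, joined onto one line.
rule='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ kernel: input: .* as /class/input/input[0-9]+$'

# C locale: every byte is a character, so ".*" can span the 0xAE byte.
c_count=$(printf '%s\n' "$line" | LC_ALL=C grep -E -c -- "$rule" || true)

# UTF-8 locale (assumed name C.UTF-8): 0xAE is not a valid character
# here, which is the situation described in the report.
utf8_count=$(printf '%s\n' "$line" | LC_ALL=C.UTF-8 grep -E -c -- "$rule" || true)

echo "matches in C locale:     $c_count"
echo "matches in UTF-8 locale: $utf8_count"
```

grep -c is used so that only a count is printed, which avoids any
"binary file matches" message getting in the way of the comparison.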

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.19-cph1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages logcheck depends on:
ii  adduser           3.100                    Add and remove users and groups
ii  cron              3.0pl1-99                management of regular background p
ii  debconf           1.5.9                    Debian configuration management sy
ii  exim4             4.63-10                  metapackage to ease exim MTA (v4)
ii  exim4-daemon-lig  4.63-10                  lightweight exim MTA (v4) daemon
ii  grep              2.5.1.ds2-6              GNU grep, egrep and fgrep
ii  lockfile-progs    0.1.10                   Programs for locking and unlocking
ii  logtail           1.2.51                   Print log file lines that have not
ii  mailx             1:8.1.2-0.20050715cvs-1  A simple mail user agent
ii  sysklogd [system  1.4.1-20                 System Logging Daemon

Versions of packages logcheck recommends:
ii  logcheck-database 1.2.51 database of system log rules for t

-- debconf information:
  logcheck/changes:
  logcheck/install-note:

