Bug#605142: sed: incorrectly interprets \x hexadecimal escapes sometimes
Hello,

I've added a paragraph about the current behavior, with your example, to sed's manual:
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=a805d57e1f6427b55

regards,
- assaf
Bug#605142: sed: incorrectly interprets \x hexadecimal escapes sometimes
Hello,

Picking up an old sed bug. Matija Nalis wrote:

    echo 'a^c' | sed -e 's/\x5e/b/' should produce output "abc" (as it does in "ssed" or "perl -pe"), but in GNU sed it produces "ba^c".

I agree this is indeed GNU sed's behavior, but it's not clear that it is incorrect behavior (or that there is a single correct one).

GNU sed internally converts escape sequences to their corresponding characters *before* passing the text on to glibc's regex engine (or doing other operations). glibc's regex does not support unescaping itself, which is why sed has to do this. As a result, '\x5e' reaches the regex engine as a bare '^', which is then treated as an anchor.

A few additional cases:

Unescaping works in other sed commands:

    $ echo 'a^c' | sed 'y/\x5e/b/'
    abc

GNU grep, which doesn't unescape, passes the string to the regex engine unchanged, and the regex engine then doesn't match at all:

    $ echo 'a^c' | gnu-grep-2.26 '\x5e'

As opposed to FreeBSD's grep:

    $ echo 'a^c' | freebsd-grep-2.5.1 '\x5e'
    a^c

Many other sed implementations don't support unescaping at all, and return:

    $ echo 'a^c' | sed 's/\x5e/b/'
    a^c
    $ echo 'a^c' | sed 'y/\x5e/b/'
    sed: 1: "y/\x5e/b/": transform strings are not the same length

This is the case for sed from FreeBSD, OpenBSD, BusyBox, and ToyBox.

Lastly, POSIX does not mention anything about '\xXX' escape sequences, so this is left to implementors to decide.

As such, I would suggest marking #605142 as notabug and closing it. I will add some info about it to sed's manual.

If someone wants to suggest this as a new feature, please write to sed-de...@gnu.org . However, since it would change existing behavior, the bar for accepting it might be high.

regards,
- assaf
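To make the ordering described above concrete, here is a small Python model of the two interpretations. It is only a sketch: it uses Python's `re` module rather than glibc's regex, it is not GNU sed's actual code, and the helper names (`unescape_raw`, `unescape_literal`, `substitute`) are made up for illustration. `unescape_raw` mimics "convert \xNN to the character first, then hand the result to the regex engine"; `unescape_literal` mimics the behavior proposed in the bug report, where the resulting character is always treated as a literal.

```python
import re

# Matches \xNN with exactly two hex digits. Simplification (assumption): an
# escaped backslash in front of "x" (as in "\\x5e") is not treated specially.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def unescape_raw(pattern: str) -> str:
    """Replace \\xNN with the bare character before regex compilation."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), pattern)


def unescape_literal(pattern: str) -> str:
    """Replace \\xNN with a regex-escaped character, so it matches literally."""
    return HEX_ESCAPE.sub(lambda m: re.escape(chr(int(m.group(1), 16))), pattern)


def substitute(pattern: str, replacement: str, text: str, unescape) -> str:
    """Rough analogue of s/pattern/replacement/ (first match only)."""
    return re.sub(unescape(pattern), replacement, text, count=1)


if __name__ == "__main__":
    # \x5e becomes a bare "^", which anchors at the start of the line.
    print(substitute(r"\x5e", "b", "a^c", unescape_raw))
    # \x5e becomes "\^", which matches the caret itself.
    print(substitute(r"\x5e", "b", "a^c", unescape_literal))
    # \x5c becomes a lone backslash, which is an invalid pattern.
    try:
        substitute(r"\x5c", "b", "a\\c", unescape_raw)
    except re.error as exc:
        print("pattern error:", exc)
    # \x5c becomes an escaped backslash, which matches the backslash itself.
    print(substitute(r"\x5c", "b", "a\\c", unescape_literal))
```

In this model the "raw" variant reproduces both symptoms from the report (an unintended anchor for \x5e, and a pattern error for \x5c), while the "literal" variant gives the output the reporter expected.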
Bug#605142: sed: incorrectly interprets \x hexadecimal escapes sometimes
Package: sed
Version: 4.2.1-7
Severity: normal

If a \x escape sequence in the LHS resolves to a character that has special meaning in the sed LHS (like ^, \, ...), GNU sed does not treat it as a hex-escaped literal, but instead as if that special character had been written literally.

For example:

    echo 'a^c' | sed -e 's/\x5e/b/'

should produce output abc (as it does in ssed or perl -pe), but in GNU sed it produces ba^c.

Or:

    echo 'a\\c' | sed -e 's/\x5c/b/'

should again produce output abc, but GNU sed terminates with an error:

    sed: -e expression #1, char 10: Trailing backslash

(as it interprets it as sed -e 's/\/b/' and not as sed -e 's/\\/b/')

There are several more such problematic characters.

Proposed solution: all such \xNN escapes should be treated as if the resulting characters were escaped; for example, '\x5e' in the LHS should be treated like '\^' and not like the special-meaning '^'.

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (650, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=hr_HR, LC_CTYPE=hr_HR (charmap=ISO-8859-2)
Shell: /bin/sh linked to /bin/dash

Versions of packages sed depends on:
ii  dpkg          1.15.8.5         Debian package management system
ii  install-info  4.13a.dfsg.1-6   Manage installed documentation in
ii  libc6         2.11.2-7         Embedded GNU C Library: Shared lib
ii  libselinux1   2.0.96-1         SELinux runtime shared libraries

sed recommends no packages.

sed suggests no packages.

-- no debconf information