Bug#605142: sed: incorrectly interprets \x hexadecimal escapes sometimes

2017-02-24 Assaf Gordon

Hello,

I've added a paragraph to sed's manual describing the current behavior,
using your example:

https://git.savannah.gnu.org/cgit/sed.git/commit/?id=a805d57e1f6427b55

regards,
- assaf




2017-02-23 Assaf Gordon


Hello,

Picking up an old sed bug:

Matija Nalis wrote:

echo 'a^c' | sed -e 's/\x5e/b/'
should produce output "abc" (as it does in "ssed" or "perl -pe"), but 
in GNU sed it produces "ba^c".


I agree this is indeed GNU sed's behavior,
but it's not clear that it is incorrect behavior (or that there is
a single correct one).


GNU sed internally converts escape sequences to their
corresponding characters *before* passing the pattern on
to glibc's regex engine (or performing other operations).

This conversion is needed because glibc's regex engine
does not itself support these escape sequences.
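To make the consequence concrete, here is a minimal Python sketch of the "convert first, then compile" order described above. It uses Python's re module as a stand-in for glibc's regex engine. That is an assumption: the two engines differ in many details, but both treat a bare ^ at the start of a pattern as an anchor. naive_unescape is a hypothetical helper, not GNU sed's actual code.

```python
import re

# Matches a \xHH escape (backslash, 'x', two hex digits) in the raw pattern text.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def naive_unescape(pattern: str) -> str:
    r"""Hypothetical helper: replace each \xHH with the raw character it encodes.

    The decoded character is NOT escaped, so a special character such as '^'
    regains its special meaning once the result is compiled as a regex.
    (Simplification: an escaped backslash followed by 'x', as in \\x5e,
    is not handled specially.)
    """
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), pattern)


decoded = naive_unescape(r"\x5e")
print(repr(decoded))                          # '^'
# The regex engine now sees a bare '^', i.e. a start-of-line anchor,
# so 'b' is inserted at the beginning instead of replacing the '^'.
print(re.sub(decoded, "b", "a^c", count=1))   # ba^c
```

This reproduces the "ba^c" output from the report. The escape has already turned into an anchor before the regex engine ever sees it.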

A few additional cases:

Unescaping also works in other sed commands, such as y:

   $ echo 'a^c' | sed 'y/\x5e/b/'
   abc

GNU grep, on the other hand, doesn't unescape: it passes the string
as-is to the regex engine, which then doesn't match at all:

   $ echo 'a^c' | gnu-grep-2.26 '\x5e'

As opposed to FreeBSD's grep, which does match:

   $ echo 'a^c' | freebsd-grep-2.5.1 '\x5e'
   a^c

Many other sed implementations don't support unescaping
at all, and return:

   $ echo 'a^c' | sed 's/\x5e/b/'
   a^c
   $ echo 'a^c' | sed 'y/\x5e/b/'
   sed: 1: "y/\x5e/b/": transform strings are not the same length

This is the case for sed from FreeBSD, OpenBSD, BusyBox, ToyBox.
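The two y/\x5e/b/ results above come from one rule: y requires its two strings to have the same length after any escape processing. The Python sketch below is a simplified model of that check. transliterate and its unescape flag are hypothetical names. When unescape is off, the model takes \x5e as four literal characters, and its error text only mimics the BSD message shown above.

```python
import re

HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex(s: str) -> str:
    """Replace each \\xHH in s with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def transliterate(src: str, dst: str, text: str, unescape: bool) -> str:
    r"""Hypothetical model of sed's y/src/dst/ command.

    unescape=True decodes \xHH in src and dst first (as GNU sed does);
    unescape=False keeps them as literal text (the behavior reported above
    for the FreeBSD, OpenBSD, BusyBox and ToyBox seds).
    """
    if unescape:
        src, dst = decode_hex(src), decode_hex(dst)
    if len(src) != len(dst):
        raise ValueError("transform strings are not the same length")
    return text.translate(str.maketrans(src, dst))


print(transliterate(r"\x5e", "b", "a^c", unescape=True))   # abc
try:
    # In this model src is 4 characters long and dst is 1.
    transliterate(r"\x5e", "b", "a^c", unescape=False)
except ValueError as err:
    print(err)   # transform strings are not the same length
```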

Lastly, POSIX does not mention anything about '\xXX' escape
sequences, so this is left to implementors to decide.

As such, I would suggest marking #605142 as notabug and closing it.

I will add some info about it to sed's manual.

If someone wants to suggest this as a new feature,
please write to sed-de...@gnu.org.
However, since this would break existing behavior, the bar for accepting
it might be high.

regards,
- assaf




2010-11-27 Matija Nalis
Package: sed
Version: 4.2.1-7
Severity: normal


If a \x escape sequence in the LHS resolves to a character that has special
meaning in a sed LHS (like ^, \, ...), GNU sed does not treat it as an
escaped (literal) character, but instead as if that special character had
been written there directly.

For example:
echo 'a^c' | sed -e 's/\x5e/b/'
should produce output abc (as it does in ssed or perl -pe), but in GNU
sed it produces ba^c.

or,

echo 'a\\c' | sed -e 's/\x5c/b/'
should again produce output abc, but GNU sed terminates with an error:
sed: -e expression #1, char 10: Trailing backslash
(because it interprets the command as sed -e 's/\/b/' rather than as sed -e 's/\\/b/')
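For comparison, some regex engines interpret \xHH themselves; the report cites ssed and perl -pe. These engines treat the result as an ordinary literal character. Python's re module behaves the same way, so it can serve as a rough stand-in. This is an analogy only, not a test of ssed or Perl.

```python
import re

# In Python's re, \xHH is parsed by the regex engine itself and always
# denotes a literal character, even when that character is special.
print(re.sub(r"\x5e", "b", "a^c", count=1))    # abc  (0x5e is '^')
print(re.sub(r"\x5c", "b", "a\\c", count=1))   # abc  (0x5c is a backslash)

# Writing the special character directly gives the anchor behavior instead:
print(re.sub(r"^", "b", "a^c", count=1))       # ba^c

# A lone backslash is rejected, much like GNU sed's "Trailing backslash".
try:
    re.compile("\\")
except re.error as err:
    print("error:", err)
```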

There are several more such problematic characters.

Proposed solution: all such \xNN escapes should be treated as if the resulting
characters were escaped; for example, '\x5e' in the LHS should be treated like '\^'
and not like the special-meaning '^'.
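The proposed fix can be sketched the same way. Each \xNN is still decoded, but the decoded character is emitted in quoted form, so the regex engine sees \^ rather than ^. This is a minimal Python sketch, not GNU sed's code. escaping_unescape is a hypothetical helper, and re.escape stands in for whatever quoting sed would actually need.

```python
import re

HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def escaping_unescape(pattern: str) -> str:
    r"""Hypothetical helper: decode each \xHH and quote the result.

    re.escape() makes the decoded character literal for Python's re, so
    \x5e becomes \^ and \x5c becomes \\ in the pattern given to the engine.
    """
    return HEX_ESCAPE.sub(lambda m: re.escape(chr(int(m.group(1), 16))), pattern)


print(escaping_unescape(r"\x5e"))                                # \^
print(re.sub(escaping_unescape(r"\x5e"), "b", "a^c", count=1))   # abc
print(re.sub(escaping_unescape(r"\x5c"), "b", "a\\c", count=1))  # abc
```

In a POSIX BRE the quoting step would have to be BRE-aware. Blindly prefixing a backslash is not safe there: \( and \{ are special in a BRE, while ( and { are not.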

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (650, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=hr_HR, LC_CTYPE=hr_HR (charmap=ISO-8859-2)
Shell: /bin/sh linked to /bin/dash

Versions of packages sed depends on:
ii  dpkg  1.15.8.5   Debian package management system
ii  install-info  4.13a.dfsg.1-6 Manage installed documentation in 
ii  libc6 2.11.2-7   Embedded GNU C Library: Shared lib
ii  libselinux1   2.0.96-1   SELinux runtime shared libraries

sed recommends no packages.

sed suggests no packages.

-- no debconf information


