This is not surprising, since grep is a standard POSIX utility and works
in terms of POSIX locales
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html#tag_20_55_08).

A careful reading of the POSIX standard shows that UTF-16 and UTF-32
cannot be supported as POSIX locale encodings: these encoding forms use
2-byte and 4-byte code units respectively, so '/' and '.' cannot be
represented in a single byte, which makes their encoding nonconforming.
Quoting http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html:

"Conforming implementations shall support one or more coded character
sets. Each supported locale shall include the portable character set,
which is the set of symbolic names for characters in Portable Character
Set.

...
POSIX.1-2008 places only the following requirements on the encoded values of 
the characters in the portable character set:

...

The encoded values associated with <slash> and <period> shall be
invariant across all locales supported by the implementation.

The encoded values associated with the members of the portable character
set are each represented in a single byte. Moreover, if the value is
stored in an object of C-language type char, it is guaranteed to be
positive (except the NUL, which is always zero)."
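
To make the single-byte requirement concrete, here is a small Python
sketch (my own illustration, not anything taken from grep or from the
standard). The helper name encoded_bytes is made up for this example. It
prints how <slash> and <period> are encoded in UTF-8, UTF-16 and UTF-32:

```python
# Illustration only: show the byte sequences for '/' and '.' in several
# encodings. encoded_bytes is a hypothetical helper, not a library call.

def encoded_bytes(ch, encoding):
    """Return the raw bytes of a single character in the given encoding."""
    return ch.encode(encoding)


if __name__ == "__main__":
    for ch in "/.":
        # The -le variants are used so that no byte order mark is prepended.
        for enc in ("utf-8", "utf-16-le", "utf-32-le"):
            data = encoded_bytes(ch, enc)
            print(f"{ch!r} in {enc}: {data.hex()} ({len(data)} byte(s))")
```

Only the UTF-8 form is a single byte. The UTF-16 and UTF-32 forms are 2
and 4 bytes long and contain zero bytes, which a byte-oriented tool would
see as NUL.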

Another issue is that sizeof(wchar_t) is implementation-defined. In my
tests on Ubuntu, sizeof(wchar_t) is 4 (bytes), so a different data type
is needed to store UTF-16 code units in a portable way.
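
As a rough illustration of the point about wchar_t, the Python sketch
below asks ctypes for the platform's wchar_t size and, separately,
unpacks text into explicit 16-bit code units. The helper name
utf16_code_units is hypothetical. In C, I would assume a fixed-width type
such as uint16_t plays the same role.

```python
import ctypes
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    # '<H' is a standard-size unsigned 16-bit field, independent of wchar_t.
    return list(struct.unpack(f"<{len(data) // 2}H", data))


if __name__ == "__main__":
    # The platform's wchar_t size; this value differs between systems.
    print("sizeof(wchar_t) =", ctypes.sizeof(ctypes.c_wchar))
    # A character outside the BMP takes two code units (a surrogate pair).
    print([hex(u) for u in utf16_code_units("a/\U0001F600")])
```

The width of the code units here is fixed by the struct format string,
not by whatever wchar_t happens to be on the machine running it.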

I would say that this should not be fixed in grep: use iconv in a
pipeline to convert the input to UTF-8 and grep the converted text
(though this might be resource-intensive for large XML files).
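
On the command line this would be something like
"iconv -f UTF-16 -t UTF-8 file.xml | grep pattern", where file.xml and
pattern are placeholders. The same convert-then-match idea can be
sketched in Python. This is only a sketch under my own assumptions: the
function name grep_utf16 is made up, and the input is assumed to start
with a byte order mark (without one, Python's "utf-16" codec falls back
to the native byte order). It decodes line by line, so the whole file
does not have to be held in memory:

```python
import io
import re


def grep_utf16(stream, pattern):
    """Yield the lines of a binary UTF-16 stream that match pattern."""
    regex = re.compile(pattern)
    # Decode incrementally instead of reading the whole file at once.
    text = io.TextIOWrapper(stream, encoding="utf-16")
    for line in text:
        line = line.rstrip("\n")
        if regex.search(line):
            yield line


if __name__ == "__main__":
    # Sample data standing in for a UTF-16 XML file.
    sample = "<a>foo</a>\n<b>bar</b>\n".encode("utf-16")
    for match in grep_utf16(io.BytesIO(sample), "foo"):
        print(match)
```

For a real file, the stream would come from open(path, "rb") rather than
from an in-memory buffer.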

-- 
https://bugs.launchpad.net/bugs/374807

Title:
  grep does not work for UTF-16 files
