bug#16578: Wish: Support for non-native endianness in od
Pádraig Brady p...@draigbrady.com writes:

> Attached is the patch I intend to push in your name.

Nice.

> I also added docs to usage() and the texinfo file, and added a test.

I don't quite understand how the test works, but as far as I can see, it
doesn't test floats? So that's inconsistent with the commit message.

> BTW I checked if there was any speed difference with the new code.
> I wasn't expecting this to be a bottleneck, and true enough
> there is only a marginal change. The new code is consistently
> a little _faster_ though on my i3-2310M which is a bit surprising.

Odd. But performance of x86 is usually pretty hard to predict by just
looking at the source or assembly code. I was hoping that in the
non-swapped case, the false conditional

  if (input_swap && sizeof(T) > 1)

should be very friendly to the branch predictor, and hence almost free.

Jim Meyering j...@meyering.net writes:

> One nit: please change the type of j here (identical in attached)
> to be unsigned, to match that of the upper bound.

Makes sense. In my own projects, I tend to use unsigned int for loop
counts wherever I don't need to iterate over any negative values. But
my impression is that most others prefer to use signed int for
everything which doesn't rely on mod 2^n arithmetic, so that's why I
made j signed here.

> That would be our first use of rev.
> Is it ubiquitous enough to depend on?

It appears *not* to be available on my closest Solaris box, while on my
GNU/Linux system it's provided by util-linux. For the test, I guess rev
could be implemented something like

  while read line
    printf %s line | tr -d '\n' | sed 's/./.\n/' | tac | tr -d '\n'
    echo
  done

Maybe rev should be provided by coreutils, similarly to tac? I'd prefer
not to think about the unicode issues for rev, though...

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
bug#16578: Wish: Support for non-native endianness in od
On 02/09/2014 08:42 AM, Niels Möller wrote:
> Pádraig Brady p...@draigbrady.com writes:
>
>> Attached is the patch I intend to push in your name.
>
> Nice.
>
>> I also added docs to usage() and the texinfo file, and added a test.
>
> I don't quite understand how the test works, but as far as I can see, it
> doesn't test floats? So that's inconsistent with the commit message.

Oops, I removed an 'f' while developing.
Added that back now, which also gets sizes up to 16 tested.

>> BTW I checked if there was any speed difference with the new code.
>> I wasn't expecting this to be a bottleneck, and true enough
>> there is only a marginal change. The new code is consistently
>> a little _faster_ though on my i3-2310M which is a bit surprising.
>
> Odd. But performance of x86 is usually pretty hard to predict by just
> looking at the source or assembly code. I was hoping that in the
> non-swapped case, the false conditional
>
>   if (input_swap && sizeof(T) > 1)
>
> should be very friendly to the branch predictor, and hence almost free.
>
> Jim Meyering j...@meyering.net writes:
>
>> One nit: please change the type of j here (identical in attached)
>> to be unsigned, to match that of the upper bound.
>
> Makes sense. In my own projects, I tend to use unsigned int for loop
> counts wherever I don't need to iterate over any negative values. But
> my impression is that most others prefer to use signed int for
> everything which doesn't rely on mod 2^n arithmetic, so that's why I
> made j signed here.

done

>> That would be our first use of rev.
>> Is it ubiquitous enough to depend on?

Ugh, good point.

> It appears *not* to be available on my closest Solaris box, while on my
> GNU/Linux system it's provided by util-linux. For the test, I guess rev
> could be implemented something like
>
>   while read line
>     printf %s line | tr -d '\n' | sed 's/./.\n/' | tac | tr -d '\n'
>     echo
>   done

I went with:

  rev() {
    while read line; do
      printf '%s' "$line" | sed 's/./&\n/g' | tac | paste -s -d ''
    done
  }

> Maybe rev should be provided by coreutils, similarly to tac? I'd prefer
> not to think about the unicode issues for rev, though...

I think so too. It's not Linux specific and we've previously
mentioned rev as an alternative to adding various functionality
to coreutils.

Thanks to both of you for the review!
I've now pushed:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=b370924c

Pádraig.
bug#16578: Wish: Support for non-native endianness in od
On 02/10/2014 01:59 AM, Paul Eggert wrote:
> Pádraig Brady wrote:
>>   $ time od.new -tx8 --endian=bug od.in
>>   4.97 elapsed
>
> If you really used --endian=bug and there was no diagnostic, then there
> must have been a bug. :-)

Ha! I retyped incorrectly rather than copy/pasted.
I can confirm the params are checked correctly:

  $ od -tx8 --endian=bug od.in
  od: invalid argument ‘bug’ for ‘--endian’
  Valid arguments are:
    - ‘little’
    - ‘big’
  Try 'src/od --help' for more information.
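The argument check shown in the transcript above could be sketched roughly
as follows. This is not the code that was pushed, which relies on gnulib's
XARGMATCH; parse_endian, host_is_little_endian and endian_args are names
invented for this illustration, and the sketch accepts exact matches only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; the real od.c uses gnulib's XARGMATCH.  */
static char const *const endian_args[] = { "little", "big", NULL };

/* Return true if this host stores the least significant byte first.  */
static bool
host_is_little_endian (void)
{
  unsigned int one = 1;
  unsigned char first;
  memcpy (&first, &one, 1);
  return first == 1;
}

/* Look ARG up in endian_args.  On a match, set *SWAP to whether the
   requested byte order differs from the host's and return true.
   Otherwise leave *SWAP untouched and return false, so that the
   caller can print a diagnostic.  */
static bool
parse_endian (char const *arg, bool *swap)
{
  assert (arg != NULL && swap != NULL);
  for (size_t i = 0; endian_args[i] != NULL; i++)
    if (strcmp (arg, endian_args[i]) == 0)
      {
        bool want_little = (i == 0);
        *swap = (want_little != host_is_little_endian ());
        return true;
      }
  return false;
}
```

Calling parse_endian once per --endian occurrence overwrites *swap every
time, so the last option given wins, and a false return value is the
caller's cue to report the invalid argument and exit.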
bug#16578: Wish: Support for non-native endianness in od
On 02/02/2014 01:20 AM, Pádraig Brady wrote:
> On 01/31/2014 09:44 AM, Niels Möller wrote:
>> ni...@lysator.liu.se (Niels Möller) writes:
>>
>>> Pádraig Brady p...@draigbrady.com writes:
>>>
>>>> I agree this would be useful and easy enough to add.
>>>> I suppose the interface would be --endian=little|big
>>>
>>> Maybe I can have a look at what it takes.
>>
>> Below is a crude patch (missing: usage message, test cases, docs,
>> translation). I think it should work fine for floats too. I see no
>> obvious and more beautiful way to do it.
>>
>> (And I think I have copyright assignment papers for coreutils in place,
>> since my work on factor some years ago).
>>
>> Regards,
>> /Niels
>>
>> diff --git a/src/od.c b/src/od.c
>> index 514fe50..a71e302 100644
>> --- a/src/od.c
>> +++ b/src/od.c
>> @@ -259,13 +259,16 @@ static enum size_spec integral_type_size[MAX_INTEGRAL_TYPE_SIZE + 1];
>>  #define MAX_FP_TYPE_SIZE sizeof (long double)
>>  static enum size_spec fp_type_size[MAX_FP_TYPE_SIZE + 1];
>>  
>> +bool input_swap;
>> +
>>  static char const short_options[] = "A:aBbcDdeFfHhIij:LlN:OoS:st:vw::Xx";
>>  
>>  /* For long options that have no equivalent short option, use a
>>     non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
>>  enum
>>  {
>> -  TRADITIONAL_OPTION = CHAR_MAX + 1
>> +  TRADITIONAL_OPTION = CHAR_MAX + 1,
>> +  ENDIAN_OPTION,
>>  };
>>  
>>  static struct option const long_options[] =
>> @@ -278,6 +281,7 @@ static struct option const long_options[] =
>>    {"strings", optional_argument, NULL, 'S'},
>>    {"traditional", no_argument, NULL, TRADITIONAL_OPTION},
>>    {"width", optional_argument, NULL, 'w'},
>> +  {"endian", required_argument, NULL, ENDIAN_OPTION },
>>  
>>    {GETOPT_HELP_OPTION_DECL},
>>    {GETOPT_VERSION_OPTION_DECL},
>> @@ -406,7 +410,21 @@ N (size_t fields, size_t blank, void const *block,      \
>>      {                                                       \
>>        int next_pad = pad * (i - 1) / fields;                \
>>        int adjusted_width = pad_remaining - next_pad + width; \
>> -      T x = *p++;                                           \
>> +      T x;                                                  \
>> +      if (input_swap && sizeof(T) > 1)                      \
>> +        {                                                   \
>> +          int j;                                            \
>> +          union {                                           \
>> +            T x;                                            \
>> +            char b[sizeof(T)];                              \
>> +          } u;                                              \
>> +          for (j = 0; j < sizeof(T); j++)                   \
>> +            u.b[j] = ((const char *) p)[sizeof(T) - 1 - j]; \
>> +          x = u.x;                                          \
>> +        }                                                   \
>> +      else                                                  \
>> +        x = *p;                                             \
>> +      p++;                                                  \
>>        ACTION;                                               \
>>        pad_remaining = next_pad;                             \
>>      }                                                       \
>> @@ -1664,6 +1682,24 @@ main (int argc, char **argv)
>>            traditional = true;
>>            break;
>>  
>> +        case ENDIAN_OPTION:
>> +          if (!strcmp (optarg, "big"))
>> +            {
>> +#if !WORDS_BIGENDIAN
>> +              input_swap = true;
>> +#endif
>> +            }
>> +          else if (!strcmp (optarg, "little"))
>> +            {
>> +#if WORDS_BIGENDIAN
>> +              input_swap = true;
>> +#endif
>> +            }
>> +          else
>> +            error (EXIT_FAILURE, 0,
>> +                   _("bad argument '%s' for --endian option"), optarg);
>> +          break;
>> +
>>            /* The next several cases map the traditional format
>>               specification options to the corresponding modern format
>>               specs.  GNU od accepts any combination of old- and
>
> That looks good. I'll adjust slightly to use XARGMATCH
> and add some docs/tests.
> I'm travelling at the moment but will merge this soon.

Attached is the patch I intend to push in your name.

I changed the option handling to reuse the XARGMATCH functionality.
Also I changed things slightly so that the last --endian option
specified wins. Previously we only set the input_swap variable
to true, never to false. On a related point I set the input_swap
global to be static.

I also added docs to usage() and the texinfo file, and added a test.

BTW I checked if there was any speed difference with the new code.
I wasn't expecting this to be a bottleneck, and true enough
there is only a marginal change. The new code is consistently
a little _faster_ though on my i3-2310M which
bug#16578: Wish: Support for non-native endianness in od
On Sat, Feb 8, 2014 at 2:01 PM, Pádraig Brady p...@draigbrady.com wrote:
> +      if (input_swap && sizeof(T) > 1)                      \
> +        {                                                   \
> +          int j;                                            \

The new patch looks complete. Thanks to both of you.
One nit: please change the type of j here (identical in attached)
to be unsigned, to match that of the upper bound.

> +          union {                                           \
> +            T x;                                            \
> +            char b[sizeof(T)];                              \
> +          } u;                                              \
> +          for (j = 0; j < sizeof(T); j++)                   \
> +            u.b[j] = ((const char *) p)[sizeof(T) - 1 - j]; \

Re this function in the new test,

> +in_swapped() { printf '%s' "$in" | sed "s/.\{$1\}/&\\n/g" | rev | tr -d '\n'; }

That would be our first use of rev.
Is it ubiquitous enough to depend on?
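To illustrate why the type of j matters for the comparison against
sizeof(T), here is a small sketch. Both helper names are hypothetical and
appear nowhere in od.c; the static assertion states the assumption that
size_t is at least as wide as int, which holds on common platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, not from od.c.  */

/* Assumption made explicit: size_t is at least as wide as int.  */
static_assert (sizeof (size_t) >= sizeof (int),
               "size_t must be at least as wide as int");

/* What "j < sizeof (T)" does when j is an int: the usual arithmetic
   conversions turn j into a size_t before comparing.  The cast only
   spells out that implicit conversion.  */
static bool
signed_index_below (int j, size_t bound)
{
  return (size_t) j < bound;
}

/* With an unsigned index of the same type as the bound, no conversion
   takes place, so there is nothing for a compiler to warn about
   (for instance with GCC's -Wsign-compare).  */
static bool
unsigned_index_below (size_t j, size_t bound)
{
  return j < bound;
}
```

In the patch j only ever counts upwards from 0, so both variants behave
the same there. The difference shows for negative values: under the stated
assumption signed_index_below (-1, 4) is false, because -1 converts to a
very large unsigned value.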
bug#16578: Wish: Support for non-native endianness in od
On 01/31/2014 09:44 AM, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
>> Pádraig Brady p...@draigbrady.com writes:
>>
>>> I agree this would be useful and easy enough to add.
>>> I suppose the interface would be --endian=little|big
>>
>> Maybe I can have a look at what it takes.
>
> Below is a crude patch (missing: usage message, test cases, docs,
> translation). I think it should work fine for floats too. I see no
> obvious and more beautiful way to do it.
>
> (And I think I have copyright assignment papers for coreutils in place,
> since my work on factor some years ago).
>
> Regards,
> /Niels
>
> diff --git a/src/od.c b/src/od.c
> index 514fe50..a71e302 100644
> --- a/src/od.c
> +++ b/src/od.c
> @@ -259,13 +259,16 @@ static enum size_spec integral_type_size[MAX_INTEGRAL_TYPE_SIZE + 1];
>  #define MAX_FP_TYPE_SIZE sizeof (long double)
>  static enum size_spec fp_type_size[MAX_FP_TYPE_SIZE + 1];
>  
> +bool input_swap;
> +
>  static char const short_options[] = "A:aBbcDdeFfHhIij:LlN:OoS:st:vw::Xx";
>  
>  /* For long options that have no equivalent short option, use a
>     non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
>  enum
>  {
> -  TRADITIONAL_OPTION = CHAR_MAX + 1
> +  TRADITIONAL_OPTION = CHAR_MAX + 1,
> +  ENDIAN_OPTION,
>  };
>  
>  static struct option const long_options[] =
> @@ -278,6 +281,7 @@ static struct option const long_options[] =
>    {"strings", optional_argument, NULL, 'S'},
>    {"traditional", no_argument, NULL, TRADITIONAL_OPTION},
>    {"width", optional_argument, NULL, 'w'},
> +  {"endian", required_argument, NULL, ENDIAN_OPTION },
>  
>    {GETOPT_HELP_OPTION_DECL},
>    {GETOPT_VERSION_OPTION_DECL},
> @@ -406,7 +410,21 @@ N (size_t fields, size_t blank, void const *block,      \
>      {                                                       \
>        int next_pad = pad * (i - 1) / fields;                \
>        int adjusted_width = pad_remaining - next_pad + width; \
> -      T x = *p++;                                           \
> +      T x;                                                  \
> +      if (input_swap && sizeof(T) > 1)                      \
> +        {                                                   \
> +          int j;                                            \
> +          union {                                           \
> +            T x;                                            \
> +            char b[sizeof(T)];                              \
> +          } u;                                              \
> +          for (j = 0; j < sizeof(T); j++)                   \
> +            u.b[j] = ((const char *) p)[sizeof(T) - 1 - j]; \
> +          x = u.x;                                          \
> +        }                                                   \
> +      else                                                  \
> +        x = *p;                                             \
> +      p++;                                                  \
>        ACTION;                                               \
>        pad_remaining = next_pad;                             \
>      }                                                       \
> @@ -1664,6 +1682,24 @@ main (int argc, char **argv)
>            traditional = true;
>            break;
>  
> +        case ENDIAN_OPTION:
> +          if (!strcmp (optarg, "big"))
> +            {
> +#if !WORDS_BIGENDIAN
> +              input_swap = true;
> +#endif
> +            }
> +          else if (!strcmp (optarg, "little"))
> +            {
> +#if WORDS_BIGENDIAN
> +              input_swap = true;
> +#endif
> +            }
> +          else
> +            error (EXIT_FAILURE, 0,
> +                   _("bad argument '%s' for --endian option"), optarg);
> +          break;
> +
>            /* The next several cases map the traditional format
>               specification options to the corresponding modern format
>               specs.  GNU od accepts any combination of old- and

That looks good. I'll adjust slightly to use XARGMATCH
and add some docs/tests.
I'm travelling at the moment but will merge this soon.

thanks!
Pádraig.
bug#16578: Wish: Support for non-native endianness in od
ni...@lysator.liu.se (Niels Möller) writes:

> Pádraig Brady p...@draigbrady.com writes:
>
>> I agree this would be useful and easy enough to add.
>> I suppose the interface would be --endian=little|big
>
> Maybe I can have a look at what it takes.

Below is a crude patch (missing: usage message, test cases, docs,
translation). I think it should work fine for floats too. I see no
obvious and more beautiful way to do it.

(And I think I have copyright assignment papers for coreutils in place,
since my work on factor some years ago).

Regards,
/Niels

diff --git a/src/od.c b/src/od.c
index 514fe50..a71e302 100644
--- a/src/od.c
+++ b/src/od.c
@@ -259,13 +259,16 @@ static enum size_spec integral_type_size[MAX_INTEGRAL_TYPE_SIZE + 1];
 #define MAX_FP_TYPE_SIZE sizeof (long double)
 static enum size_spec fp_type_size[MAX_FP_TYPE_SIZE + 1];
 
+bool input_swap;
+
 static char const short_options[] = "A:aBbcDdeFfHhIij:LlN:OoS:st:vw::Xx";
 
 /* For long options that have no equivalent short option, use a
    non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
 enum
 {
-  TRADITIONAL_OPTION = CHAR_MAX + 1
+  TRADITIONAL_OPTION = CHAR_MAX + 1,
+  ENDIAN_OPTION,
 };
 
 static struct option const long_options[] =
@@ -278,6 +281,7 @@ static struct option const long_options[] =
   {"strings", optional_argument, NULL, 'S'},
   {"traditional", no_argument, NULL, TRADITIONAL_OPTION},
   {"width", optional_argument, NULL, 'w'},
+  {"endian", required_argument, NULL, ENDIAN_OPTION },
 
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -406,7 +410,21 @@ N (size_t fields, size_t blank, void const *block,      \
     {                                                       \
       int next_pad = pad * (i - 1) / fields;                \
       int adjusted_width = pad_remaining - next_pad + width; \
-      T x = *p++;                                           \
+      T x;                                                  \
+      if (input_swap && sizeof(T) > 1)                      \
+        {                                                   \
+          int j;                                            \
+          union {                                           \
+            T x;                                            \
+            char b[sizeof(T)];                              \
+          } u;                                              \
+          for (j = 0; j < sizeof(T); j++)                   \
+            u.b[j] = ((const char *) p)[sizeof(T) - 1 - j]; \
+          x = u.x;                                          \
+        }                                                   \
+      else                                                  \
+        x = *p;                                             \
+      p++;                                                  \
       ACTION;                                               \
       pad_remaining = next_pad;                             \
     }                                                       \
@@ -1664,6 +1682,24 @@ main (int argc, char **argv)
           traditional = true;
           break;
 
+        case ENDIAN_OPTION:
+          if (!strcmp (optarg, "big"))
+            {
+#if !WORDS_BIGENDIAN
+              input_swap = true;
+#endif
+            }
+          else if (!strcmp (optarg, "little"))
+            {
+#if WORDS_BIGENDIAN
+              input_swap = true;
+#endif
+            }
+          else
+            error (EXIT_FAILURE, 0,
+                   _("bad argument '%s' for --endian option"), optarg);
+          break;
+
           /* The next several cases map the traditional format
              specification options to the corresponding modern format
              specs.  GNU od accepts any combination of old- and

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
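The byte reversal at the heart of the patch above can be tried in isolation
with a sketch like the one below. It is not the od.c macro: copy_reversed
and read_u32 are hypothetical names, the element type is fixed to uint32_t
instead of the macro parameter T, and memcpy stands in for the union and
the pointer dereference so that the input buffer need not be aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of od.c: copy SIZE bytes from SRC to
   DST in reverse order, which converts one object between big- and
   little-endian representation.  DST and SRC must be distinct
   buffers, since this loop cannot reverse in place.  */
static void
copy_reversed (void *dst, void const *src, size_t size)
{
  unsigned char *d = dst;
  unsigned char const *s = src;
  assert (dst != src);
  for (size_t j = 0; j < size; j++)
    d[j] = s[size - 1 - j];
}

/* Read one 32-bit value from BUF, reversing its bytes first when
   SWAP is true, in the same spirit as the input_swap flag.  */
static uint32_t
read_u32 (void const *buf, bool swap)
{
  uint32_t x;
  if (swap)
    copy_reversed (&x, buf, sizeof x);
  else
    memcpy (&x, buf, sizeof x);
  return x;
}
```

Given the bytes 12 34 56 78, one of read_u32 (buf, false) and
read_u32 (buf, true) yields 0x12345678 and the other 0x78563412; which is
which depends on the byte order of the host.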
bug#16578: Wish: Support for non-native endianness in od
Pádraig Brady p...@draigbrady.com writes:

> On 01/28/2014 12:54 PM, Niels Möller wrote:
>> For the od program, it would be nice to have a flag to specify the
>> endianness for all types which are larger than a byte. Possible
>> alternatives could be big endian, little endian, native endian.
>
> I agree this would be useful and easy enough to add.
> I suppose the interface would be --endian=little|big

Maybe I can have a look at what it takes.

>> And for floats, besides endianness, it would be nice to be able to
>> specify native format or ieee format, for systems where these are
>> different.
>
> That's a bit less useful I think and harder to implement.

I agree that's a bit more obscure. So I understand if you don't want to
do that until there's some concrete use case. Endianness for float types
should be easier, I hope.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
bug#16578: Wish: Support for non-native endianness in od
On 01/28/2014 12:54 PM, Niels Möller wrote:
> For the od program, it would be nice to have a flag to specify the
> endianness for all types which are larger than a byte. Possible
> alternatives could be big endian, little endian, native endian.

I agree this would be useful and easy enough to add.
I suppose the interface would be --endian=little|big
We could augment that with a specific byte order spec,
but those two are probably enough.

> And for floats, besides endianness, it would be nice to be able to
> specify native format or ieee format, for systems where these are
> different.

That's a bit less useful I think and harder to implement.
We say this in the info docs:

  Almost all modern systems use IEEE-754 floating point,
  and it is typically portable to assume IEEE-754 behavior these days.

thanks,
Pádraig.
bug#16578: Wish: Support for non-native endianness in od
For the od program, it would be nice to have a flag to specify the
endianness for all types which are larger than a byte. Possible
alternatives could be big endian, little endian, native endian.

And for floats, besides endianness, it would be nice to be able to
specify native format or ieee format, for systems where these are
different.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.